An alphabet for a letter-perfect protein

Just 26 letters, linked together in a myriad of permutations, capture all the richness of the English language. Likewise, all proteins are formed from only 20 amino acids strung into long chains and folded upon themselves into functional, three-dimensional shapes.

Now, researchers at the University of Washington in Seattle have found that a reduced alphabet of just five amino acids is enough to make much of a working protein. They successfully replaced the bulk of a small, biologically important protein with an equivalent structure made only of the amino acids isoleucine, lysine, glutamate, alanine, and glycine. Their findings appear in the October Nature Structural Biology.

"It's a splendid paper," says William F. DeGrado of the University of Pennsylvania in Philadelphia. "It dovetails nicely with what's known about the rules of protein folding. It shows that it's a simpler process than one would think."

That's good news for scientists interested in designing new proteins. It also supports the argument that life on Earth could have started with simple biochemical processes and gradually built to greater complexity. The new results suggest that early life forms could have created functional proteins with only a few amino acids, says study coauthor David Baker.

Baker and his colleagues focused on a 57-unit chain, containing 18 different amino acids, from a protein subunit called the SH3 domain. This sequence is found in many large proteins that act as chemical messengers within and between cells. The chain folds back and forth on itself to create a structure, called a beta sheet, that resembles accordion pleats. Similar sheets occur in many proteins, helping define their three-dimensional forms.

The researchers created millions of candidate proteins by considering each amino acid position and choosing potential substitutes from a small set. They then screened the different combinations for ones that folded into the proper configuration.

Isoleucine repels water, making it a good choice for the amino acids that point inward toward the protein core. Lysine and glutamate both attract water, making them ideal for the amino acids on the protein's outer surface.

Previous work had shown that combinations of just three amino acids are sufficient to form bundles of spiral-shaped structures called alpha helices. The bundles are rather formless, however. To make the more ordered beta sheet, the researchers found they had to use alanine and glycine at certain positions.

The researchers screened the candidate structures for the ability to bind to a peptide that attaches at a specific binding site within the SH3 domain. They did not try to replace the amino acids forming the binding site, Baker says.

"It's wise not to do that on the first pass," says Michael H. Hecht of Princeton University. That way, they could be sure that the new sequences had the same overall shape "because it places the binding site in the same place," he explains.

The reconstructed protein also folded at the same rate as the natural protein, Baker says, suggesting that today's protein sequences didn't evolve through a race to bend more efficiently.

The work is an "eye-opener," says Hecht. He compares the natural and simplified protein sequences to two very different languages. "If you only look at China, you may think it takes hundreds of characters to write a language. But then if you go to England, you see that you can write a perfectly good, functional language with only 26 characters."

Although five amino acids seem to be enough to form beta sheets, "proteins have to do more than fold," says Baker. More amino acids "allow enzymes to carry out more specialized tasks." As part of a biochemical language, proteins must benefit from being able to choose from a complete alphabet.