Science and Reason: The amino acid alphabet

Amino acid alphabet soup

All living creatures on this planet use the same 20 amino acids, even though there are hundreds available in nature. Scientists therefore have wondered if life could have arisen based on a different set of amino acids. And what's more, could life exist elsewhere that utilizes an alternate collection of building blocks?

It really is rather remarkable that such a small subset of possible amino acids makes up (almost) all the proteins in every known living organism on the planet. What enforces this strict discipline is the fact that all life forms on Earth use the same genetic code – a remarkable fact in itself – and this code does not specify any amino acids other than those same 20. The way the code works makes substitutions impossible.

The reason for this inflexibility lies in the nature of transfer RNA, which is a critical part of the process in which genetic information encoded in DNA is converted to specific sequences of amino acids making up proteins. The DNA sequence of genes is first transcribed (in a process that is actually rather complicated) into another form of RNA – messenger RNA. All forms of RNA consist of a sequence of nucleotides, and in messenger RNA every 3 nucleotides are read together as a "word" (a codon). Since there are 4 possible nucleotides, there are 64 (=4³) possible distinct words.
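The count of 64 is easy to check by listing every possible word. The following is only a minimal sketch: the names `NUCLEOTIDES` and `words` are chosen here for illustration, and the letters A, C, G and U are the usual abbreviations for the four RNA nucleotides.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # the four RNA nucleotides, by their usual one-letter names

# Every ordered triple of nucleotides is one possible 3-letter "word".
words = ["".join(triple) for triple in product(NUCLEOTIDES, repeat=3)]

print(len(words))   # 64, i.e. 4 * 4 * 4
print(words[:4])    # ['AAA', 'AAC', 'AAG', 'AAU']
```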

In a molecule of transfer RNA, which typically comprises 73 to 93 nucleotides altogether, the three nucleotides at one end will match the sequence of one particular word of messenger RNA. The other end of the transfer RNA can bind covalently to only one of 20 possible amino acids, completely ignoring any other amino acids. For any particular one of the 20 amino acids there are usually several different transfer RNAs that the amino acid can bind to, with each type corresponding to a specific 3-letter sequence of nucleotides. In this way a many-to-1 relationship is established between the 64 3-letter nucleotide words and the 20 amino acids. This is the genetic code.
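The many-to-1 relationship can be made concrete with a small lookup table. The sketch below assumes the standard genetic code, written in the conventional compact form: words are listed in U, C, A, G order and each is paired with a one-letter amino acid abbreviation. The `*` entries mark the three words that, in the standard code, signal the end of a protein rather than an amino acid. The variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"  # conventional ordering used for compact codon tables
# One-letter amino acid codes for the 64 words, in UUU, UUC, UUA, UUG, UCU, ... order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(triple) for triple in product(BASES, repeat=3)]
genetic_code = dict(zip(codons, AMINO))  # word -> amino acid (or "*" for stop)

# Invert the table to see which words map to each amino acid.
words_for = defaultdict(list)
for codon, amino_acid in genetic_code.items():
    words_for[amino_acid].append(codon)

amino_acids = sorted(aa for aa in words_for if aa != "*")

print(len(amino_acids))    # 20
print(words_for["L"])      # leucine is specified by six different words
print(words_for["W"])      # tryptophan by only one: ['UGG']
```

Looking a word up in `genetic_code` always gives exactly one answer, while `words_for` shows that most amino acids can be reached from several words.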

The 20 amino acids can be considered as letters of another alphabet, in which sequences of letters (sometimes thousands of them) make up specific proteins. There are several interesting questions about this genetic code. Why are only 20 amino acids used, even though hundreds exist in nature? How did this small subset happen to be chosen – and be the same subset in all living organisms on Earth? If there is life on other planets that still encodes genetic information with DNA and RNA for making proteins, must the same 20 amino acids be used?

There is a range of possible answers to these questions. At one extreme, the subset of amino acids could have come about completely at random, perhaps being the first viable subset that emerged by chance and then became "frozen" in all successor life forms. At the other extreme, it could be that the amino acids actually used are the only ones that are able to build a suitable set of proteins. The intermediate case is that very early in the history of life many different subsets were in use, but in a process of evolution over time, the subset now used proved to be sufficiently superior to all others that it is the only one that survived in the conditions of the time.

Stephen Freeland and Gayle Philip performed a computer study to investigate whether the exact subset of 20 amino acids in the alphabet was more likely to be a completely random selection, or instead to represent a set that emerged as somehow the best suited for constituting the proteins of life on Earth. They reasoned that there were various properties any amino acid could have that would affect its suitability as a constituent of proteins. Among the properties were size and electric charge of the molecule, and the molecule's degree of attraction to water (hydrophilicity).

What they found was that the 20 amino acids actually occurring in proteins had a wide range of values for each of the properties, and that these values were spread more evenly across that range than would be expected if the selection were random. In other words, the building blocks of proteins appear to be especially diverse in order to accommodate a large diversity of proteins that could be useful in living organisms. Thus evolution in the earliest stages of life on Earth probably favored the availability of many types of building blocks.
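The general shape of such a comparison can be sketched in a few lines. This is not the statistic or the data used in the study. The two measures (`breadth` and `unevenness`), the function names, and every number below are assumptions invented here for illustration. The idea is simply to ask how often a randomly drawn set of the same size covers a property at least as broadly and at least as evenly as the chosen set does.

```python
import random
from statistics import pvariance

def breadth(values):
    """Width of the range of property values covered by a set."""
    return max(values) - min(values)

def unevenness(values):
    """Variance of the gaps between neighbouring values (0 = perfectly even spacing)."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return pvariance(gaps)

def fraction_matching(pool, chosen, trials=2000, seed=0):
    """Fraction of random same-size subsets of `pool` that are at least as
    broad AND at least as evenly spread as `chosen`."""
    rng = random.Random(seed)
    target_breadth = breadth(chosen)
    target_unevenness = unevenness(chosen)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(pool, len(chosen))
        if breadth(sample) >= target_breadth and unevenness(sample) <= target_unevenness:
            hits += 1
    return hits / trials

# PLACEHOLDER numbers only: these are NOT real amino acid measurements.
placeholder_rng = random.Random(42)
evenly_spread = [5.0 * i for i in range(20)]            # a made-up, perfectly even set
pool = evenly_spread + [placeholder_rng.uniform(0.0, 100.0) for _ in range(50)]

print(breadth(evenly_spread))                   # 95.0
print(fraction_matching(pool, evenly_spread))   # small: random sets rarely match it
```

A small fraction means that chance alone rarely produces a set as broad and as even as the chosen one, which is the kind of conclusion the study reports for the real 20 amino acids.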

Abstract: Did evolution select a nonrandom "alphabet" of amino acids?: The last universal common ancestor of contemporary biology (LUCA) used a precise set of 20 amino acids as a standard alphabet with which to build genetically encoded protein polymers. Considerable evidence indicates that some of these amino acids were present through nonbiological syntheses prior to the origin of life, while the rest evolved as inventions of early metabolism. However, the same evidence indicates that many alternatives were also available, which highlights the question: what factors led biological evolution on our planet to define its standard alphabet? One possibility is that natural selection favored a set of amino acids that exhibits clear, nonrandom properties – a set of especially useful building blocks. However, previous analysis that tested whether the standard alphabet comprises amino acids with unusually high variance in size, charge, and hydrophobicity (properties that govern what protein structures and functions can be constructed) failed to clearly distinguish evolution's choice from a sample of randomly chosen alternatives. Here, we demonstrate unambiguous support for a refined hypothesis: that an optimal set of amino acids would spread evenly across a broad range of values for each fundamental property. Specifically, we show that the standard set of 20 amino acids represents the possible spectra of size, charge, and hydrophobicity more broadly and more evenly than can be explained by chance alone.
