Your genetic code has lots of ‘words’ for the same thing – information theory may help explain the redundancies

Nearly all life, from bacteria to humans, uses the same genetic code. This code acts as a dictionary, translating genes into the amino acids used to build proteins. The universality of the genetic code indicates a common ancestry among all living organisms and the essential role this code plays in the structure, function and regulation of biological cells.

Understanding how the genetic code works is the foundation of genetic engineering and synthetic biology. But many mysteries remain unsolved, including why the code matters for biological processes such as protein folding.

As a scholar working at the interface of biology and physics, I apply information theory – the mathematics of how information is stored and communicated – to study some of these intriguing questions. Just as computers need strings of binary code to function, biological processes also rely on bits of information.

In my recent research, I propose that optimization theory may provide a potential explanation for a long-standing mystery about a certain redundancy in how amino acids are encoded.

Different words for the same thing

The genetic codebook is made of “words” spelled from an alphabet of four letters: A, C, G and U. Each of these letters stands for a different chemical building block called a nucleotide: adenine, cytosine, guanine and uracil. A molecular machine called a ribosome reads the codebook to translate genes into proteins.

Circular diagram encoding all 64 possible combinations of the letters A, C, G, and U, which are colored red, yellow, blue, and green, respectively. Abbreviations for different codons are listed around the outer edge of the circle.

The codon sequence is read from the center of the wheel of genetic code.
Mouagip via Wikimedia Commons

Ribosomes read three-letter words called codons, and there are 64 different possible combinations of the four letters that make different codons. In this list of 64 words, 61 encode amino acids, and three signal the ribosome to stop protein synthesis in the cell. For example, “AUG” codes for the amino acid methionine and also indicates the start of a protein.
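To make the counting concrete, here is a minimal Python sketch that enumerates every three-letter word that can be spelled with the four letters. The variable names are illustrative choices, and the three stop codons listed (UAA, UAG and UGA) are those of the standard genetic code, which the article does not spell out.

```python
from itertools import product

# The four RNA letters; the ordering is an arbitrary choice for this sketch.
BASES = "UCAG"

# Every possible three-letter word: 4 * 4 * 4 = 64 codons.
codons = ["".join(letters) for letters in product(BASES, repeat=3)]

# Stop codons of the standard genetic code (an assumption not listed in the text).
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Codons that encode an amino acid rather than a stop signal.
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons), len(sense_codons), len(STOP_CODONS))  # prints: 64 61 3
```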

But just as in any other language, there are synonyms – different codons can encode the same amino acid. In fact, since there are only 20 amino acids but 61 different words to encode them, there is quite a lot of overlap. An amino acid can have anywhere from one to six different codons that encode it. Only two amino acids, methionine and tryptophan, have exactly one codon. This redundancy helps ribosomes perform their tasks correctly even when there’s a typo in the genetic code.
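The synonym counts can be checked with a short Python sketch. It assumes the standard genetic code, written here as a compact 64-character string of one-letter amino acid codes (with "*" marking a stop), a common shorthand rather than anything taken from the article. The names `CODON_TABLE` and `degeneracy` are illustrative.

```python
from collections import Counter
from itertools import product
from math import log2

BASES = "UCAG"

# Standard genetic code as one-letter amino acid codes, "*" = stop.
# Position i corresponds to the i-th codon when the three letters are
# enumerated in the order U, C, A, G (third letter varying fastest).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)

CODON_TABLE = {
    "".join(letters): aa
    for letters, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid, ignoring stop codons.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Amino acids encoded by exactly one codon: M (methionine), W (tryptophan).
single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)

print(len(degeneracy), sum(degeneracy.values()))        # prints: 20 61
print(min(degeneracy.values()), max(degeneracy.values()))  # prints: 1 6
print(single_codon)                                     # prints: ['M', 'W']

# In information terms, a codon can carry log2(64) = 6 bits, while picking
# one of 20 amino acids needs only about 4.32 bits.
print(log2(64), round(log2(20), 2))                     # prints: 6.0 4.32
```

The gap between the 6 bits a codon can carry and the roughly 4.32 bits needed to name an amino acid is one way to quantify the redundancy described above.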

Engineering nature’s guidelines

Why certain amino acids have more synonyms than others is a mystery that has puzzled scientists for decades. Is there a pattern to this variability, or is it random? To answer this question, scientists study the rules that govern nature’s decision-making.

If a human engineer designed the genetic code, they would want to make sure that each amino acid had a similar degree of redundancy to protect against errors and to promote uniformity. The mapping of the 61 codes onto the 20 amino…
