Monday, 26 January 2009
Reading frame of a sequence
The frame in which a protein-coding sequence is translated is defined by a start codon, usually the first AUG codon in the mRNA sequence. Mutations that disrupt the reading frame by inserting or deleting a number of nucleotides that is not a multiple of three are known as frameshift mutations. These mutations may impair the function of the resulting protein, if it is formed, and are thus rare in in vivo protein-coding sequences. Often such misformed proteins are targeted for proteolytic degradation. In addition, a frameshift mutation is very likely to cause a stop codon to be read, which truncates the protein (example [2]). One reason frameshift mutations are rarely inherited is that, if the protein being translated is essential for growth under the selective pressures the organism faces, the absence of a functional protein may be lethal before the organism is viable.
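The effect of a frameshift can be illustrated with a minimal Python sketch. The sequence below is a made-up example, not a real gene, and `translate` is a hypothetical helper that reads codons from a given offset until the first stop codon. Inserting a single nucleotide shifts every downstream codon and, in this example, brings a stop codon into frame.

```python
from itertools import product

# Standard genetic code in RNA letters. Codons are enumerated in U, C, A, G
# order and "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, frame=0):
    """Translate mrna from offset `frame` until the first stop codon."""
    protein = []
    for i in range(frame, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 18-nucleotide message: AUG GCU GAA UGG CUU UAA
original = "AUGGCUGAAUGGCUUUAA"
# Insert a single U after the start codon: AUG UGC UGA ...
mutated = original[:3] + "U" + original[3:]

print(translate(original))  # MAEWL
print(translate(mutated))   # MC
```

The inserted U moves the out-of-frame UGA of the original message into the reading frame, so translation stops after two amino acids.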
Start/stop codons
Translation starts with a chain initiation codon (start codon). Unlike stop codons, the start codon alone is not sufficient to begin the process: nearby sequences (such as the Shine-Dalgarno sequence in E. coli) and initiation factors are also required. The most common start codon is AUG, which also codes for methionine. Depending on the organism, alternative start codons are sometimes used, such as UUG, which normally codes for leucine. When used as a start codon, however, these alternative codons are usually translated as methionine, regardless of their normal meaning.
The three stop codons have been given names: UAG is amber, UGA is opal (sometimes also called umber), and UAA is ochre. "Amber" was named by its discoverers, Richard Epstein and Charles Steinberg, after their friend Harris Bernstein, whose last name means "amber" in German. The other two stop codons were named "ochre" and "opal" to keep the color-name theme. Stop codons are also called termination codons. They signal the release of the nascent polypeptide from the ribosome: because no cognate tRNA has an anticodon complementary to these signals, release factors bind instead.
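As a rough sketch of how start and stop codons delimit a coding region, the following Python snippet finds the first AUG and reads codons in that frame until a stop codon appears. The function name `first_orf` and the example sequence are illustrative assumptions. The sketch deliberately ignores the sequence context and initiation factors that real initiation requires, as well as alternative start codons.

```python
# Names of the three stop codons, as given in the text above.
STOP_NAMES = {"UAG": "amber", "UGA": "opal", "UAA": "ochre"}


def first_orf(mrna):
    """Return (codons, stop_name) for the frame set by the first AUG.

    stop_name is None if the sequence ends before an in-frame stop codon.
    Returns None when the sequence contains no AUG at all.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_NAMES:
            return codons, STOP_NAMES[codon]
        codons.append(codon)
    return codons, None


# Hypothetical message with two leading bases before the start codon.
print(first_orf("GGAUGCCUAAGUAGCC"))
# (['AUG', 'CCU', 'AAG'], 'amber')
```

The example sequence contains an out-of-frame UAA (spanning CCU and AAG), which is skipped because only codons in the frame set by the AUG are examined.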
Degeneracy of the genetic code
A position of a codon is said to be a fourfold degenerate site if any nucleotide at this position specifies the same amino acid. For example, the third position of the glycine codons (GGA, GGG, GGC, GGU) is a fourfold degenerate site, because all nucleotide substitutions at this site are synonymous; i.e., they do not change the amino acid. Only the third positions of some codons may be fourfold degenerate. A position of a codon is said to be a twofold degenerate site if only two of the four possible nucleotides at this position specify the same amino acid. For example, the third position of the glutamic acid codons (GAA, GAG) is a twofold degenerate site. In twofold degenerate sites, the equivalent nucleotides are always either two purines (A/G) or two pyrimidines (C/U), so only transversional substitutions (purine to pyrimidine or pyrimidine to purine) in twofold degenerate sites are nonsynonymous. A position of a codon is said to be a non-degenerate site if any mutation at this position results in an amino acid substitution. There is only one threefold degenerate site, where changing to any of three of the four nucleotides has no effect on the amino acid, while changing to the fourth results in an amino acid substitution. This is the third position of an isoleucine codon: AUU, AUC, and AUA all encode isoleucine, but AUG encodes methionine. In computation this position is often treated as a twofold degenerate site.
Three amino acids are each encoded by six different codons: serine, leucine, and arginine. Only two amino acids are specified by a single codon: methionine, specified by the codon AUG, which also specifies the start of translation, and tryptophan, specified by the codon UGG. The degeneracy of the genetic code is what accounts for the existence of silent mutations.
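These definitions can be checked mechanically. The sketch below, under the assumption of the standard code, counts how many of the four nucleotides at a given codon position yield the same amino acid; `degeneracy` is a hypothetical helper name. A result of 4, 3, 2, or 1 corresponds to a fourfold, threefold, twofold, or non-degenerate site.

```python
from itertools import product

# Standard genetic code in RNA letters. Codons are enumerated in U, C, A, G
# order and "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(codon, position):
    """Count the nucleotides at `position` (0, 1 or 2) that keep the amino acid.

    The count includes the nucleotide already present, so the minimum is 1.
    """
    target = CODON_TABLE[codon]
    return sum(
        CODON_TABLE[codon[:position] + base + codon[position + 1:]] == target
        for base in BASES
    )


print(degeneracy("GGA", 2))  # 4: glycine, fourfold degenerate
print(degeneracy("GAA", 2))  # 2: glutamic acid, twofold degenerate
print(degeneracy("AUU", 2))  # 3: isoleucine, the threefold degenerate site
print(degeneracy("GGA", 1))  # 1: non-degenerate
```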
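The codon counts above can be verified with a short tally over the standard code. This is a minimal sketch; `is_silent` is a hypothetical helper that treats two codons as a silent (synonymous) pair when they encode the same amino acid.

```python
from collections import Counter
from itertools import product

# Standard genetic code in RNA letters. Codons are enumerated in U, C, A, G
# order and "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid (one-letter codes), ignoring stop codons.
codon_counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

six_codons = sorted(aa for aa, n in codon_counts.items() if n == 6)
one_codon = sorted(aa for aa, n in codon_counts.items() if n == 1)


def is_silent(codon, mutant):
    """True if replacing `codon` with `mutant` leaves the amino acid unchanged."""
    return CODON_TABLE[codon] == CODON_TABLE[mutant]


print(six_codons)               # ['L', 'R', 'S']: leucine, arginine, serine
print(one_codon)                # ['M', 'W']: methionine, tryptophan
print(is_silent("GGA", "GGC"))  # True: both encode glycine
print(is_silent("AUA", "AUG"))  # False: isoleucine becomes methionine
```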
Variations to the standard genetic code
In certain proteins, standard stop codons are instead read as non-standard amino acids, depending upon associated signal sequences in the messenger RNA: UGA can code for selenocysteine and UAG can code for pyrrolysine. Selenocysteine is now viewed as the 21st amino acid, and pyrrolysine as the 22nd. A detailed description of variations in the genetic code can be found at the NCBI web site.
Notwithstanding these differences, all known codes have strong similarities to each other, and the coding mechanism is the same for all organisms: three-base codons, tRNA, ribosomes, reading the code in the same direction and translating the code three letters at a time into sequences of amino acids.
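The stop-codon recoding described in this section can be sketched as a small extension of ordinary translation. In the sketch below, the caller is assumed to already know which codon positions are recoded (in a real mRNA this depends on the associated signal sequences, which are not modeled here). `translate_recoded` and `recoded_positions` are hypothetical names, and the one-letter symbols U (selenocysteine) and O (pyrrolysine) are used for the two non-standard amino acids.

```python
from itertools import product

# Standard genetic code in RNA letters. Codons are enumerated in U, C, A, G
# order and "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Stop codons that can be recoded: UGA -> selenocysteine, UAG -> pyrrolysine.
RECODING = {"UGA": "U", "UAG": "O"}


def translate_recoded(mrna, recoded_positions=frozenset()):
    """Translate from the first base, recoding stops at the given codon indices.

    Assumption: recoded_positions lists the codon indices (0-based) at which
    the mRNA's signal sequences direct recoding; nothing is inferred here.
    """
    protein = []
    for index, i in enumerate(range(0, len(mrna) - 2, 3)):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*" and index in recoded_positions and codon in RECODING:
            amino_acid = RECODING[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGUGAGGCUAA"  # hypothetical: AUG UGA GGC UAA
print(translate_recoded(message))       # M
print(translate_recoded(message, {1}))  # MUG
```

The final UAA still terminates translation in both calls, since only UGA and UAG are listed as recodable.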
Theories on the origin of the genetic code
The genetic code is not a random assignment of codons to amino acids. For example, amino acids that share the same biosynthetic pathway tend to have the same first base in their codons, and amino acids with similar physical properties tend to have similar codons.
There are three themes running through the many theories that seek to explain the evolution of the genetic code (and hence the origin of these patterns):
* Recent aptamer experiments show that some amino acids have a selective chemical affinity for the base triplets that code for them. This suggests that the current complex translation mechanism involving tRNA and associated enzymes may be a later development, and that originally, protein sequences were directly templated on base sequences.
* That the standard modern genetic code grew from a simpler earlier code through a process of "biosynthetic expansion". Here the idea is that primordial life 'discovered' new amino acids (e.g., as by-products of metabolism) and later back-incorporated some of these into the machinery of genetic coding. Although much circumstantial evidence has been found to suggest that fewer different amino acids were used in the past than today, precise and detailed hypotheses about exactly which amino acids entered the code in exactly what order have proved far more controversial.
* That natural selection has led to codon assignments of the genetic code that minimize the effects of mutations.
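
One simple quantity related to the last theme is how single-nucleotide substitutions are distributed in the standard code. The Python sketch below classifies every possible single-base change to a sense codon as synonymous, missense (a different amino acid), or nonsense (a stop codon). `classify_substitutions` is a hypothetical helper. The tally alone does not test the hypothesis, which would require comparing the standard code against alternative codon assignments; it only shows the kind of measurement involved.

```python
from itertools import product

# Standard genetic code in RNA letters. Codons are enumerated in U, C, A, G
# order and "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitutions():
    """Tally all single-base substitutions applied to the 61 sense codons."""
    tally = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # only mutations starting from sense codons are counted
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                new_amino_acid = CODON_TABLE[mutant]
                if new_amino_acid == amino_acid:
                    tally["synonymous"] += 1
                elif new_amino_acid == "*":
                    tally["nonsense"] += 1
                else:
                    tally["missense"] += 1
    return tally


print(classify_substitutions())
# {'synonymous': 134, 'missense': 392, 'nonsense': 23}
```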
