The following programs investigate codon bias in simulated and actual PCC7120 DNA. Several large differences were observed between the codon frequencies of the simulated and actual PCC7120 DNA. Based on the algorithm that I employed and the given GC content of 42%, we would expect to produce a sequence with the following relative nucleotide fractions: G = 0.21, C = 0.21, A = 0.29, and T = 0.29. In other words, a sequence 1000 nucleotides long would contain 160 more A and T nucleotides (580) than G and C nucleotides (420). The largest differences were observed in the stop codons. The next highest difference was observed in the codon GAA (E, Diff: -28.3), which was significantly underrepresented in the simulated DNA. Glutamic acid has a charged side chain and is strongly hydrophilic, so it is typically found on the outside of a protein, facing the aqueous environment. Having the correct ratio of this specific codon could therefore play a crucial role in the development of a protein’s overall structure.
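The nucleotide fractions and the 160-nucleotide gap above follow directly from the GC content. The sketch below reproduces that arithmetic and shows one way a simulated sequence could be drawn. It assumes each base is chosen independently with a fixed probability; the function names are hypothetical and this is not necessarily the algorithm used in the original programs.

```python
import random

def nucleotide_fractions(gc_content):
    """Split the GC content evenly between G and C, and the rest between A and T."""
    gc_each = gc_content / 2
    at_each = (1 - gc_content) / 2
    return {"G": gc_each, "C": gc_each, "A": at_each, "T": at_each}

def expected_codon_fraction(codon, fractions):
    """Expected frequency of a codon, ASSUMING the three positions are independent."""
    p = 1.0
    for base in codon:
        p *= fractions[base]
    return p

def simulate_sequence(length, fractions, seed=None):
    """Draw a random sequence with the given per-base probabilities (assumed model)."""
    rng = random.Random(seed)
    bases = list(fractions)
    weights = [fractions[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))

fractions = nucleotide_fractions(0.42)

# Expected surplus of A+T over G+C in a 1000-nucleotide sequence.
at_minus_gc = round(1000 * ((fractions["A"] + fractions["T"])
                            - (fractions["G"] + fractions["C"])))

# Under the independence assumption, GAA and CAA have the same expected
# frequency (0.21 * 0.29 * 0.29), so a purely random model cannot reproduce
# any preference the real genome shows for particular codons.
gaa_expected = expected_codon_fraction("GAA", fractions)
caa_expected = expected_codon_fraction("CAA", fractions)
```

Because the model only knows base composition, every codon with the same base content gets the same expected frequency, which is one reason the simulated and actual codon tables diverge.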
The next highest difference was observed in the codon CAA (Q, -22.1). Like glutamic acid, glutamine is polar and hydrophilic. But unlike glutamic acid, glutamine has an uncharged polar side chain. Again, this codon was underrepresented in the simulated DNA; in actual DNA sequences, having the correct ratio of this specific codon could play a crucial role in the development of a protein’s overall structure and in the development of transmembrane proteins (just a thought). As for why this specific codon is used more heavily than, say, “CAG” (the other synonymous codon), it may have to do with differential expression of the tRNA that actually hybridizes to the codon, or CAA may confer increased translational efficiency. After generations of codon optimization, use of this codon may increase the organism's fitness.
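The per-codon differences discussed above can be obtained by counting in-frame codons in each sequence and subtracting the resulting frequencies. This is a minimal sketch, not the original program: it assumes the differences are expressed per 1000 codons (the units are not stated here) and computed as simulated minus actual, which matches negative values meaning underrepresentation in the simulated DNA. The function names and the toy sequences are hypothetical placeholders, not real PCC7120 data.

```python
from collections import Counter

def codon_frequencies(seq, per=1000):
    """Frequency of each in-frame codon, scaled to `per` codons (assumed unit)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3          # drop any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: per * n / len(codons) for codon, n in counts.items()}

def frequency_differences(simulated, actual):
    """Simulated minus actual frequency per codon, largest absolute gap first."""
    sim = codon_frequencies(simulated)
    act = codon_frequencies(actual)
    diffs = {c: sim.get(c, 0.0) - act.get(c, 0.0) for c in set(sim) | set(act)}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)

# Toy placeholder sequences, for illustration only.
toy_simulated = "GAACAAGGG"
toy_actual = "GAAGAAGAA"
ranked = frequency_differences(toy_simulated, toy_actual)
```

Sorting by absolute difference puts the most strongly over- or underrepresented codons first, which is how the stop codons, GAA, and CAA would surface at the top of such a list.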