How does the presence of pauses affect the practical task of expressing a protein? At the simplest level, pauses are likely
to down-regulate a highly translated (polysomal) mRNA, because translation initiation soon saturates and the slowest
translation step then becomes rate-limiting. Second, at least in bacteria, a significant pause can cause premature
transcription termination or degradation of the messenger. Even in eukaryotes, mRNA export from the nucleus is coupled to
translation, so a different but still effective system for clearing untranslated mRNA exists there as well.
Heterologous Gene Expression Creates Inappropriate Translation Pausing Signals and Inefficient Codon Usage
Taken together, both organism-specific codon usage and the presence of organism-specific pause sites mean that the biologically
appropriate translation of a gene is highly adapted to its original host organism. Ribosomal pausing sites that may be functional
in a human cell will almost certainly not be recognized in a bacterium. Worse, a cDNA has a high probability of containing,
purely by chance, a bacterial pause site that decouples translation from transcription, leading to expression failure. It is
little wonder, then, that most cDNA clones do not readily express high levels of protein in bacteria. Even differences in
pause-signal coding among bacteria, or among vertebrates, are enough to make cross-family gene expression unpredictable.
Enter Synthetic Biology
The simplest test of translation pausing as a general regulator of protein synthesis is to compare a series of genes that
have random pauses with synthetic genes from which the pauses have been removed intentionally. Genes moved from their source
organism and expressed in a heterologous host with an altered set of over-represented codon pairs have a drastically altered
configuration of presumed pause sites. Experimentally, codon-pair optimization has a dramatic effect on expression: of more
than 60 genes tested, expression was either detected for the first time or improved, in some cases more than 100-fold.12
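The idea of an "over-represented codon pair" can be made concrete. The sketch below uses one common formulation, a codon pair score: the natural log of how often two adjacent codons occur in a reference set of in-frame coding sequences, divided by how often they would occur if codons were chosen independently given the encoded amino acid pair. This is an illustrative assumption, not necessarily the metric used in the study cited above; the function names and the toy reference are hypothetical.

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(orf):
    """Split an in-frame ORF (A/C/G/T only) into codons, ignoring a trailing partial codon."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]


def codon_pair_scores(reference_orfs):
    """ln(observed / expected) for every adjacent codon pair seen in the reference ORFs.

    expected(A, B) = N_A * N_B / (N_X * N_Y) * N_XY, where X and Y are the amino acids
    encoded by codons A and B. Positive scores mean over-represented pairs.
    """
    codon_n, aa_n, pair_n, aa_pair_n = Counter(), Counter(), Counter(), Counter()
    for orf in reference_orfs:
        codons = split_codons(orf)
        aas = [CODON_TABLE[c] for c in codons]
        codon_n.update(codons)
        aa_n.update(aas)
        pair_n.update(zip(codons, codons[1:]))
        aa_pair_n.update(zip(aas, aas[1:]))

    scores = {}
    for (a, b), observed in pair_n.items():
        x, y = CODON_TABLE[a], CODON_TABLE[b]
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[(x, y)]
        scores[(a, b)] = math.log(observed / expected)
    return scores


def codon_pair_bias(orf, scores):
    """Mean score over a gene's adjacent codon pairs; pairs absent from `scores` are skipped."""
    codons = split_codons(orf)
    values = [scores[p] for p in zip(codons, codons[1:]) if p in scores]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    scores = codon_pair_scores(["AAAAAG", "AAGAAA"])  # toy reference, hypothetical
    print(round(scores[("AAA", "AAG")], 3))  # 0.693 (ln 2): this pair is over-represented
```

Pairs never observed in the reference get no score here (their log ratio would be negative infinity), so a real analysis would need a large reference set or a smoothing rule.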
Radical Overhaul: De Novo Design of Synthetic Genes
Building a novel gene sequence to express a target protein sequence can have several advantages. Because of the redundancy
of the triplet code, it is possible to preserve amino acid sequence coding while varying the nucleic acid sequence. In fact,
a tremendous amount of variation is possible: approximately $3^N$ sequence permutations for a protein of length N, since each
amino acid is encoded by one to six codons (about three on average, somewhat more for typical protein compositions). Even a
small protein of 100 amino acids thus has a sequence space of roughly $10^{50}$ possible encodings. Using this space, genes can be designed that are malleable and specifically tailored
to a certain host and vector system. The resulting gene can:
- eliminate translational problems caused by inappropriate ribosome pausing
- have codon usage rationalized to avoid over-reliance on rare codons
- be specifically designed to avoid oligos that mis-hybridize, so that the gene can be assembled easily from optimized
oligonucleotides that, by thermodynamic necessity, can pair up only in a specific order
- be free of selected restriction sites, internal Shine-Dalgarno sequences, or other sites that may cause problems in cloning
or in interactions with the host organism
- have a controlled RNA sequence and secondary structure to avoid detrimental termination or processing sites.
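The sequence-space estimate and two of the design constraints listed above can be checked with a few lines of code. The following is a minimal Python sketch, not a description of any particular design tool: `count_encodings` multiplies the number of synonymous codons for each residue under the standard genetic code, and `find_motifs` reports the positions of user-supplied unwanted sites. The motif set (EcoRI `GAATTC`, BamHI `GGATCC`, and `AGGAGG` as a crude stand-in for an internal Shine-Dalgarno-like sequence) is an illustrative assumption; only the given strand is scanned, and real screens typically tolerate mismatches.

```python
import math
import re
from collections import Counter

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Number of sense codons per amino acid (1 for Met/Trp, up to 6 for Leu/Ser/Arg).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def count_encodings(protein):
    """Exact number of distinct DNA sequences encoding `protein` (stop codon not included)."""
    for aa in protein:
        if aa not in DEGENERACY:
            raise ValueError(f"unknown amino acid: {aa!r}")
    return math.prod(DEGENERACY[aa] for aa in protein)


def find_motifs(dna, motifs):
    """Return (name, position) hits sorted by position, including overlapping hits."""
    dna = dna.upper()
    hits = []
    for name, site in motifs.items():
        # A zero-width lookahead lets re.finditer report overlapping occurrences.
        for match in re.finditer(f"(?={re.escape(site)})", dna):
            hits.append((name, match.start()))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical screening set, for illustration only.
UNWANTED = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "SD-like": "AGGAGG"}

if __name__ == "__main__":
    print(count_encodings("MKTAYIAKQR"))  # 1*2*4*4*2*3*4*2*2*6 = 18432
    print(find_motifs("AAGAATTCAGGAGGTT", UNWANTED))  # [('EcoRI', 2), ('SD-like', 8)]
```

Because the count grows exponentially with length, exhaustive enumeration is out of the question even for short proteins; any practical design procedure has to search this space selectively while rejecting candidates that contain unwanted motifs.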
The first two items on this list can re-tune the gene of interest for expression in a particular host organism. Several programs
exist for identifying the codon usage changes needed to match tRNA availability in a particular organism.13,14
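To illustrate the kind of calculation such programs perform, here is a minimal sketch of one widely used measure, the Codon Adaptation Index (CAI). Each codon gets a relative adaptiveness w, its count in a reference set of (ideally highly expressed) host genes divided by the count of the most frequent synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The programs cited above may use different or additional criteria. The toy reference, the 0.5 pseudocount for codons absent from the reference, and the hypothetical helper names are assumptions for illustration.

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(orf):
    """Split an in-frame ORF into codons, ignoring a trailing partial codon."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]


def codon_weights(reference_orfs, pseudocount=0.5):
    """Relative adaptiveness w = count / count of the most-used synonymous codon."""
    counts = Counter(c for orf in reference_orfs for c in split_codons(orf))
    weights = {}
    for aa in sorted(set(AMINO_ACIDS) - {"*"}):
        synonyms = [c for c in CODONS if CODON_TABLE[c] == aa]
        top = max(counts[c] for c in synonyms)
        if top == 0:
            continue  # amino acid absent from the reference: its codons stay unweighted
        for c in synonyms:
            weights[c] = max(counts[c], pseudocount) / top
    return weights


def cai(orf, weights):
    """Geometric mean of w, skipping Met, Trp, stop codons, and unweighted codons."""
    logs = [math.log(weights[c]) for c in split_codons(orf)
            if c in weights and CODON_TABLE[c] not in ("M", "W")]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def most_adapted_sequence(protein, weights):
    """Naive back-translation: the highest-w codon for every residue (ties: TCAG order)."""
    best = {}
    for c in CODONS:
        if c in weights:
            aa = CODON_TABLE[c]
            if aa not in best or weights[c] > weights[best[aa]]:
                best[aa] = c
    return "".join(best[aa] for aa in protein)


if __name__ == "__main__":
    w = codon_weights(["ATGCTGCTGCTGAAAAAGAAA"])  # toy reference, hypothetical
    print(most_adapted_sequence("MLK", w))  # ATGCTGAAA
    print(round(cai("ATGCTGAAG", w), 4))  # 0.7071
```

The "one amino acid, one codon" back-translation maximizes CAI but is only a baseline: it ignores the codon-pair and motif constraints discussed in this section.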