New concepts in gene design emphasize the control of protein translation kinetics as a means to improve yield and alter protein
solubility and activity. Many expression systems are now available for producing proteins in diverse organisms. Adapting genes
for predictable performance in these systems requires more than just controlling transcription and translation initiation.
Translation elongation pausing encoded by pairs of codons plays a key role in host-specific expression. The required modifications
in the so-called silent nucleotides in the open reading frame can dictate wholesale redesign of the gene. This requirement
for host-specific adaptation of the gene will drive drug development programs into the era of synthetic biology. The convergence
of large-scale DNA synthesis capability, better technology for gene expression, and the massive computational capacity needed
to design genes de novo makes the practical methodology of gene design a new necessity.
Modern drug development programs depend heavily on targets identified first as gene sequences. Although information from various
"omic" sciences provides a map to treasured target genes, the drug development process really depends on obtaining sufficient
quantities of each protein that is encoded in these "druggable targets." A key to enabling expression is being able to control
transcription and translation of recombinant genes.
Controlling Transcription and Translation Initiation of Recombinant Genes
As a way to better understand the problem of how to control protein expression, let us consider how recombinant proteins currently
are produced. Early Escherichia coli expression systems emphasized overall yield, but high protein expression levels in E. coli often result in insoluble protein that forms inclusion bodies. Although these insoluble particles can be purified easily,
and sometimes can be refolded to yield soluble protein, there is considerable interest in controlling the factors that determine protein
folding as a means of obtaining soluble, active protein.
Refinements introduced in the 1990s and in the past five years have made production in E. coli more sophisticated.1,2 One technique that has been used to overcome difficult transcription problems is to put the gene under the control of a
robust promoter from the phage T7 system and introduce the phage RNA polymerase to drive transcription. Another approach is
to use highly repressible promoters (derived from the arabinose operon) to maintain control over genes that are problematic
for maintaining vector plasmids in the cell. In addition to transcription initiation control, the end of the message (the
3' untranslated region) can be stabilized by sequences that promote RNase III cleavage.
New Hosts to Consider
Perhaps the most dramatic change in protein expression has been the broadened choice of host organisms that have
well-controlled recombinant protein expression systems.3-6 Recombinant proteins now are produced in a variety of bacteria, yeasts, fungi, insect cells, tissue culture, and even whole
plants and animals.
More complex systems such as tissue culture or recombinant animals and plants are time-consuming and expensive. Thus, these
systems are best reserved for proteins that require mammalian-specific post-translational modifications or that depend
on ancillary proteins such as chaperonins to fold.
By contrast, producing protein in micro-organisms such as E. coli, yeast, or fungal hosts is cheaper and often faster. In addition, when these simpler systems can produce an appropriate active
molecule, they can produce very large amounts of protein.
For any given choice of host, however, a choice of transcription promoters and vectors exists, and no host is universally
successful in producing proteins of interest. When contemporary "omic" sciences generate a list of dozens or hundreds of target
proteins that do not have a history of successful protein expression, the complexity of finding an expression system customized
for each gene may be unmanageable.
Clearly, there is a collision between the opportunities and demands placed on protein expression programs by the "omic" sciences
and the capabilities of current systems. It is difficult to justify either abandoning hard-won targets or surveying
every available combination of host and recombinant expression system for each protein on a list.
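To make the scale of such a survey concrete, the following minimal Python sketch counts and enumerates the expression trials implied by testing every target in every host with every vector. The host list, the number of vectors per host, and the size of the target list are illustrative assumptions only, as are the function names; they are not figures from any actual program.

```python
from itertools import product

# Hypothetical inputs, chosen only to illustrate the arithmetic.
HOSTS = ["E. coli", "yeast", "fungus", "insect cells", "tissue culture"]
VECTORS_PER_HOST = 3   # assumed number of promoter/vector choices per host
N_TARGETS = 100        # assumed length of an "omic"-derived target list


def survey_size(n_targets, n_hosts, n_vectors_per_host):
    """Number of expression trials in an exhaustive survey."""
    return n_targets * n_hosts * n_vectors_per_host


def enumerate_survey(targets, hosts, n_vectors_per_host):
    """Yield every (target, host, vector) combination to be tested."""
    vectors = [f"vector_{i}" for i in range(1, n_vectors_per_host + 1)]
    for target, host, vector in product(targets, hosts, vectors):
        yield target, host, vector


if __name__ == "__main__":
    targets = [f"target_{i:03d}" for i in range(1, N_TARGETS + 1)]
    trials = list(enumerate_survey(targets, HOSTS, VECTORS_PER_HOST))
    print(f"{len(trials)} expression trials for {N_TARGETS} targets")
```

Under these assumed numbers, the survey grows as the product of the three counts, so each additional host or vector option adds another full pass over the target list.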
So, what is lacking in the technology of heterologous gene expression that makes producing an arbitrary protein so unpredictable?
Transcription and Translation Elongation