Radical Changes in the Engineering of Synthetic Genes for Protein Expression

Feb 09, 2006

New concepts in gene design emphasize the control of protein translation kinetics as a means to improve yield and alter protein solubility and activity. Many expression systems are now available for producing proteins in diverse organisms. Adapting genes for predictable performance in these systems requires more than just controlling transcription and translation initiation. Pausing during translation elongation, encoded by pairs of adjacent codons, plays a key role in host-specific expression. The required modifications in the so-called silent nucleotides of the open reading frame can dictate wholesale redesign of the gene. This requirement for host-specific adaptation of the gene will drive drug development programs into the era of synthetic biology. The convergence of large-scale DNA synthesis capability, better technology for gene expression, and the massive computational capacity needed to design genes de novo makes a practical methodology of gene design a new necessity.
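To make the idea of "silent" nucleotides concrete, the sketch below translates two short open reading frames with the standard genetic code and lists their adjacent codon pairs. Both encode the same peptide, yet they share no codon pair, which is the kind of sequence freedom that codon-pair-aware gene design exploits. This is a hypothetical illustration, not the author's design method: the function names and example sequences are assumptions, and the sketch does not model elongation pausing or score codon pairs for any particular host.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order so the 64-character string lines up with them one to one.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(orf: str) -> list[str]:
    """Split an open reading frame into consecutive triplets."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf: str) -> str:
    """Translate an ORF with the standard code; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(orf))


def codon_pairs(orf: str) -> list[tuple[str, str]]:
    """Return each pair of adjacent codons in reading order."""
    cs = codons(orf)
    return list(zip(cs, cs[1:]))


# Two hypothetical ORFs that differ only at silent (synonymous) positions.
orf_a = "ATGAAACTGTAA"
orf_b = "ATGAAGTTGTAG"

print(translate(orf_a), translate(orf_b))  # MKL* MKL*
shared = set(codon_pairs(orf_a)) & set(codon_pairs(orf_b))
print(len(shared))  # 0
```

Changing only three nucleotides leaves the protein untouched while replacing every codon pair, which is why host-specific adaptation at the codon-pair level can amount to redesigning the whole gene.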

Modern drug development programs depend heavily on targets identified first as gene sequences. Although information from the various "omic" sciences provides a map to treasured target genes, the drug development process really depends on obtaining sufficient quantities of each protein encoded by these "druggable targets." A key to enabling expression is the ability to control transcription and translation of recombinant genes.

Controlling Transcription and Translation Initiation of Recombinant Genes

To better understand the problem of controlling protein expression, let us consider how recombinant proteins currently are produced. Early Escherichia coli expression systems emphasized overall yield, but high protein expression levels in E. coli often result in insoluble protein that forms inclusion bodies. Although these insoluble particles can be purified easily, and sometimes can be refolded to yield soluble protein, there is considerable interest in controlling the factors that determine protein folding as a means of obtaining soluble, active protein.

Refinements introduced in the 1990s and in the past five years have made production in E. coli more sophisticated.1,2 One technique used to overcome difficult transcription problems is to place the gene under the control of a strong promoter from bacteriophage T7 and introduce the phage's T7 RNA polymerase to drive transcription. Another approach is to use tightly repressible promoters (derived from the arabinose operon) to keep control over genes whose expression would otherwise make it difficult for the cell to maintain the vector plasmid. In addition to controlling transcription initiation, the end of the message (the 3' untranslated region) can be stabilized by sequences that promote RNase III cleavage.

New Hosts to Consider

Perhaps the most dramatic change in protein expression has been the broadened choice of host organisms that have well-controlled recombinant protein expression systems.3-6 Recombinant proteins now are produced in a variety of bacteria, yeasts, fungi, insect cells, tissue culture, and even whole plants and animals.

More complex systems, such as tissue culture or recombinant animals and plants, are time-consuming and expensive. These systems are therefore best reserved for producing proteins that require mammalian-specific post-translational modifications or that depend on ancillary proteins such as chaperonins to fold.

By contrast, producing protein in microorganisms such as E. coli, yeast, or fungal hosts is cheaper and often faster. In addition, when these simpler systems can produce an appropriately active molecule, they can produce very large amounts of protein.

For any given host, however, there is a range of transcription promoters and vectors to choose from, and no host is universally successful at producing proteins of interest. When contemporary "omic" sciences generate a list of dozens or hundreds of target proteins with no history of successful expression, finding an expression system customized for each gene may become unmanageable.

Clearly, the opportunities and demands that the "omic" sciences place on protein expression programs collide with the capabilities of current systems. It is difficult to justify either abandoning hard-won targets or surveying every available combination of host and recombinant expression system for each protein on a list.
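The cost of such a survey grows multiplicatively. As a rough illustration with hypothetical numbers (not figures from this article), screening $N_t$ targets across $N_h$ hosts and $N_s$ promoter/vector systems per host requires

```latex
N_{\text{constructs}} = N_t \times N_h \times N_s = 100 \times 5 \times 4 = 2000
```

and that count covers only a single expression trial per construct.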

So, what is lacking in the technology of heterologous gene expression that makes producing an arbitrary protein so unpredictable?

Transcription and Translation Elongation
