Applying Fusion Protein Technology to E. coli

May 2, 2007
John P. Hall

Volume 2007 Supplement, Issue 3


Rapid, efficient, and cost-effective protein expression and purification strategies are required for high-throughput structural genomics and for the production of therapeutic proteins. Fusion protein technology is one strategy for achieving these goals. It can facilitate purification, enhance protein expression and solubility, chaperone proper folding, reduce protein degradation, and in some cases generate protein with a native N-terminus. No technology or reagent is a panacea, however, and establishing tools and optimal conditions for each protein remains an empirical exercise. Even so, protein fusions are a leading option for producing difficult-to-express proteins, especially in Escherichia coli.

E. coli, a simple and low-cost host, has long been the preferred system for recombinant protein expression. Recently, however, difficult-to-express proteins (e.g., G-protein coupled receptors [GPCRs], kinases, ion channels, blood plasma proteins, vaccines, and antibodies) have become the norm rather than the exception. This trend is steering researchers away from E. coli toward more costly eukaryotic systems such as baculovirus-infected insect cells and mammalian cells.

The recombinant therapeutic protein business had sales of more than $34 billion in 2004.1 This total excludes therapeutic antibodies, vaccines, DNA–RNA synthetics, small molecules, and gene and cell therapies. Sales are projected to reach $52 billion by 2010. Table 1 lists the top-selling recombinant therapeutic proteins.
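As a quick check on what the projection implies, the sketch below computes the compound annual growth rate that would take sales from $34 billion (2004) to $52 billion (2010). It assumes smooth compound growth, which the projection itself does not state; because 2004 sales were "more than" $34 billion, the result is an upper bound on the implied rate.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Sales figures from the text, in billions of US dollars.
rate = implied_cagr(34.0, 52.0, 2010 - 2004)
print(f"Implied growth: {rate:.1%} per year")  # prints: Implied growth: 7.3% per year
```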

Table 1. Top eight recombinant therapeutic proteins and their global sales between 2002 and 2004.

In the future, the industry is likely to face a large and growing bottleneck: the inability to express large quantities of biologically active protein efficiently and at low cost. The so-called "low-hanging fruit" has been picked, and a new era of difficult-to-express proteins is upon us. This situation may drive the industry toward more costly eukaryotic expression systems.

The application of gene fusion technology to E. coli, however, offers a promising alternative. Commercial success is more likely when the expression host is simple and inexpensive.


Typical problems with difficult-to-express proteins include low or no expression, insoluble protein, purification difficulties, degradation of the expressed protein, and incorrect folding. Advances that have improved recombinant protein expression in E. coli include strong promoters,2 co-expression with chaperones,3 and, most influential of all, fusion tags. Popular fusion tags include glutathione S-transferase (GST),4 maltose binding protein (MBP),5,6 NusA,7 thioredoxin (TRX),8 polyhistidine (HIS),9–11 small ubiquitin-like modifier (SUMO),12–15 split SUMO, and ubiquitin (Ub).16

Gene fusion technology can facilitate purification, enhance protein expression and solubility, chaperone proper folding, reduce protein degradation, and in some cases, generate protein with a native N-terminus. Nevertheless, protein expression remains an arduous task that involves a complex decision tree. Whether or not to use gene fusion technology is just one choice. Other factors include the expression system, host strain, mRNA stability, codon bias, inclusion body formation and prevention, site-specific proteolysis, secretion, post-translational modification, and co-overexpression. The complexity is compounded by the diversity of proteins. To date, no technology or reagent is a panacea. Thus, establishing tools and optimal conditions for each protein remains an empirical exercise.

Several recent comparative studies have examined the effects of various fusion partners on total and soluble expression yield (Table 2). Marblestone et al. evaluated the expression and solubility of three model proteins fused to the C termini of MBP, GST, TRX, NusA, Ub, and SUMO tags.16 They ranked the tags by increase in total expression as TRX > SUMO ~ NusA > Ub ~ MBP ~ GST and by increase in soluble expression as SUMO ~ NusA > Ub ~ GST ~ MBP ~ TRX. Hammarstrom et al. cloned 27 human proteins (MW < 20 kDa) into various expression vectors and ranked the tags' ability to promote soluble expression as TRX ~ MBP ~ GB1 > ZZ > NusA > GST > His6.17 Braun et al. compared the expression of 32 human proteins ranging in molecular weight from 17 to 110 kDa and ranked the tags by expression and yield after purification as GST ~ MBP > CBP > His6.18 Shih et al., in a study of 40 different proteins with eight different tags, found that MBP gave the best overall results in terms of total and soluble expression.19 In one of the experiments reported by Dyson et al., the solubility of 20 mammalian proteins was compared and the fusion tags were ranked by soluble expression as TRX ~ MBP > His10 > GST > GFP.20 De Marco et al. demonstrated that NusA was better than GST at enhancing the solubility and stability of recombinant proteins.21

Table 2. Summary of several comparative studies that examine the effects of various fusion partners on total and soluble expression yield.

The inconsistency among these comparative studies reinforces the point that no technology or reagent is a panacea and that tools and optimal conditions for each protein must still be established empirically. Nevertheless, it may become possible in the future to make generalizations about specific fusion tags (e.g., for entire protein classes). As the comparative studies show, the efficiency of gene fusion tag systems varies dramatically.
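The ranking notation used in these studies (">" for better, "~" for roughly equivalent) can be parsed into ordered tiers, which makes the disagreement easy to see. The sketch below is illustrative only: the rankings are transcribed from the studies summarized above, tag names are normalized for case (e.g., His6, GB1), and the helper names are hypothetical. Shih et al. and De Marco et al. are omitted because they did not report full rankings.

```python
from typing import Dict, List, Optional, Set

# Rankings transcribed from the comparative studies discussed in the text.
# ">" separates tiers (better to worse); "~" joins roughly equivalent tags.
RANKINGS: Dict[str, str] = {
    "Marblestone, total": "TRX > SUMO ~ NusA > Ub ~ MBP ~ GST",
    "Marblestone, soluble": "SUMO ~ NusA > Ub ~ GST ~ MBP ~ TRX",
    "Hammarstrom, soluble": "TRX ~ MBP ~ GB1 > ZZ > NusA > GST > His6",
    "Braun, expression/yield": "GST ~ MBP > CBP > His6",
    "Dyson, soluble": "TRX ~ MBP > His10 > GST > GFP",
}


def parse_ranking(ranking: str) -> List[Set[str]]:
    """Split 'A > B ~ C' into ordered tiers: [{'A'}, {'B', 'C'}]."""
    return [{tag.strip() for tag in tier.split("~")} for tier in ranking.split(">")]


def tier_of(tag: str, tiers: List[Set[str]]) -> Optional[int]:
    """Zero-based tier index of `tag`, or None if the study did not test it."""
    for index, tier in enumerate(tiers):
        if tag in tier:
            return index
    return None


def compare(tag: str) -> Dict[str, str]:
    """Where `tag` lands in each study that tested it; '1 of 3' means the top of three tiers."""
    placements = {}
    for study, ranking in RANKINGS.items():
        tiers = parse_ranking(ranking)
        position = tier_of(tag, tiers)
        if position is not None:
            placements[study] = f"{position + 1} of {len(tiers)}"
    return placements


for study, place in compare("TRX").items():
    print(f"TRX in {study}: tier {place}")
```

For TRX, this places the tag in the top tier of three rankings but in the bottom tier for soluble expression in Marblestone et al., which is the kind of disagreement described above.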


Factors that influence yield and biological activity include a) the affinity purification scheme; b) enhancement of recombinant protein expression; c) protein folding and enhanced solubility; d) protection from degradation; e) the size of the fusion tag; and f) the specificity, efficiency, and site of cleavage. Each factor is discussed below.

Affinity Purification Scheme

Protein fusion tags are often used to purify proteins from crude extracts; GST and MBP are two examples. Other fusion partners, such as NusA, SUMO, Split SUMO, thioredoxin, and ubiquitin, require an additional affinity tag such as polyhistidine (HIS). Affinity matrices for fusion proteins include nickel-nitrilotriacetic acid (Ni-NTA) to isolate hexahistidine-fused proteins,22 amylose to isolate MBP-fused proteins,23 and glutathione (GSH)-Sepharose to isolate GST-fused proteins (Table 3).24 Successful purification schemes deliver high purity and yield using inexpensive, high-capacity resins and mild elution conditions. Effective purification requires high affinity between the fusion tag and the resin, but the affinity must not be too high: a tag that binds too tightly requires harsh elution conditions that can disrupt the protein's tertiary structure.
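The choice of capture step can be sketched as a simple lookup. This is a minimal, hypothetical helper, not a published tool: the tag-to-resin pairs come from the paragraph above, while the eluents (imidazole, maltose, reduced glutathione) are the standard competitive eluents for those resins and are added here as background, not taken from the article. The "first tag with a handle wins" rule is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CaptureScheme:
    resin: str
    eluent: str


# Tags that provide their own affinity handle (per the text), each with the
# standard competitive eluent for its resin; competitive elution is a mild condition.
CAPTURE: Dict[str, CaptureScheme] = {
    "His6": CaptureScheme("Ni-NTA", "imidazole"),
    "MBP": CaptureScheme("amylose", "maltose"),
    "GST": CaptureScheme("glutathione-Sepharose", "reduced glutathione"),
}


def capture_scheme(tags: List[str]) -> CaptureScheme:
    """Return the capture scheme for the first tag in the construct that has one.

    `tags` lists the fusion partners in a construct, e.g. ["His6", "SUMO"].
    Partners such as NusA, SUMO, thioredoxin, or ubiquitin have no entry, so a
    construct built from them alone needs an added affinity tag.
    """
    for tag in tags:
        if tag in CAPTURE:
            return CAPTURE[tag]
    raise ValueError(f"no affinity handle in {tags}; add one such as His6")


print(capture_scheme(["His6", "SUMO"]))  # CaptureScheme(resin='Ni-NTA', eluent='imidazole')
```

Calling `capture_scheme(["NusA"])` raises `ValueError`, mirroring the point that NusA and similar partners need a separate affinity tag.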

Table 3. Affinity tags influence protein expression yield and activity.

Enhancing Recombinant Protein Expression

Protein expression depends on transcriptional regulation, mRNA stability, and translational efficiency. High-level recombinant protein expression in particular requires a high mRNA copy number, efficient translational initiation and elongation, a stable mRNA, and translational enhancers (reviewed by Makrides).25 Codon bias also affects expression,26 but it can be overcome by engineering strains or cell lines that supply rare tRNAs or by replacing problematic codons with codons that the host uses more frequently.27
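The second approach, replacing problematic codons, can be sketched as a synonymous substitution over an in-frame coding sequence. The set of rare E. coli codons below is a commonly cited list assumed for illustration (the article does not list them), and the replacement codons are assumed choices among frequently used synonymous codons. Practical codon-optimization tools weigh further factors, such as mRNA structure and GC content, that this sketch ignores.

```python
from typing import List, Tuple

# Assumed set of codons that are rare in E. coli, each mapped to a synonymous
# codon the host uses frequently, so the encoded protein does not change.
RARE_TO_COMMON = {
    "AGA": "CGT",  # Arg; AGA/AGG are read by the tRNA encoded by dnaY (argU)
    "AGG": "CGT",  # Arg
    "ATA": "ATT",  # Ile
    "CTA": "CTG",  # Leu
    "CCC": "CCG",  # Pro
    "GGA": "GGC",  # Gly
}


def codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def find_rare_codons(cds: str) -> List[Tuple[int, str]]:
    """(codon index, codon) for every rare codon in the sequence."""
    return [(i, c) for i, c in enumerate(codons(cds)) if c in RARE_TO_COMMON]


def replace_rare_codons(cds: str) -> str:
    """Swap each rare codon for its synonymous common codon."""
    return "".join(RARE_TO_COMMON.get(c, c) for c in codons(cds))


gene = "ATGAGAAGGCTGTAA"  # hypothetical sequence: Met-Arg-Arg-Leu-stop
print(find_rare_codons(gene))     # [(1, 'AGA'), (2, 'AGG')]
print(replace_rare_codons(gene))  # ATGCGTCGTCTGTAA
```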

Promoters also play a fundamental role in the transcription of heterologous genes and in recombinant protein expression. Strong, tightly regulated promoters are now commonplace for E. coli, yeast, and insect cells.28–30 By contrast, much remains to be learned about gene fusion technology, even though it has been shown to enhance expression dramatically.15,28,31 The exact mechanism by which fusion partners enhance expression is unknown; some speculate that it results from the highly conserved structure of the fusion tag.32

Protein Folding and Enhanced Solubility

Cost and simplicity are the primary driving forces when choosing a recombinant expression organism, so E. coli is usually the first choice. However, E. coli has several shortcomings as an expression host. Many eukaryotic proteins, especially those with disulfide bridges or sugar moieties, cannot be expressed in E. coli as soluble, active, properly folded proteins.33 Over-expression in E. coli often produces macromolecular crowding (200–300 mg/mL in the cytoplasm), an environment unfavorable for protein folding. The result is a high concentration of misfolded proteins that aggregate into inclusion bodies, which must then be refolded. Protection from proteolytic degradation may be the only advantage of inclusion bodies.

To circumvent these problems, several strategies have been used to enhance solubility, promote proper folding, and reduce inclusion body formation. These include co-expression of molecular chaperones and foldases, secretion of the target protein, and expression as a protein fusion.34–37

Protein fusion tags have been shown to act as solubility enhancers and chaperones.38 Neither mechanism is well understood, but the hypotheses include:

  • Fusion of a stable or conserved structure to an insoluble recombinant protein may serve to stabilize and promote proper folding of the recombinant protein.

  • Fusion tags may act as a nucleus of folding ("molten globule hypothesis").39,40

Although fusion partners promote solubility, solubility is not a universal indicator of correct folding. Researchers therefore recommend additional measurements (including monodispersity by light scattering,41 NMR,42,43 CD spectropolarimetry, bis-ANS binding,44 and ligand binding or enzymatic activity) as supporting evidence for correct folding.

Protection from Degradation

Cells often treat recombinant proteins as unwanted and subject them to proteolytic degradation.45 Several strategies have been developed to protect recombinant proteins from degradation, including the use of protease inhibitors,46 secretion into the periplasm or culture medium,47,48 and protective fusions.12,28,49–53 The compartmentalization hypothesis describes a mechanism by which gene fusions may protect against proteolytic degradation.54 Fusions can promote the translocation of their partner proteins to different cellular compartments, thereby decreasing the concentration of the recombinant protein in the protease-rich cytosol. For example, SUMO can translocate from the cytosol to the nucleus, and MBP can translocate to the membrane compartment of the cell.55,56

Size of the Fusion Tag

The size of the fusion tag plays a crucial role in the overall yield of purified protein. As Table 4 shows, the yield of purified target protein depends on both the yield of the fusion protein and the fraction of that fusion the target represents: the larger the fusion tag, the lower the overall yield. Split SUMO, which is 47 amino acids long, makes up only 19% of the fusion when fused to a 200-amino-acid target protein (47/247). In contrast, NusA, which is 495 amino acids long, makes up 71% of the fusion with the same target (495/695). Therefore, if expression yielded one gram of fusion protein for both Split SUMO–Target and NusA–Target, the yield of purified target after cleavage would be 0.810 g (200/247) and 0.288 g (200/695), respectively, assuming residue count approximates mass and cleavage is complete.
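The arithmetic above amounts to target yield = fusion yield × target residues / (tag residues + target residues). The sketch below reproduces the Split SUMO and NusA figures under the same idealized assumptions: residue count stands in for mass (equal average residue mass), cleavage is complete, and nothing is lost during purification. Real recoveries will be lower; `target_yield` is a hypothetical helper name.

```python
def target_yield(fusion_yield_g: float, tag_residues: int, target_residues: int) -> float:
    """Grams of target released from `fusion_yield_g` grams of fusion protein.

    Uses residue counts as a proxy for mass (assumes equal average residue mass)
    and assumes complete cleavage with no losses.
    """
    if tag_residues < 0 or target_residues <= 0:
        raise ValueError("need tag_residues >= 0 and target_residues > 0")
    return fusion_yield_g * target_residues / (tag_residues + target_residues)


TARGET_RESIDUES = 200
for name, tag_residues in [("Split SUMO", 47), ("NusA", 495)]:
    tag_share = tag_residues / (tag_residues + TARGET_RESIDUES)
    grams = target_yield(1.0, tag_residues, TARGET_RESIDUES)
    print(f"{name}: tag is {tag_share:.0%} of fusion; target yield {grams:.3f} g")
```

This prints a 19% tag share and 0.810 g of target for Split SUMO, and a 71% share and 0.288 g for NusA, matching the figures in the text.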

Table 4. The size of the fusion tag influences the yield of the purified protein.

Specificity, Efficiency, and Site of Cleavage

The quality, quantity, and activity of the purified protein are influenced by the specificity, efficiency, and site of cleavage. Cleavage of the fusion is usually necessary because the fusion partner may interfere with the structural or functional properties of the recombinant protein.57 Fusions can be cleaved by chemical or enzymatic methods.58,59 Enzymatic methods rely on engineered cleavage sites, positioned between the fusion tag and the target protein, that are recognized by a specific protease. Proteases that have been used to cleave fusion tags include tobacco etch virus (TEV) protease,60 factor Xa, thrombin,59 and SUMO protease.12–16 Problems associated with proteolytic cleavage of fusion tags include low yield, precipitation of the protein of interest, labor-intensive optimization of cleavage conditions, the expense of proteases, failure to recover active, structurally intact protein,61 and the generation of non-native N-terminal amino acids (SUMO and ubiquitin fusions are an exception12–16). Choosing the right chemical or enzymatic strategy is therefore crucial for obtaining active protein of high quality and quantity.
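To make the non-native N-terminus problem concrete, the minimal sketch below models three of the proteases named above as cutting at short linear recognition sequences (TEV: ENLYFQ|G, thrombin: LVPR|GS, factor Xa: IEGR|) and reports the residues left in front of the target. The motifs are simplified consensus sequences, the constructs are assumed designs, and the tag and target sequences are hypothetical. SUMO and ubiquitin proteases are not modeled, because they recognize the folded tag rather than a short linear motif and cut after the tag's C-terminal Gly-Gly, which is why those fusions can leave a native N-terminus.

```python
from typing import Dict, Tuple

# Simplified linear recognition sequences (assumed construct designs).
# The integer is the offset within the motif at which the protease cuts.
SITES: Dict[str, Tuple[str, int]] = {
    "TEV": ("ENLYFQG", 6),      # cuts Q|G: one G stays on the target
    "thrombin": ("LVPRGS", 4),  # cuts R|G: G-S stays on the target
    "factor Xa": ("IEGR", 4),   # cuts after R: nothing stays on the target
}


def cleave(fusion: str, protease: str) -> str:
    """Return the C-terminal fragment released at the first recognition site."""
    motif, cut = SITES[protease]
    start = fusion.find(motif)
    if start == -1:
        raise ValueError(f"no {protease} site in fusion")
    return fusion[start + cut:]


def extra_n_terminal_residues(fusion: str, protease: str, target: str) -> str:
    """Residues left in front of the native target N-terminus after cleavage."""
    released = cleave(fusion, protease)
    if not released.endswith(target):
        raise ValueError("released fragment does not end with the target")
    return released[: len(released) - len(target)]


TAG, TARGET = "HHHHHH", "MKTAYIAK"  # hypothetical tag and target sequences
for protease, (motif, _) in SITES.items():
    fusion = TAG + motif + TARGET
    extra = extra_n_terminal_residues(fusion, protease, TARGET)
    print(f"{protease}: extra N-terminal residues = {extra or 'none'}")
```

With these assumed constructs, TEV leaves G, thrombin leaves GS, and factor Xa leaves none.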

Table 5. Abbreviations

JOHN HALL is vice president of business development at LifeSensors, Inc., Malvern, PA, 610.644.8845 x 305,


1. Pavlou AK, Reichert JM. Recombinant protein therapeutics-success rates, market trends and values to 2010. Nat. Biotechnol. 2004; 22:1513–9.

2. Studier FW, Rosenberg AH, Dunn JJ, Dubendorff JW. Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol. 1990; 185:60–89.

3. Ikura K, Kokubu T, Natsuka S, Ichikawa A, Adachi M, Nishihara K, et al. Co-overexpression of folding modulators improves the solubility of the recombinant guinea pig liver transglutaminase in Escherichia coli. Prep. Biochem. Biotechnol. 2002; 32:189–205.

4. Smith DB, Johnson KS. Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione S-transferase. Gene. 1988; 67:31–40.

5. Bedouelle H, Duplay P. Production in Escherichia coli and one-step purification of bifunctional hybrid proteins which bind maltose. Export of the Klenow polymerase into the periplasmic space. Eur. J. Biochem. 1988; 171:541–549.

6. Di Guan C, Li P, Riggs PD, Inouye H. Vectors that facilitate the expression and purification of foreign peptides in Escherichia coli by fusion to maltose-binding protein. Gene. 1988; 67:21–30.

7. Davis GD, Elisee C, Newham DM, Harrison RG. New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol. Bioeng. 1999; 65:382–388.

8. LaVallie ER, DiBlasio EA, Kovacic S, Grant KL, Schendel PF, McCoy JM. A thioredoxin gene fusion expression system that circumvents inclusion body formation in the E. coli cytoplasm. Biotechnology. 1993; 11:187–193.

9. Gentz R, Certa U, Takacs B, Matile H, Döbeli H, Pink R, et al. Major surface antigen p190 of Plasmodium falciparum: Detection of common epitopes present in a variety of plasmodia isolates. EMBO J. 1988; 7:225–230.

10. Hochuli E, Bannwarth W, Döbeli H, Gentz R, Stüber D. Genetic approach to facilitate purification of recombinant proteins with a novel metal chelate adsorbent. Biotechnology. 1988; 6:1321–1325.

11. Smith MC, Furman TC, Ingolia TD, Pidgeon C. Chelating peptide-immobilized metal ion affinity chromatography. A new concept in affinity chromatography for recombinant proteins. J. Biol. Chem. 1988; 263:7211–7215.

12. Malakhov MP, Mattern MR, Malakhova OA, Drinker M, Weeks SD, Butt TR. SUMO fusions and SUMO-specific protease for efficient expression and purification of proteins. J. Struct. Funct. Genomics. 2004; 5:75–86.

13. Zuo X, Li S, Hall J, Mattern MR, Tran H, Shoo J, et al. Enhanced expression and purification of membrane proteins by SUMO fusion in E. coli. J. Struct. Funct. Genomics. 2005; 6:103–111.

14. Zuo X, Mattern MR, Tan R, Li S, Hall J, Sterner DE, et al. Expression and purification of SARS Coronavirus proteins using SUMO fusions. Protein Express. Purif. 2005; 42:100–110.

15. Butt TR, Edavettal SC, Hall JP, Mattern MR. SUMO fusion technology for difficult-to-express proteins. Protein Express. Purif. 2005; 43:1–9.

16. Marblestone JG, Edavettal SC, Lim Y, Lim P, Zuo X, Butt TR. Comparison of SUMO fusion technology with traditional gene fusion systems: Enhanced expression and solubility with SUMO. Protein Sci. 2006; 15:182–189.

17. Hammarstrom M, Hellgren N, van Den Berg S, Berglund H, Hard T. Rapid screening for improved solubility of small human proteins produced as fusion proteins in Escherichia coli. Protein Sci. 2002; 11(2):313–321.

18. Braun P, Hu Y, Shen B, Halleck A, Koundinya M, Harlow E, LaBaer J. Proteome-scale purification of human proteins from bacteria. Proc. Natl. Acad. Sci. 2002; 99(5):2654–2659.

19. Shih YP, Kung WM, Chen JC, Yeh CH, Wang AH, Wang TF. High-throughput screening of soluble recombinant proteins. Protein Sci. 2002; 11(7):1714–1719.

20. Dyson MR, Shadbolt SP, Vincent KJ, Perera RL, McCafferty J. Production of soluble mammalian proteins in Escherichia coli: identification of protein features that correlate with successful expression. BMC Biotechnol. 2004; 4:32.

21. De Marco V, Stier G, Blandin S, de Marco A. The solubility and stability of recombinant proteins are increased by their fusion to NusA. Biochem. Biophys. Res. Commun. 2004; 322:766–771.

22. Sharma SK, Evans DB, Vosters AF, McQuade TJ, Tarpley WG. Metal affinity chromatography of recombinant HIV-1 reverse transcriptase containing a human renin cleavable metal binding domain. Biotechnol. Appl. Biochem. 1991; 14:69–81.

23. Di Guan C, Li P, Riggs PD, Inouye H. Vectors that facilitate the expression and purification of foreign peptides in Escherichia coli by fusion to maltose-binding protein. Gene. 1988; 67:21–30.

24. Smith DB, Johnson KS. Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione-S-transferase. Gene. 1988; 67:31–40.

25. Makrides SC. Strategies for achieving high-level expression of genes in Escherichia coli. Microbiol. Rev. 1996; 60:512–538.

26. Narum DL, Kumar S, Rogers WO, Fuhrmann SR, Liang H, Oakley M, et al. Codon optimization of gene fragments encoding Plasmodium falciparum merozoite proteins enhances DNA vaccine protein expression and immunogenicity in mice. Infect. Immun. 2001; 69:7250–3.

27. Brinkmann U, Mattes RE, Buckel P. High-level expression of recombinant genes in Escherichia coli is dependent on the availability of the dnaY gene product. Gene. 1989; 85:109–114.

28. Butt TR, Jonnalagadda S, Monia BP, Sternberg EJ, Marsh JA, Stadel JM, et al. Ubiquitin fusion augments the yield of cloned gene products in Escherichia coli. Proc. Natl. Acad. Sci. 1989; 86:2540–4.

29. Dubendorff JW, Studier FW. Controlling basal expression in an inducible T7 expression system by blocking the target T7 promoter with the lac repressor. J. Mol. Biol. 1991; 219:45–59.

30. Thiem SM, Miller LK. Differential gene expression mediated by late, very late and hybrid baculovirus promoters. Gene. 1990; 91:87–94.

31. Sachdev D, Chirgwin JM. Fusions to maltose-binding protein: Control of folding and solubility in protein purification. Methods Enzymol. 2000; 326:312–321.

32. Arechaga I, Miroux B, Runswick MJ, Walker JE. Over-expression of Escherichia coli F1F(o)-ATPase subunit a is inhibited by instability of the uncB gene transcript. FEBS Lett. 2003; 547:97–100.

33. Prinz WA, Aslund F, Holmgren A, Beckwith J. The role of the thioredoxin and glutaredoxin pathways in reducing protein disulfide bonds in the Escherichia coli cytoplasm. J. Biol. Chem. 1997; 272:15661–7.

34. Ikura K, Kokubu T, Natsuka S, Ichikawa A, Adachi M, Nishihara K, et al. Co-overexpression of folding modulators improves the solubility of the recombinant guinea pig liver transglutaminase in Escherichia coli. Prep. Biochem. Biotechnol. 2002; 32:189–205.

35. Wulfing C, Pluckthun A. Correctly folded T-cell receptor fragments in the periplasm of Escherichia coli. Influence of folding catalysts. J. Mol. Biol. 1994; 242:655–669.

36. Sugamata Y, Shiba T. Improved secretory production of recombinant proteins by random mutagenesis of hlyB, an α-hemolysin transporter from Escherichia coli. Appl. Environ. Microbiol. 2005; 71:656–662.

37. Chopra AK, Brasier AR, Das M, Xu XJ, Peterson JW. Improved synthesis of Salmonella typhimurium enterotoxin using gene fusion expression systems. Gene. 1994; 144:81–85.

38. Fox JD, Kapust RB, Waugh DS. Single amino acid substitutions on the surface of Escherichia coli maltose-binding protein can have a profound impact on the solubility of fusion proteins. Protein Sci. 2001; 10:622–630.

39. Englander SW. Protein folding intermediates and pathways studied by hydrogen exchange. Annu. Rev. Biophys. Biomol. Struct. 2000; 29:213–38.

40. Creighton TE. How important is the molten globule for correct protein folding? Trends Biochem. Sci. 1997; 22:6–10.

41. Bach H, Mazor Y, Shaky S, Shoham-Lev A, Berdichevsky Y, Gutnick DL, Benhar I. Escherichia coli maltose-binding protein as a molecular chaperone for recombinant intracellular cytoplasmic single-chain antibodies. J. Mol. Biol. 2001; 312(1):79–93.

42. Nomine Y, Ristriani T, Laurent C, Lefevre J-F, Weiss E, Trave G. A strategy for optimizing the monodispersity of fusion proteins: application to purification of recombinant HPV E6 oncoprotein. Protein Eng. 2001; 14(4):297–305.

43. Sachdev D, Chirgwin JM. Properties of soluble fusions between mammalian aspartic proteinases and bacterial maltose-binding protein. J. Protein Chem. 1999; 18(1):127–136.

44. Scheich C, Leitner D, Sievert V, Leidert M, Schlegel B, Simon B, et al. Fast identification of folded human protein domains expressed in E. coli suitable for structural analysis. BMC Struct. Biol. 2004; 4(1):4.

45. Rozkov A, Enfors SO. Analysis and control of proteolysis of recombinant proteins in Escherichia coli. Adv. Biochem. Engin./Biotechnol. 2004; 89:163–195.

46. Prouty W, Goldberg A. Effects of protease inhibitors on protein breakdown in Escherichia coli. J. Biol. Chem. 1972; 247:3341–3352.

47. Talmadge K, Gilbert W. Cellular location affects protein stability in Escherichia coli. Proc. Natl. Acad. Sci. 1982; 79:1830–1833.

48. Hogset A, Blingsom OR, Saether O, Gautvik VT, Holmgren E, Hartmanis M, et al. Expression and characterization of a recombinant human parathyroid hormone secreted by Escherichia coli employing the staphylococcal protein A promoter and signal sequence. J. Biol. Chem. 1990; 265:7338–7344.

49. Murby M, Uhlen M, Stahl S. Upstream strategies to minimize proteolytic degradation upon recombinant production in Escherichia coli. Protein Express. Purif. 1996; 7:129–36.

50. Martinez A, Knappskog PM, Olafsdottir S, Doskeland AP, Eiken HG, Svebak RM, et al. Expression of recombinant human phenylalanine hydroxylase as fusion protein in Escherichia coli circumvents proteolytic degradation by host cell proteases. Isolation and characterization of the wild-type enzyme. Biochem. J. 1995; 306:589–97.

51. Koken MH, Odijk HH, van Duin M, Fornerod M, Hoeijmakers JH. Augmentation of protein production by a combination of the T7 RNA polymerase system and ubiquitin fusion: overproduction of the human DNA repair protein, ERCC1, as an ubiquitin fusion protein in Escherichia coli. Biochem. Biophys. Res. Commun. 1993; 195:643–653.

52. Murby M, Cedergren L, Nilsson J, Nygren PA, Hammarberg B, Nilsson B, et al. Stabilization of recombinant proteins from proteolytic degradation in Escherichia coli using a dual affinity fusion strategy. Biotechnol. Appl. Biochem. 1991; 14:336–346.

53. Shen SH. Multiple joined genes prevent product degradation in Escherichia coli. Proc. Natl. Acad. Sci. 1984; 81:4627–31.

54. Varshavsky A. The N-end rule: functions, mysteries, uses. Proc. Natl. Acad. Sci. 1996; 93:12142–9.

55. Kishi A, Nakamura T, Nishio Y, Maegawa H, Kashiwagi A. Sumoylation of Pdx1 is associated with its nuclear localization and insulin gene activation. Am. J. Physiol. Endocrinol. Metab. 2003; 284:E830–40.

56. Nikaido H. Maltose transport system of Escherichia coli: an ABC-type transporter. FEBS Lett. 1994; 346:55–8.

57. Balbas P. Understanding the art of producing protein and nonprotein molecules in Escherichia coli. Mol. Biotech. 2001; 19:251–267.

58. Chong S, Mersha FB, Comb DG, Scott ME, Landry D, Vence LM, et al. Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element. Gene. 1997; 192:277–281.

59. Jenny RJ, Mann KG, Lundblad RL. A critical review of the methods for cleavage of fusion proteins with thrombin and factor Xa. Protein Express. Purif. 2003; 31:1–11.

60. Carrington JC, Cary SM, Parks TD, Dougherty WG. A second proteinase encoded by a plant potyvirus genome. EMBO J. 1989; 8:365–70.

61. Baneyx F. Recombinant protein expression in Escherichia coli. Curr. Opin. Biotechnol. 1999; 10:411–421.