Transcriptomics and the Production of Recombinant Therapeutics

Transcriptomics plays a role in influencing the production of recombinant therapeutics in microbial and mammalian hosts.
Jun 01, 2018
Volume 31, Issue 6, pg 22–28

Various recombinant therapeutics are available for the treatment of arthritis, inflammation, immune disorders, central nervous system disorders, diabetes and endocrine disorders, genetic disorders, hematologic disorders, infectious diseases (including HIV), and cancer. A significant development in this field is the emergence of genome editing tools that allow gene manipulation at both the expression system level and the genome level.

Gene expression is the most fundamental level at which the genotype gives rise to the phenotype (observable traits). Genetic information is stored in DNA and is transcribed into messenger RNA (mRNA) by RNA polymerase (Figure 1). This process, called transcription, is an essential step in gene expression.

Figure 1. Process of transcription. Transcription is the process in which ribonucleic acid (RNA) is synthesized from DNA. It requires RNA polymerase, a complex holoenzyme (mol wt 465 kDa) composed of five polypeptide subunits: a core of two α subunits, one β subunit, and one β′ subunit, plus one sigma (σ) factor. The sigma factor enables RNA polymerase to recognize the promoter and initiate transcription. RNA polymerase uses ribonucleoside triphosphates (ATP, GTP, CTP, and UTP) to form RNA, which is synthesized in the 5′ to 3′ direction (5′→3′), antiparallel to the DNA template strand. In reverse transcription, by contrast, deoxyribonucleoside 5′-triphosphates (dNTPs) are used to form the growing DNA chain, producing an RNA-DNA hybrid.

Transcription initiation in bacteria requires specific proteins known as sigma factors, which are needed for proper promoter recognition by RNA polymerase. The repertoire of sigma factors varies from genome to genome, and these factors are divided into two main phylogenetic families: sigma 70 (σ70) and sigma 54 (σ54). The conversion of mRNA into complementary DNA (cDNA) is called reverse transcription. Reverse transcriptase uses an RNA template and a short primer complementary to the 3′ end of the RNA to direct synthesis of the first strand of cDNA.
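To make the base-pairing logic of transcription and reverse transcription concrete, the following is a minimal Python sketch. It models only Watson-Crick complementarity and antiparallel orientation, with every sequence written 5′→3′; promoters, sigma factors, primers, and enzyme kinetics are ignored. The function names and the example sequence are hypothetical and chosen purely for illustration.

```python
# Base-pairing rules; every sequence below is written 5'->3'.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA base -> cDNA base


def transcribe(template_5to3: str) -> str:
    """Return mRNA (5'->3') made from a DNA template strand given 5'->3'.

    RNA polymerase reads the template 3'->5' and builds RNA 5'->3',
    so the product is the reverse complement of the template, with U for T.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))


def reverse_transcribe(mrna_5to3: str) -> str:
    """Return first-strand cDNA (5'->3') complementary to an mRNA (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna_5to3.upper()))


if __name__ == "__main__":
    template = "TACGGATTC"  # hypothetical template strand, 5'->3'
    mrna = transcribe(template)
    cdna = reverse_transcribe(mrna)
    print(mrna)  # GAAUCCGUA
    print(cdna)  # TACGGATTC
```

Because reverse transcription applies complementarity a second time, the first-strand cDNA has the same sequence as the original template strand.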

DNA microarrays, quantitative real-time polymerase chain reaction (qRT-PCR), and ultra-high-throughput sequencing are powerful tools that enable researchers to study transcriptional expression at a global scale. Microarray-based transcriptome analysis has been applied to elucidate how the host cell is regulated during protein production. Most researchers have studied the transcriptional expression patterns of recombinant strains under stress conditions related to temperature, oxygen, cell cycle control, and metabolic activity to understand the molecular mechanisms of heterologous gene expression (Figure 2). The three most common microarray platforms are cDNA arrays (probes of hundreds of bases to several kilobases), in which cDNA clones are propagated in bacteria and stored in 96- or 384-well plates; short oligonucleotide arrays (15–20-mers), in which perfectly matching oligomers (and, for some array types, oligomers with a single mismatch) are synthesized in situ directly on the array; and long oligonucleotide arrays (40–80-mers), in which long specific oligomers of equal length are designed bioinformatically.


Figure 2. Use of information obtained from microarray data analysis to improve microbial hosts for recombinant protein production.

RNA sequencing (RNA-Seq) is another method, typically used to quantify RNA levels in the cell. Because of its high sensitivity, it is potentially more accurate than microarray-based methods, at least for genes expressed at low levels. The steps involved in the various methods used in transcriptomics studies are summarized in Table I.
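RNA-Seq quantifies RNA levels by counting the sequencing reads that map to each gene. Raw counts depend on both gene length and sequencing depth, so they are usually normalized before genes or samples are compared. The Python sketch below computes transcripts per million (TPM), one common normalization; it is an illustration under stated assumptions, not the procedure of any study cited here. The gene names, read counts, and lengths are hypothetical, and real pipelines typically use effective transcript lengths estimated by dedicated quantification tools.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize read counts, then rescale to 1e6."""
    # Reads per kilobase: corrects for longer genes attracting more reads.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    # Dividing by the per-sample total corrects for sequencing depth.
    scale = sum(rpk.values()) / 1_000_000
    return {gene: value / scale for gene, value in rpk.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 200, "geneC": 300}      # hypothetical reads
    lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}   # hypothetical lengths (bp)
    print({g: round(v) for g, v in tpm(counts, lengths).items()})
    # {'geneA': 125000, 'geneB': 125000, 'geneC': 750000}
```

Note that geneB received twice as many reads as geneA but has the same TPM, because it is twice as long; TPM values within a sample always sum to one million.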

Table I. Steps involved in various methods related to transcriptomics study. RT-PCR is real-time polymerase chain reaction.

This 39th article in the Elements of Biopharmaceutical Production series focuses on the role that transcriptomics can play in influencing production of recombinant therapeutics in microbial and mammalian hosts.


Transcriptome study of recombinant protein production in prokaryotic systems

Escherichia coli
To date, Escherichia coli (E. coli) is the most favored host for recombinant protein production via the microbial route. It has been thoroughly characterized, many of its metabolic processes are understood, and several genetic tools are readily available for gene manipulation. Although E. coli can be used for large-scale production of recombinant proteins, the system has some major disadvantages that limit its application. Many recombinant proteins are produced as a mixture of soluble and insoluble, denatured forms (1), and protein degradation has been reported to be high in E. coli (1, 2). Further, expression of complex proteins such as monoclonal antibodies (mAbs) in E. coli is not feasible because mAbs require post-translational modifications for activity. Production of therapeutic proteins in E. coli also requires complex and expensive purification steps, because product retrieval typically involves cell lysis, which results in relatively high levels of host cell impurities (host cell proteins, proteases, host cell DNA, and endotoxins) (3–6).

Recombinant protein production in E. coli can be enhanced significantly by engineering cell physiology to reduce the physiological stress that accompanies high-level recombinant protein production. Such stress can be exerted on E. coli by changes in the intracellular and extracellular environment. The presence of a plasmid can also impose a metabolic burden on the cell, through excessive plasmid replication as well as new interactions between the host and the plasmid.

A few transcriptome studies have examined recombinant protein production in E. coli. One study sought to identify the reason for amino acid shortage during recombinant protein production and observed that many amino acid biosynthesis pathways were uniformly upregulated or downregulated (7). The researchers reported that recombinant protein production results in a stringent response generated by the high demand for amino acid synthesis during protein synthesis. The transcriptome profiles of recombinant E. coli were analyzed and compared under three culture conditions: normal growth with no external stress; L-serine hydroxamate addition (to induce a stringent response); and isopropyl-β-D-thiogalactoside (IPTG) induction to produce the recombinant protein chloramphenicol acetyltransferase. During the stringent response, all but two genes of the histidine biosynthesis pathway were upregulated, whereas the entire arginine biosynthesis pathway was upregulated and its degradation pathway was entirely downregulated. Genes in the methionine pathway were either not regulated or downregulated, which influenced recombinant protein production. Beyond the simple cellular response to an amino acid shortage, the classical stringent response also included phospholipid synthesis and protease upregulation.
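Calls such as "upregulated" and "downregulated" in studies like these rest on comparing each gene's expression between conditions, for example induced versus uninduced cultures. The minimal Python sketch below labels genes by log2 fold change. The two-fold cutoff (|log2 FC| ≥ 1) and the pseudocount are assumed conventions, not values taken from the cited work, and the gene names and expression values are hypothetical. Real analyses also test statistical significance across biological replicates, which this sketch omits.

```python
import math


def classify_regulation(control: dict[str, float], treated: dict[str, float],
                        threshold: float = 1.0,
                        pseudocount: float = 1.0) -> dict[str, str]:
    """Label each gene 'up', 'down', or 'unchanged' by log2 fold change."""
    calls = {}
    for gene, base in control.items():
        # The pseudocount avoids division by zero and log(0) for silent genes.
        lfc = math.log2((treated[gene] + pseudocount) / (base + pseudocount))
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    uninduced = {"geneA": 99.0, "geneB": 99.0, "geneC": 99.0}  # hypothetical
    induced = {"geneA": 399.0, "geneB": 24.0, "geneC": 119.0}  # hypothetical
    print(classify_regulation(uninduced, induced))
    # {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

Working on the log2 scale makes up- and downregulation symmetric: a four-fold increase gives +2 and a four-fold decrease gives −2.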

In another study, Baig et al. (8) determined the dynamic transcriptional response of E. coli to inclusion body (IB) formation during recombinant protein production, characterizing the variability in gene expression caused by insoluble versus soluble protein production. The increased expression of certain amino acid synthesis genes could potentially be attributed to the amino acid composition of the particular recombinant protein. They also examined the stringent response, which under nutritional limitation is initiated by a high ratio of uncharged to aminoacylated tRNA caused by intracellular amino acid limitation, and observed that only 10 stringent response genes had higher expression levels than in uninduced cultures (including clpB, deoA, dnaK, gdhA, groEL, hisG, htpG, and thrCS). Expression of the classical heat shock genes, including the DnaK-DnaJ-GrpE chaperone system and proteases, increased in response to IB formation.

Smith (9) compared expression levels between soluble and insoluble samples to identify reproducible changes in host-cell transcription in response to protein solubility. By comparing transcriptional profiles for multiple examples from the soluble and insoluble classes, the study identified a pattern of gene expression that correlates strongly with protein solubility. Genes controlled by the heat-shock sigma factor σ32, including genes involved in protein folding and degradation, were highly expressed in response to induction of insoluble protein. The same group of genes was also upregulated by insoluble protein accumulation under a different growth regime, indicating that σ32-mediated gene expression is a general response to protein insolubility. This knowledge provides a starting point for the rational design of growth parameters and host strains with improved protein solubility (Figure 2).

Bacillus species
Among microbial hosts, Bacillus has attracted considerable attention because of its generally recognized as safe (GRAS) status, high plasmid stability, high secretion capacity, broad substrate spectrum, and, in protease-deficient strains, the absence of extracellular proteases. In addition, because proteins are secreted, it provides better folding conditions than the reducing environment of the cytoplasm, thereby preventing the formation of inclusion bodies. Disulfide bond formation is one of the most important events for the activity and stability of many exported heterologous proteins. In bacteria, it is catalyzed by thiol-disulfide oxidoreductases (E. coli, for example, contains six: DsbA–E and DsbG), so studying these enzymes can give researchers insight into product stability (10).

A transcriptional study examined the effect of heterologous protein production in Bacillus subtilis (B. subtilis) and observed that expression of class I heat shock genes (dnaK, groEL, and grpE), class III heat shock genes (clpP and clpC), and genes for pyrimidine and purine synthesis enzymes increased under overproducing conditions (11). A comparison of the transcriptional pattern of the PorA-overproducing strain with that of the control showed increased mRNA levels from class I (dnaK, groEL, and grpE) and class III (clpP, clpC, and clpE) heat shock-induced genes. This information can be used to redesign the expression system (for example, by using a specific chaperone system under a specific stress condition) and thereby improve recombinant protein production.

Lactococcus lactis
Lactic acid bacteria (LAB) are GRAS microorganisms and, unlike E. coli, do not produce lipopolysaccharides. Unlike B. subtilis, they also do not secrete large amounts of extracellular proteases. These bacteria are therefore suitable for the production of pharmaceutically important therapeutic proteins and are also used as vaccines. High levels of recombinant protein production can be achieved in Lactococcus lactis (L. lactis) by using lactococcal constitutive or inducible promoters. The surface serine protease HtrA (high-temperature requirement A), which is involved in stress resistance and degradation of misfolded proteins, has been expressed under the control of the nisin promoter in L. lactis.

Transcriptional analysis of htrA, which encodes a protease involved in the degradation of many foreign proteins, demonstrated that suppression of the acid tolerance response (ATR) leads to a decrease in the amount of HtrA in L. lactis (12). The htrA mRNA level increased several-fold during the transition to stationary phase when L. lactis MG1363 was cultured in G5-M17 medium compared with the more strongly buffered G5-3xM17 medium (13). The mRNA level of the recombinant protein streptokinase during ATR suppression (measured by real-time PCR) was actually lower than during fully developed ATR. Because the recombinant protein nonetheless accumulated to higher levels under ATR suppression, the increase may be due to enhanced translation rates or to reduced proteolysis by other, undetermined proteases.

Eight strong promoters were identified by transcriptional analysis (RT-qPCR) using nisZ and ermC (erythromycin resistance gene) as reporter genes (14); based on transcriptional activity, the four strongest were P8, P5, P3, and P2. The synthesis of recombinant proteins in a microbial host is strongly influenced by the promoter selected for expression, and such promoter screening can identify suitable candidates for protein production in L. lactis.
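Relative mRNA levels from real-time PCR, such as the fold changes in htrA and streptokinase transcripts described above, are often computed with the comparative Ct (2^−ΔΔCt) method. The cited studies may have used a different quantification scheme, so this Python sketch is only an illustration. It assumes roughly 100% amplification efficiency (the product doubles every cycle) for both the target gene and a reference (housekeeping) gene, and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target transcript (sample vs. control) by 2^-ddCt.

    Assumes ~100% PCR efficiency for target and reference genes.
    """
    # Normalize the target to the reference gene within each condition.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    dd_ct = d_ct_sample - d_ct_control
    # Each cycle earlier means twice as much starting template.
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 3 cycles earlier
    # in the sample, with an unchanged reference gene.
    print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
```

Normalizing to a reference gene first corrects for differences in the amount of input RNA between samples. When efficiencies differ markedly from 100%, efficiency-corrected models are generally preferred.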


Transcriptome study of recombinant protein production in eukaryotic systems

Pichia pastoris
Pichia pastoris (P. pastoris) is another commonly used expression system for heterologous protein production because it allows controlled expression of foreign genes and achieves high cell densities. To date, more than 200 different heterologous proteins have been expressed in P. pastoris. It also performs post-translational modifications such as proteolytic processing, folding, disulfide bond formation, and glycosylation, which are required for the proper folding of certain proteins (such as several vaccines).

Baumann et al. (15) studied the distribution of up- and downregulated genes in P. pastoris under hypoxic versus normoxic conditions and observed that upregulated and downregulated genes were equally distributed. They also examined the effect of the substrate (glucose) on the transcriptional level of peroxisome-related genes and found that expression of housekeeping genes increases linearly with increasing specific growth rate in P. pastoris. They further reported transcriptional induction of glyoxylate shunt enzymes (isocitrate lyase and malate synthase) at low growth rate, whereas genes related to translocation, glycosylation, protein folding, and the proteasome were mainly upregulated at higher specific growth rates. They also studied the effect of specific growth rate on intracellular proteolytic degradation and the regulation of the genes involved: the downregulation of 9 of 11 ATG (autophagy-related) genes with increasing specific growth rate indicated decreased protein turnover at faster growth, which may be advantageous for recombinant protein production. A microarray-based transcriptome analysis of P. pastoris during aerobic, glucose-limited growth at different growth rates (16) reported that expression of several amino acid biosynthesis genes (29 of 119) was affected by growth rate, with approximately half upregulated and half downregulated with decreasing growth rate. Based on this information, limiting amino acids could be supplied externally, which would help optimize medium composition to enhance recombinant protein production.
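Statements such as "upregulated with decreasing growth rate" come from relating each gene's expression to the specific growth rate μ across several steady-state cultures. One simple way to summarize such a trend is an ordinary least-squares slope of log2 expression against μ for each gene. This Python sketch uses that approach as an assumption and does not reproduce the cited studies' statistical methods. The growth rates, expression values, gene names, and slope cutoff are all hypothetical, and a real analysis would also assess significance.

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def trend_with_growth(mu: list[float], log2_expr: dict[str, list[float]],
                      min_abs_slope: float) -> dict[str, str]:
    """Classify each gene by the sign of its log2-expression vs. growth-rate slope."""
    calls = {}
    for gene, values in log2_expr.items():
        slope = ols_slope(mu, values)
        # A positive slope means expression falls as growth slows.
        if slope >= min_abs_slope:
            calls[gene] = "down with decreasing growth rate"
        elif slope <= -min_abs_slope:
            calls[gene] = "up with decreasing growth rate"
        else:
            calls[gene] = "no clear trend"
    return calls


if __name__ == "__main__":
    mu = [0.025, 0.05, 0.075, 0.1]  # hypothetical specific growth rates (1/h)
    expr = {                        # hypothetical log2 expression values
        "geneA": [5.0, 6.0, 7.0, 8.0],
        "geneB": [8.0, 7.0, 6.0, 5.0],
        "geneC": [6.0, 6.0, 6.0, 6.0],
    }
    for gene, call in trend_with_growth(mu, expr, min_abs_slope=10.0).items():
        print(gene, call)
```

In this toy data, geneA rises by one log2 unit per 0.025 h⁻¹ increase in μ (a slope of about 40), so it is classified as falling when growth slows, while geneB shows the mirror-image trend.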

Saccharomyces cerevisiae
Saccharomyces cerevisiae (S. cerevisiae) is also an attractive organism for therapeutic protein production because it is non-pathogenic and has GRAS status. It is capable of post-translational processing of eukaryotic proteins (e.g., folding, glycosylation, phosphorylation, or removal of part of the initial sequence) and has a secretory expression system that simplifies the downstream protein purification process.

A transcriptome study was performed on native and recombinant S. cerevisiae strains, the latter producing recombinant human superoxide dismutase (rhSOD), during the three phases of diauxic growth: respirofermentative growth on glucose, growth on ethanol, and early stationary phase (17). The aim of this study was to determine transcription factor activities (TFAs) in the recombinant strain. The researchers identified 45 transcription factors whose activities changed significantly in at least one of the three phases.

Chinese hamster ovary cell lines
Chinese hamster ovary (CHO) cell lines are the preferred hosts for production of complex proteins, primarily because they perform efficient and accurate post-translational modifications. These cells can be adapted to grow in serum-free (SF) suspension culture, which facilitates large-scale protein production.

Jamnikar et al. (18) performed transcriptome profiling of stable and unstable clones in the CHO-Dhfr expression system by DNA microarray and RT-PCR, and identified marker genes that can predict stable expression of recombinant genes in particular clones early in cell line development. The CHO transcriptome contains transcripts of all genes encoding enzymes for the major steps of the N-glycosylation pathway (19). Another transcriptional study revealed that heavy-chain and light-chain IgG transcripts were more abundant than other gene transcripts in a recombinant CHO cell line (20).

Another transcriptome study in a CHO cell line examined methotrexate-induced amplification of the dihydrofolate reductase (DHFR) gene (21) and reported that transcripts of the product genes (the heavy and light chains of immunoglobulin G) reached high levels as soon as cells bearing the vector were selected, and increased little after amplification. This contradicts the conventional notion that amplification serves to increase the copy number of the transgene and thus its transcript level. Transcriptome studies can also help estimate the degree and pattern of glycosylation. It was reported that the addition of sodium butyrate altered the sialic acid content of the protein (22). Transcriptional studies can also reveal the effects of chemical modification of DNA and histones: methylation of CpG sites in the CMV promoter resulted in decreased productivity of certain molecules (23).
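DNA methylation of the kind described for the CMV promoter occurs at cytosines within CpG dinucleotides. Locating these sites is a common first step when choosing positions to assay, for example by bisulfite sequencing. The Python sketch below finds CpG positions in a sequence and computes the widely used observed/expected CpG ratio. The example sequence is a hypothetical fragment, not the actual CMV promoter, and the function names are illustrative.

```python
def cpg_sites(seq: str) -> list[int]:
    """Return 0-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    c_count, g_count = s.count("C"), s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both bases
    return len(cpg_sites(s)) * len(s) / (c_count * g_count)


if __name__ == "__main__":
    fragment = "ACGTTCGACG"  # hypothetical sequence, not the CMV promoter
    print(cpg_sites(fragment))  # [1, 5, 8]
    print(round(cpg_obs_exp(fragment), 3))  # 3.333
```

Only the cytosine of each CpG is reported, because that is the base that carries the methyl group.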



In this article, the authors have briefly presented some of the transcriptomics-based studies reported across microbial and mammalian hosts for recombinant protein production. The key objective of transcriptomic analysis is to gain better insight into host physiology and the expression levels of genes in various metabolic pathways during recombinant protein production. The information obtained from transcriptomics can be used to design better strains for recombinant protein production as well as to identify optimal conditions for upstream processing. Various applications of transcriptomics are summarized in Table II. This tool holds considerable untapped potential, and its appropriate exploitation can result in significant improvements in product titer as well as product quality.

Table II. Application of transcriptomics data.

At genetic level:

Strain engineering (overexpression of rate-limiting genes or deletion of unwanted genes)

Use of a strong promoter to improve expression of a downregulated gene

Identification of clones (high producers or low producers)

Identification of rate-limiting steps (bottlenecks) in pathways

A better picture of host physiology

Evaluation of similarity between two organisms

Identification of degree of post-translational modifications (glycosylation, fucosylation)

Identification of protease levels under different culture conditions

Determination of promoter strength

Development of models based on transcriptional studies


At bioprocess level:

Development of feeding strategies

Optimization of culture conditions (temperature, aeration, pH)



1. S.C. Makrides, Microbiological Reviews 60 (3), 512–538 (1996).
2. S.S. Yazdani and K.J. Mukherjee, Biotechnology Letters 20 (10), 923–927 (1998).
3. L.C. Eaton, Journal of Chromatography A 705 (1), 105–114 (1995).
4. A. Kuroda et al., Science 293 (5530), 705–708 (2001).
5. V. Schellenberger et al., Nature Biotechnology 27 (12), 1186–1190 (2009).
6. D. Petsch and F.B. Anspach, Journal of Biotechnology 76 (2), 97–119 (2000).
7. F.T. Haddadin, H. Kurtz, and S.W. Harcum, Applied Biochemistry and Biotechnology 157 (2), 124–139 (2009).
8. F. Baig et al., Biotechnology and Bioengineering 111 (5), 980–999 (2014).
9. H.E. Smith, Journal of Structural and Functional Genomics 8 (1), 27–35 (2007).
10. J.C.A. Bardwell, Molecular Microbiology 14 (2), 199–205 (1994).
11. B. Jürgen et al., Applied Microbiology and Biotechnology 55 (3), 326–332 (2001).
12. A. Miyoshi et al., Applied and Environmental Microbiology 68 (6), 3141–3146 (2002).
13. K. Sriraman and G. Jayaraman, Applied and Environmental Microbiology 74 (23), 7442–7446 (2008).
14. D. Zhu et al., FEMS Microbiology Letters 362 (16) (2015).
15. K. Baumann et al., BMC Genomics 12 (1), 218 (2011).
16. C. Rebnegger et al., Applied and Environmental Microbiology 82 (15), 4570–4583 (2016).
17. C. Contador et al., Biotechnology Progress 27 (4), 925–936 (2011).
18. U. Jamnikar et al., BMC Biotechnology 15 (1), 98 (2015).
19. J. Becker et al., BMC Proceedings 5 (Suppl 8), P6 (2011).
20. N.M. Jacob et al., Biotechnology and Bioengineering 105 (5), 1002–1009 (2010).
21. N. Vishwanathan et al., Biotechnology and Bioengineering 111 (3), 518–528 (2014).
22. S.M. Lee et al., Journal of Biotechnology 171, 56–60 (2014).
23. M. Kim et al., Biotechnology and Bioengineering 108 (10), 2434–2446 (2011).



When referring to this article, please cite as A. Rathore and A. Chauhan, “Transcriptomics and the Production of Recombinant Therapeutics,” BioPharm International 31 (6) 2018.

