Accelerating Bioprocess Optimization

April 1, 2011

BioPharm International

Volume 24, Issue 4

A series of advancements has changed the way bioprocesses are developed and optimized.

The introduction of new technology often means a significant step forward in the performance of one subprocess, but only an incremental improvement in the full product development process. In the case of Next Generation Genomics (NGG) technologies, however, a series of incremental advances has made it possible to radically change the way bioprocesses are developed and optimized. Pharmaceutical and industrial biotechnology bioprocessing have significantly different economic drivers, but both can realize substantial economic benefit from these new technologies. Pharmaceutical applications are driven foremost by the cost of development, regulatory approval, and compliance, and only secondarily by process productivity. In contrast, the primary market driver of industrial bioprocesses is productivity, particularly in commodity and biofuel applications. In this paper, we focus on practical applications in biopharmaceuticals, where manufacturing productivity needs greater emphasis because of continually rising health care costs and expanding access to pharmaceuticals in developing countries. NGG also has the potential to make a significant impact on FDA's emerging quality-by-design (QbD) initiative.

For new biologics to be profitable, they must be developed cost-effectively and optimized to produce the highest possible titers. For existing biologics to remain profitable, especially with the emergence of biosimilars, they must be efficiently optimized to improve productivity and scale, thereby lowering the cost of goods. Remarkably, most current research uses outdated tools and is generally performed on model cell lines that have undergone numerous population doublings, which over time induce extensive genetic polymorphisms that ultimately decrease product quality and process stability (1). For newly developed biologics, overall economic pressure is a key driver of production success. Aside from the inherent complexity (structure, glycosylation, folding, stability, etc.) of the biopharmaceutical products themselves, bioprocess engineers also face the intricacy of the production process. For each product, a cell line with sufficient production phenotypes has to be developed. Current strategies involve time-consuming, labor-intensive steps, from the introduction of the product genes to the isolation and characterization of candidate clones. Cell-line development spans several months or, in some cases, years, and involves screening several hundred cell clones for high productivity before a few dozen are selected as candidate production lines. The process typically lasts up to six months for each candidate before it can enter the evaluation phase, in which its efficacy and safety in animal and human subjects are determined (2). Once the process has been established and approved, follow-on improvements become very costly, because each must address FDA's requirements for quality and safety.
In the past, insufficient knowledge of the biology of the production organism, and of the impact of the conditions under which it is grown, made it difficult to maintain stable product quality attributes when process variables changed. We believe that NGG provides a modern and comprehensive approach to address this gap.

There is also a regulatory benefit to implementing NGG techniques. In an innovative and forward-thinking move, FDA's QbD initiative emphasizes achieving product quality through thorough process understanding, monitoring, and control. The approach allows manufacturers to identify critical process parameters (CPPs) and their direct effects on product quality. Adopting QbD principles and process analytical technology (PAT) guidelines can help ensure an overall understanding of the bioprocess, ultimately helping manufacturers achieve process robustness, stability, and quality. FDA has defined PAT as a mechanism to design, analyze, and control pharmaceutical manufacturing processes through the measurement of CPPs that affect critical quality attributes (CQAs). The belief is that more complete understanding brings the ability not only to develop products more quickly, but also to optimize products and processes knowledgeably and safely downstream, while maintaining higher levels of quality control than were previously achievable.


In contrast to small molecules, the production of biopharmaceuticals is a process within a process, with the cultivation process supporting the metabolic production process within the cell. To develop a detailed process understanding, one must understand the biology of the organism and its environment. In short, what is required is a systems biology approach focused on building a solid understanding of the biology of the cells themselves. Ultimately, the cells are the primary production vehicles, and by following PAT guidelines and QbD principles, one can develop a clear understanding of their biology and control of the overall bioprocess. By identifying the genomic and metabolic factors involved in the production of therapeutic biologics, process engineers can identify CPPs and characterize the impact of each variable CPP (e.g., media conditions, fermentation conditions, pH, temperature, and dissolved oxygen content) on titers and quality. Process engineers can then determine how well these variables can be controlled and establish their criticality within the overall bioprocess. By understanding these CPPs and their importance in the overall bioprocess (e.g., effects on product yield, quality, and process stability), process engineers can rapidly and effectively improve production titers and process robustness. We took the approach of initially focusing on the biology of the production cells. Focusing on the genetic changes associated with the overall bioprocess yields a very clear understanding of how changes in CPPs influence production yields and overall product quality. Once manufacturers understand the intricacies of the overall bioprocess, they can consider bioprocess flexibility.
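
As an illustration of what characterizing a CPP's impact on titer can look like numerically, the following is a minimal sketch, assuming a small two-level full factorial experiment (a standard design-of-experiments layout; the article does not say which designs were used). Each CPP's main effect is estimated as the mean titer at its high setting minus the mean titer at its low setting. The function name, factor names, and titer values are hypothetical.

```python
from itertools import product
from statistics import mean

def main_effects(factors, titers):
    """Estimate each factor's main effect in a two-level full factorial design.

    factors: factor names (hypothetical CPPs), in the same order as the coded levels.
    titers: dict mapping a tuple of coded levels (-1 = low, +1 = high), one per
            factor, to the titer measured in that run.
    Returns a dict: factor -> mean titer at +1 minus mean titer at -1.
    """
    effects = {}
    for i, name in enumerate(factors):
        high = [y for levels, y in titers.items() if levels[i] == +1]
        low = [y for levels, y in titers.items() if levels[i] == -1]
        effects[name] = mean(high) - mean(low)
    return effects

# Hypothetical 2^2 design: pH and temperature, each at a low and a high setting.
factors = ["pH", "temperature"]
runs = list(product([-1, +1], repeat=len(factors)))  # (-1,-1), (-1,+1), (+1,-1), (+1,+1)
observed = [1.0, 1.4, 2.0, 2.6]  # hypothetical titers (g/L), one per run, same order
titers = dict(zip(runs, observed))
print(main_effects(factors, titers))
```

With these made-up numbers, the pH effect (about 1.1 g/L) is more than twice the temperature effect (about 0.5 g/L), which would flag pH as the more critical of the two; with real data, replication and a significance test would be needed before drawing that conclusion.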

For existing processes and products already in revenue-generating production, the cost of a single change, even one that significantly increases yield and stability, is often too high to be recouped within the product life cycle. Process changes, even when technically feasible, are rarely made because of the costs of requalifying and revalidating the overall process. In the recent past, bioprocess flexibility has been associated with increased safety risks, primarily because of our lack of understanding of the actual biology of the production cells and how it affects process quality and stability. Flexible manufacturing, implemented with a systems-biology understanding, fits well within FDA's current QbD initiative and allows for ongoing process improvement. We believe NGG is a key tool for acquiring and rationally using this understanding. In a flexible manufacturing environment, once a design space has been defined using NGG and other tools, manufacturers will have the flexibility to make process changes within that design space without prior approval from FDA. Manufacturers that adopt this approach will be able to improve their performance and efficiency regularly while maintaining higher stability and quality control and reducing the costs of recertification.
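
The idea of making changes "within the design space" can be illustrated with a deliberately simplified sketch. It assumes the design space is a set of independent acceptable ranges, one per CPP; real design spaces are often multivariate and model-based, and the class, function, parameter names, and ranges below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    """Closed interval [low, high] of acceptable values for one CPP (hypothetical)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high

def outside_design_space(design_space, proposed):
    """Return the CPPs whose proposed setpoints fall outside the design space.

    design_space: dict CPP name -> Range. proposed: dict CPP name -> setpoint.
    A CPP that the design space does not cover is also reported, because a
    change to an uncharacterized parameter cannot be justified by it.
    """
    violations = []
    for name, value in proposed.items():
        allowed = design_space.get(name)
        if allowed is None or not allowed.contains(value):
            violations.append(name)
    return violations

# Hypothetical design space for three CPPs.
design_space = {
    "pH": Range(6.8, 7.2),
    "temperature_C": Range(30.0, 37.0),
    "dissolved_oxygen_pct": Range(20.0, 60.0),
}
change = {"pH": 7.0, "temperature_C": 33.0, "dissolved_oxygen_pct": 65.0}
print(outside_design_space(design_space, change))  # ['dissolved_oxygen_pct']
```

A proposed change is flagged if any setpoint leaves its range, or if it touches a parameter the design space never characterized.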


One such project in which the authors have been involved is a collaborative partnership between a major pharmaceutical company and ArrayXpress, a contract genomics services company. Although the specific organism and target compound are confidential, the tools, techniques, and processes used provide a good example of the benefits of the NGG systems biology approach. The primary objective of the project was to increase production titers of an essential target compound used in the manufacture of a current, high-revenue commercial product. The secondary objective was to build knowledge that would allow faster and more efficient manufacturing of other products using the same organism and expression platform. As with all bioprocesses, the organism itself is only one variable influencing productivity; many environmentally tunable variables make up the remainder. The process has CPPs that influence production, but because of prior technology limitations, the manufacturing engineers did not know their full impact on the metabolic processes and production efficiencies of the cells. These parameters had therefore simply been lumped together as an unknown called "process variability" or "biological variability," and manufacturing was completely at the mercy of the process itself, with limited process stability and dramatic variability in product titer.

By bringing together the cells and the CPPs in a systems model, we can now see the entire equation. The cells are the primary production machinery; therefore, our approach was to evaluate the physiological state of the cells during the various media and fermentation development stages. We first generated a working hypothesis by developing a fishbone diagram that showed all the confirmed and putative CPPs associated with the overall target compound production process (see Figure 1). This allowed us to identify critical areas to characterize in more detail, which were subsequently tested experimentally.

Figure 1: Fishbone diagram showing all the confirmed and putative CPPs associated with the overall target compound production process.

Our approach was to design highly focused, statistically sound microarray experiments complemented by standard analytical chemistry tests. We wish to emphasize the importance of a well-thought-out experimental design and analysis strategy prior to project initiation. This approach made it possible to identify key genes, and their associated molecular pathways, that were differentially affected by changes in various CPPs in the overall production process. DNA microarrays provide a detailed qualitative snapshot of the state of the transcriptome at the time of sampling, somewhat like a molecular fingerprint, that can reveal subtle process variations in great detail. This approach is especially useful in time-course experiments like ours, to determine whole-transcriptome changes associated with different CPPs, monitored across different growth phases (time points) of the cells during the media and fermentation optimization stages.
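
One concrete element of a carefully planned design is balance and randomization. The minimal sketch below, assuming a layout crossed over trial media, growth-phase time points, and biological replicates (all names hypothetical), gives every condition and time combination the same number of replicates and shuffles the processing order, reducing the risk that hybridization batches line up with the factors being tested. It illustrates general design principles, not the authors' actual layout.

```python
import random
from itertools import product

def time_course_design(conditions, time_points, replicates, seed=0):
    """Build a balanced time-course layout with a randomized processing order.

    Every (condition, time point) pair gets the same number of biological
    replicates. Shuffling with a fixed seed keeps the layout reproducible while
    reducing the risk that processing batches line up with a factor.
    """
    samples = [
        {"condition": c, "time_point": t, "replicate": r}
        for c, t in product(conditions, time_points)
        for r in range(1, replicates + 1)
    ]
    random.Random(seed).shuffle(samples)
    for order, sample in enumerate(samples, start=1):
        sample["processing_order"] = order
    return samples

# Hypothetical design: two trial media, three growth phases, three biological replicates.
design = time_course_design(["medium_A", "medium_B"], ["lag", "exponential", "stationary"], 3)
print(len(design))  # 18
for sample in design[:3]:
    print(sample)
```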

Strong bioinformatics, in both statistical design and data analysis and mining, is the next key to success. A particularly important aspect of statistical inference in high-throughput problems such as microarray experiments is assessing the statistical significance exhibited by the data in the presence of a tremendous multiplicity of hypotheses: a single experiment can involve tens of thousands of hypothesis tests. This assessment requires efficient estimation of experimental error and careful control of false discovery rates. We applied two interconnected analysis-of-variance models: a normalization model that accounts for experiment-wide systematic effects that could bias inferences made from the data on individual genes, and a gene model that is fit to the normalized data from each gene, allowing inferences to be made using separate estimates of variability. Expression differences are then parameterized as factorial effects in linear mixed-effects models appropriate to the experimental design. These effects can be estimated efficiently using statistical software such as JMP Genomics or SAS PROC MIXED. The resulting least-squares estimates are then mapped onto their associated metabolic pathways using KEGG metabolic pathway maps in combination with proprietary software mapping tools (3).
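
The two-stage analysis described above can be sketched in a much simplified form. This sketch assumes log2-transformed intensities, a normalization model containing only an array effect, and a gene model with a single two-level treatment factor (for this balanced one-factor case, least squares reduces to a difference of means with a pooled-variance t statistic); it is not the mixed-model fit that JMP Genomics or SAS PROC MIXED would perform. The false discovery rate is controlled here with the Benjamini-Hochberg step-up procedure, one common choice; the article does not say which procedure was used. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def normalize(log2_intensities):
    """Normalization model (simplified): subtract each array's mean log2 intensity.

    log2_intensities: dict array_id -> dict gene -> log2 intensity.
    The array mean stands in for the experiment-wide array effect; the
    returned residuals are the normalized values passed to the gene model.
    """
    normalized = {}
    for array_id, genes in log2_intensities.items():
        array_effect = mean(genes.values())
        normalized[array_id] = {g: v - array_effect for g, v in genes.items()}
    return normalized

def gene_model(normalized, treatment_of):
    """Gene model (simplified): per-gene treatment effect and t statistic.

    treatment_of: dict array_id -> "A" or "B" (each group needs >= 2 arrays).
    Each gene gets its own pooled variance estimate.
    Returns dict gene -> (effect B minus A, t statistic).
    """
    results = {}
    genes = next(iter(normalized.values())).keys()
    for g in genes:
        a = [normalized[k][g] for k in normalized if treatment_of[k] == "A"]
        b = [normalized[k][g] for k in normalized if treatment_of[k] == "B"]
        effect = mean(b) - mean(a)
        pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
        se = sqrt(pooled * (1 / len(a) + 1 / len(b)))
        results[g] = (effect, effect / se if se > 0 else float("nan"))
    return results

def benjamini_hochberg(p_values, q=0.05):
    """Step-up Benjamini-Hochberg procedure controlling the false discovery rate at q.

    p_values: dict gene -> p-value. Returns the set of genes declared significant:
    all genes ranked at or below the largest rank k with p_(k) <= k/m * q.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / m * q:
            cutoff = rank
    return {gene for gene, _ in ranked[:cutoff]}

# Hypothetical p-values; converting the t statistics to p-values (t distribution
# with n_A + n_B - 2 degrees of freedom) is not implemented in this sketch.
p_values = {"g1": 0.001, "g2": 0.008, "g3": 0.039, "g4": 0.041, "g5": 0.27}
print(sorted(benjamini_hochberg(p_values)))  # ['g1', 'g2']
```

Note that g3 and g4 would pass an uncorrected 0.05 threshold but are not declared significant once the multiplicity of tests is taken into account.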

The ability to map differentially expressed genes onto their associated biochemical pathways provides the opportunity to "zoom in" on each of the metabolic pathways associated with protein production. Key metabolites that are either depleted or produced are relatively easy to identify, but true process understanding comes from identifying how these compounds are used in the metabolic machinery. Amino acids, for example, could be depleted by translation, interconversion to other amino acids, or detoxification by the cell, and each of these routes has dramatically different impacts on cell health and productivity. With NGG techniques, results do not have to wait until the end of the project: each individual experiment contributes to the "systems" knowledge and, in the short run, provides specific information on variables that can be tuned for performance. Over the past three years, we have completed numerous microarray experiments as part of our primary media and fermentation optimization objectives. The few examples highlighted here demonstrate the power of microarray technology to improve bioprocess stability and production yields as part of a larger NGG initiative.
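
Pathway mapping is often paired with an over-representation test that asks whether a pathway contains more differentially expressed genes than chance would predict. The sketch below uses a one-sided hypergeometric test; this is a swapped-in, commonly used technique, not necessarily what the authors' proprietary mapping tools do, and the gene and pathway identifiers are hypothetical stand-ins for KEGG annotations.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)

def pathway_enrichment(de_genes, pathways, universe):
    """Rank pathways by over-representation of differentially expressed (DE) genes.

    de_genes: set of DE gene IDs. pathways: dict pathway name -> set of member IDs.
    universe: all genes measured on the array (the background).
    Returns (pathway, sorted DE genes in pathway, p-value) tuples, smallest p first.
    """
    de = de_genes & universe
    results = []
    for name, members in pathways.items():
        members = members & universe
        hits = de & members
        p = hypergeom_upper_tail(len(hits), len(universe), len(members), len(de))
        results.append((name, sorted(hits), p))
    return sorted(results, key=lambda r: r[2])

# Hypothetical ten-gene array with two annotated pathways.
universe = {f"gene{i}" for i in range(10)}
pathways = {"pathway_X": {"gene0", "gene1", "gene2"}, "pathway_Y": {"gene7", "gene8", "gene9"}}
de = {"gene0", "gene1", "gene2"}
for name, hits, p in pathway_enrichment(de, pathways, universe):
    print(name, hits, round(p, 4))
```

The choice of background matters: using only the genes actually represented on the array avoids inflating significance for pathways whose genes were never measured.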

In the manufacturing process of the target compound, the original growth medium components were not well defined. As a result, different medium lots varied dramatically in protein yield and product titers. One of the primary objectives was therefore to develop a chemically defined medium that would yield consistent titers. In our experiments, we evaluated whether stress response mechanisms of the production cells caused a reduction in titer during phase transition, and how media and fermentation conditions affected these stress responses. We carefully designed time-course experiments covering the transition through growth phases with trial versions of different defined media. Complementary to this, we completed analytical chemistry tests to assign putative roles to transcription regulators that might be involved in the stress response. By correlating the differential expression of sigma factor genes with their associated biochemical pathways, we were able to change and optimize certain media components, which led to improved protein production.

The real success of the microarray studies came, however, when we discovered a medium supplement that dramatically enhanced protein production in a chemically defined medium. This particular medium was characterized by high titers when monitored over time; however, at a particular stage of the fermentation process, protein yields suddenly dropped off. We carefully designed a time-course microarray experiment and mapped the resulting differentially expressed genes to their associated metabolic pathways. To our surprise, analyses of interconverting pathways led to the identification of a particular amino acid (see Figure 2). The ability to map differentially expressed genes to their associated pathways clearly made it possible to "zoom in" and identify a key component that led to medium optimization and that targeted a truly essential amino acid. We also used microarrays to survey the dynamics of gene expression in media with varying productivities, and to examine process conditions that enhanced productivity. The results identified genes related to growth and cell division that were expressed at significantly higher levels in cells cultured in media yielding high product titers than in cells grown in media yielding lower titers. This enabled the cells to remain viable longer at the end of cultivation, when the cell concentration is highest, allowing more product molecules to accumulate.
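
The link between prolonged viability and product accumulation can be made quantitative with a standard bioprocess relation: if the cell-specific productivity q_p is roughly constant, product concentration equals q_p multiplied by the integral of viable cell concentration over time (the integral of viable cell density, IVCD). The sketch below computes the IVCD with the trapezoidal rule; the constant-q_p assumption, the function names, and the culture profiles are illustrative and not taken from the study.

```python
def integral_viable_cell_density(times_h, viable_cells):
    """Trapezoidal-rule integral of viable cell concentration over time.

    times_h: sampling times in hours (ascending).
    viable_cells: viable cell concentration at each time, in any consistent unit.
    Returns the integral in (concentration unit) x hours.
    """
    ivcd = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        ivcd += 0.5 * (viable_cells[i] + viable_cells[i - 1]) * dt
    return ivcd

def predicted_titer(q_p, times_h, viable_cells):
    """Product concentration if the cell-specific productivity q_p stays constant."""
    return q_p * integral_viable_cell_density(times_h, viable_cells)

# Hypothetical profiles (10^6 cells/mL): identical growth, but culture B stays viable longer.
times = [0, 24, 48, 72, 96]
culture_a = [1.0, 4.0, 8.0, 8.0, 2.0]
culture_b = [1.0, 4.0, 8.0, 8.0, 7.0]
print(integral_viable_cell_density(times, culture_a))  # 516.0
print(integral_viable_cell_density(times, culture_b))  # 576.0
```

Under the constant-q_p assumption, the roughly 12% larger integral for culture B, gained entirely in the final interval when cell concentration is highest, translates directly into roughly 12% more product.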

Figure 2: Mapping differentially expressed genes onto their associated metabolic pathways led to the identification of a truly essential amino acid directly involved with increased protein titers.

Despite the utility and versatility of DNA microarray technology, it provides only a qualitative picture of the overall transcriptome and reveals the activity only of genes for which probes are present on the array. Furthermore, many recent research reports for both prokaryotic and eukaryotic organisms show that cell physiology and many functional cellular and biological processes, including cell cycle progression and the induction and suppression of apoptosis, are not entirely dependent on the level of gene expression, but are also controlled by upstream regulatory regions and the increasingly important "noncoding" regions of the genome (e.g., microRNAs/small RNAs) (4,5). This is where NGG comes to the forefront.

One of the key technologies in NGG is Next Generation Sequencing (NGS). Recent advances in DNA sequencing have dramatically improved overall throughput and quality and have led to methods for characterizing the whole transcriptomes of entire cell populations in a way that was never before possible (6–8). RNA sequencing (RNA-seq) involves the direct sequencing of complementary DNAs (cDNAs) using high-throughput, massively parallel NGS technologies (e.g., Illumina's Genome Analyzer IIx and HiSeq 2000, and Roche's 454 FLX system), followed by mapping of the resulting sequencing reads to a reference genome (see Figure 3 for a detailed RNA-seq workflow diagram).

Figure 3: A detailed breakdown of a typical mRNA-seq work flow. This work flow is for a bacterial species for which comprehensive genome information is available. Partially adapted from Wilhelm and Landry (2009).
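
After the reads are mapped to the reference genome, expression is typically quantified by counting the reads that fall within each annotated gene. The following is a minimal sketch under simplifying assumptions: alignments are already available as (chromosome, start, end) tuples, strand is ignored, and a read overlapping more than one gene is set aside as ambiguous, a simplified version of the overlap rules used by common read-counting tools. A linear scan is used for clarity; a real pipeline would use an interval index. All names and coordinates are hypothetical.

```python
from collections import Counter

def count_reads_per_gene(genes, alignments):
    """Count mapped reads that overlap exactly one annotated gene.

    genes: dict gene_id -> (chromosome, start, end), 0-based half-open coordinates.
    alignments: iterable of (chromosome, start, end) for reads already mapped to
                the reference genome by an aligner (not shown here).
    Returns (per-gene counts, counts of reads not assigned to any single gene).
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    unassigned = Counter()
    for chrom, start, end in alignments:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = [gene_id for gene_id, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and start < g_end and g_start < end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            unassigned["ambiguous"] += 1   # overlaps more than one gene
        else:
            unassigned["no_feature"] += 1  # intergenic or unannotated region
    return counts, unassigned

# Hypothetical annotation and alignments on a single bacterial chromosome.
genes = {"geneA": ("chr", 100, 200), "geneB": ("chr", 180, 300)}
reads = [("chr", 110, 150), ("chr", 190, 210), ("chr", 250, 290), ("chr", 400, 450)]
counts, unassigned = count_reads_per_gene(genes, reads)
print(dict(counts), dict(unassigned))  # {'geneA': 1, 'geneB': 1} {'ambiguous': 1, 'no_feature': 1}
```

Reads that land in no annotated gene are not wasted information: they are the raw material for discovering novel transcribed regions.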

In a single RNA-seq experiment, one can derive not only an accurate, quantitative measure of transcriptome-wide gene expression levels (as with real-time quantitative polymerase chain reaction technologies), but also discover novel transcribed regions (new exons/genes) in an unbiased manner (as with a whole-genome tiling microarray approach), map their boundaries, and identify the 5' and 3' ends of genes (9,10). In addition, this methodology enables a global survey of alternative splice-site usage (similar to a custom-designed splicing microarray). It allows for the identification of transcription start sites and new splicing variants, and for the monitoring of allele-specific expression (9,10). Given the power of the RNA-seq approach, it is clear that, at least for comprehensive studies in higher eukaryotes where surveys of differential splicing activity, antisense transcription, and discovery of novel transcribed regions are desired, high-throughput RNA sequencing has augmented and is beginning to supersede microarray-based methods (9,10). Not only do the economics of faster development and better optimization support its use, but it also enables a host of new quality control and bioprocess monitoring capabilities after research and optimization are complete. All of this dovetails well with the intent of FDA's QbD guidelines.
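
Raw read counts are not directly comparable across genes or samples, because longer genes and deeper libraries yield more reads. Two widely used normalizations are RPKM (reads per kilobase per million reads) and TPM (transcripts per million). The sketch below shows both; the gene names, counts, and lengths are hypothetical, and this is not presented as the authors' pipeline.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million counted reads.

    counts: dict gene -> read count; lengths_bp: dict gene -> gene length in bp.
    Note: library size here is the sum of the counted reads; RPKM as originally
    defined uses the total number of mapped reads.
    """
    per_million = sum(counts.values()) / 1e6
    return {g: counts[g] / (lengths_bp[g] / 1e3) / per_million for g in counts}

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to sum to 10^6."""
    rates = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = sum(rates.values()) / 1e6
    return {g: r / scale for g, r in rates.items()}

counts = {"g1": 100, "g2": 200, "g3": 700}      # hypothetical read counts
lengths = {"g1": 1000, "g2": 2000, "g3": 1000}  # hypothetical gene lengths (bp)
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

In this toy example, g2 has twice as many reads as g1 but is also twice as long, so both normalizations report equal expression for the two genes. TPM values always sum to one million within a sample, which makes relative expression easier to compare across samples.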

We have developed methods that use various current and next-generation genomics tools to increase the performance and cost-effectiveness of bioprocesses during manufacturing. NGS technologies, in particular, have several potential applications in this scenario. These include generating valuable genomics resources, developing molecular fingerprints for improved product yield and quality and for bioprocess monitoring, quantitative expression analysis (RNA-seq), and identifying metabolic bottlenecks, all leading to evidence-based bioprocess optimization (see Figure 4). For example:

Figure 4: Integration of NGG technologies to develop a species-specific genomics platform that can be used to optimize bioprocesses, while building a solid understanding of the biology of the particular production species.

(1) Sequencing genomic DNA, mRNA, micro/small RNAs, and immunoprecipitated DNA fragments from production strains or cell lines provides genomic resources that have a direct impact on understanding the overall biology of an organism. Specifically, it enables an understanding of gene regulation (e.g., the roles of noncoding regulatory RNA elements and transcription factors/sigma factors) and of genome structure and dynamics (chromosomal rearrangements, alternative splicing events, etc.). These resources clarify the genome architecture of the production species, laying the foundation for intelligent, biology-guided process development.

(2) To develop functional gene-based markers, NGS of mRNA from contrasting phenotypes for the biomolecule of interest (for example, high vs. low yield or quality) can be used to identify candidate genes involved in or associated with the production phenotype. These markers can then serve as molecular fingerprints to assist with the selection of production lines and to monitor process development and optimization, with the goal of guiding early-stage bioprocesses. At later stages, these molecular fingerprints can also be used to monitor process scale-up and manufacturing. Active monitoring throughout the entire process life cycle can help maximize product yield and quality while minimizing associated costs.

(3) Coupled to metabolic pathway analysis, RNA-seq of production strains/cell lines under different growth and environmental conditions sheds light on key metabolic pathways controlling biomolecule production and identifies potential metabolic bottlenecks. The ability to zero in on the control points governing metabolic flow toward increased production allows process manipulation based on empirical knowledge instead of large design-of-experiments (DOE) fishing expeditions and brute-force methods.
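
Item (2) above can be illustrated with a minimal sketch under several assumptions: expression values are already normalized (for example, log2 TPM); candidate markers are the genes with the largest absolute mean log2 difference between high- and low-yield samples; and a new sample's fingerprint score is the Pearson correlation between its centered marker expression and those high-minus-low differences. The selection rule, scoring rule, function names, and data are hypothetical illustrations; a real marker panel would also need statistical testing and independent validation.

```python
from math import sqrt
from statistics import mean

def select_markers(high, low, n_markers):
    """Rank genes by absolute mean log2 difference between phenotype groups.

    high, low: lists of samples; each sample is a dict gene -> log2 expression.
    Returns dict gene -> (mean high - mean low) for the top n_markers genes.
    """
    genes = high[0].keys()
    diffs = {g: mean(s[g] for s in high) - mean(s[g] for s in low) for g in genes}
    top = sorted(diffs, key=lambda g: abs(diffs[g]), reverse=True)[:n_markers]
    return {g: diffs[g] for g in top}

def baseline(samples):
    """Mean expression of every gene across samples (used to center new samples)."""
    return {g: mean(s[g] for s in samples) for g in samples[0]}

def pearson(x, y):
    """Pearson correlation coefficient; neither input may be constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fingerprint_score(sample, markers, reference):
    """Near +1: resembles the high-yield profile; near -1: the low-yield profile."""
    genes = list(markers)
    deviations = [sample[g] - reference[g] for g in genes]
    return pearson(deviations, [markers[g] for g in genes])

# Hypothetical training data: two high-yield and two low-yield samples, four genes.
high = [{"gA": 10.0, "gB": 5.0, "gC": 7.0, "gD": 6.0}, {"gA": 10.4, "gB": 5.2, "gC": 7.1, "gD": 6.5}]
low = [{"gA": 8.0, "gB": 5.1, "gC": 9.0, "gD": 5.0}, {"gA": 8.2, "gB": 5.1, "gC": 9.3, "gD": 5.1}]
markers = select_markers(high, low, 3)
reference = baseline(high + low)
new_sample = {"gA": 10.1, "gB": 5.1, "gC": 7.2, "gD": 6.3}
print(sorted(markers), round(fingerprint_score(new_sample, markers, reference), 3))
```

Correlation is used here because it compares the direction of the pattern across markers rather than absolute levels, which makes the score less sensitive to a uniform shift in a sample's expression values.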


Intelligently designed NGG experiments are becoming a hallmark of research and manufacturing design and optimization. With the advent of NGG technologies, the old and the new can be combined for very powerful results. Short-term production gains (in our case, sometimes more than 300%) were realized from DNA microarray and/or mRNA-seq experiments that have provided foundational information for our NGG initiatives. We have dramatically improved our ability to rapidly optimize both the growth media and the fermentation conditions associated with the production of a key protein used in the manufacture of a major commercial product. The technology has enabled us to better understand the overall bioprocess, as well as the physiology of the production cells themselves. We have incorporated various current and next-generation genomics tools into a bioprocessing genomics platform that we expect will ultimately support FDA's QbD initiative. Not only will this improve our understanding of bioprocesses and the effect of all CPPs on protein yield, quality, and process stability, but it will also make flexible bioprocessing possible and safe.

Len van Zyl*, PhD, is the CEO and CSO of ArrayXpress and a faculty member at NC State University. Michael Zapata III is the chairman of the board at ArrayXpress Inc.


1. A. Kantardjieff et al., Biotechnol. Adv. 27, 1028–1035 (2009).

2. N.M. Jacob et al., Chemical Engineering Progress 105 (11), 35–42 (2009).

3. M. Kanehisa, Trends Genet. 13 (9), 375–376 (1997).

4. P. Brodersen and O. Voinnet, Nat. Rev. Mol. Cell Biol. 10, 141–148 (2009).

5. L.S. Waters and G. Storz, Cell 136, 615–628 (2009).

6. N.M. Jacob et al., Biotechnol. Bioeng. 105, 1002–1009 (2009).

7. D.J. Turner et al., Mamm. Genome 20, 327–338 (2009).

8. P.K. Wall et al., BMC Genomics 10, 347–366 (2009).

9. Z. Wang, M. Gerstein, and M. Snyder, Nat. Rev. Genet. 10, 57–63 (2009).

10. B.T. Wilhelm and J.R. Landry, Methods 48, 249–257 (2009).