Use of Multivariate Data Analysis in Bioprocessing

Published on: 
BioPharm International, BioPharm International-06-01-2015, Volume 28, Issue 6

The authors review major developments in use of MVDA in bioprocessing applications.


The ever-increasing demand for biotherapeutics, together with the pressure to contain healthcare costs, has motivated biotech manufacturers to focus on process optimization (1). Among a variety of approaches, the use of advanced sampling techniques, new sensor technologies, and analyzers has emerged as a topic of interest to the scientific community at large (2). Implementation of these tools, however, inevitably results in large, complex datasets with underlying multivariate interactions. Further, any optimization efforts targeted at improving product yield or productivity need to be carefully monitored for any possible negative impact on a product’s safety and/or efficacy. To achieve this, a tool is required that can effectively deal with these complexities and extract the relevant information from these highly correlated multivariate datasets. Multivariate data analysis (MVDA) has emerged as a significant enabler in this regard (3-5).

A search of the literature on MVDA reveals its applicability in fields as diverse as polymers, semiconductors, food, and the environment (6-8).

The biopharmaceutical industry, however, has accrued greater benefits, as evidenced by publications highlighting the use of MVDA tools in both upstream and downstream processes (9). This is in part because the inherently complex nature of the datasets generated by the biopharma industry makes extraction of meaningful and relevant information a difficult task (4).

The increasing use of MVDA has also been fueled by the increasing acceptance of quality by design (QbD) and process analytical technology (PAT) among regulators and the biotech industry. Implementation of these initiatives requires enhanced process and product understanding (3). 

Among many others, some of the common applications of MVDA include analysis of data originating from spectroscopic measurements, analysis of data profiles from unit operations such as cell culture and chromatography, quantitative assessment of process comparability, root cause analysis, and raw material characterization (3, 4).

In this 33rd article in the “Elements of Biopharmaceutical Production” series, the authors review major developments in the use of MVDA in bioprocessing applications that have occurred in the past five years. A few examples are provided to illustrate the usefulness of MVDA in this context.

Multivariate data analysis

In a recent publication, a step-by-step procedure for performing MVDA of bioprocessing data was presented (9). 

The proposed approach is illustrated in Figure 1. Prior to analysis with MVDA software such as SIMCA (Umetrics AB, Kinnelon, NJ), the data are assembled in a systematic manner in Microsoft Excel. This step is followed by preprocessing of the data, wherein the raw data are converted into units/scales that allow direct comparison of measurements for different samples. Subsequently, the dataset is analyzed and modeled using data reduction approaches such as principal component analysis (PCA) and partial least squares (PLS) regression. While PCA gives in-depth information about the structure of the dataset at hand, PLS is effective in analyzing covariance between process variables and process outcomes.
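To make the preprocessing and PCA steps concrete, the following is a minimal Python sketch using NumPy. It is not the SIMCA workflow of the cited procedure. The function names (`autoscale`, `pca`), the choice of unit-variance scaling, and the synthetic 30-batch dataset are assumptions made purely for illustration.

```python
import numpy as np


def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (X - mean) / std, mean, std


def pca(X_scaled, n_components):
    """PCA via singular value decomposition of the preprocessed data."""
    U, s, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # batch coordinates
    loadings = Vt[:n_components].T                   # variable contributions
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained


# Synthetic stand-in for a table of 30 batches x 6 process variables
# driven by two hidden factors (illustrative data, not from the article).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(30, 2))
X = hidden @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(30, 6))

X_scaled, mean, std = autoscale(X)
scores, loadings, explained = pca(X_scaled, n_components=2)
print("Fraction of variance explained per component:", explained)
```

Autoscaling is only one possible preprocessing choice; it puts variables measured in different units (for example, pH and glucose concentration) on a comparable footing before the decomposition.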

Figure 1: Flowchart illustrating the general workflow of multivariate data analysis (MVDA) of bioprocessing data.


MVDA applications in cell-culture operations

Cell-culture operations are by far the most common platform used for production of protein therapeutics. In this endeavor, various advanced offline and online measurement tools are used to ensure a consistent process and product quality. The ensuing datasets, however, are large and complex. MVDA of these datasets to extract relevant information can be aimed at process monitoring in a manufacturing setting, through detection of process faults or deviations, or at enhancing understanding of any underlying relations or interactions between process variables and the product and process attributes. The literature is replete with such work from researchers in fields as diverse as the environment, food, and production of therapeutic proteins. This article focuses on MVDA-based applications in bioprocessing.

Researchers have applied PCA to monitor a bioreactor producing penicillin acylase in Bacillus by generating online multivariate control charts (10). The dataset contained information exclusively related to the process, and therefore, MVDA models were used to assess process performance. This in turn enabled detection of process faults and deviations, which is highly desirable for process monitoring in commercial manufacturing. Another group employed unsupervised PCA and PLS to analyze data from inline Fourier transform near-infrared (FT-NIR) spectroscopy of a mammalian cell-culture process to assess batch homogeneity between lots and detect abnormal fermentation runs (11). In similar work, researchers have succeeded in observing compositional changes and predicting product yield by implementing fluorescence spectroscopy in conjunction with an MVDA approach of multi-way robust principal component analysis (MROBPCA) and n-partial least squares discriminant analysis and regression (NPLS-DA and NPLS-R) (12). All these applications involve reducing the dimensionality of the datasets to a smaller number of uncorrelated variables that can explain most of the variance in the original data. They demonstrate the potential of MVDA to become an integral part of upstream process control by effectively eliminating the major sources of variability, thereby leading to significant improvement in consistency with respect to process performance and product quality (13).
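A PCA-based multivariate control chart of the kind described above can be sketched as follows. This is a simplified illustration, not the method of the cited studies. It assumes Hotelling's T² and the squared prediction error (SPE) as the two monitoring statistics, and it uses empirical 99th-percentile limits from reference batches in place of the distribution-based limits normally used. All names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca_monitor(X_ref, n_components):
    """Fit a PCA monitoring model on reference (in-control) observations."""
    mean = X_ref.mean(axis=0)
    std = X_ref.std(axis=0, ddof=1)
    Z = (X_ref - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                              # loadings
    score_var = s[:n_components] ** 2 / (Z.shape[0] - 1)  # variance per score
    return {"mean": mean, "std": std, "P": P, "score_var": score_var}


def t2_and_spe(model, X_new):
    """Return Hotelling's T2 and SPE for each row of X_new."""
    Z = (X_new - model["mean"]) / model["std"]
    T = Z @ model["P"]
    t2 = np.sum(T ** 2 / model["score_var"], axis=1)
    residual = Z - T @ model["P"].T
    spe = np.sum(residual ** 2, axis=1)
    return t2, spe


# Synthetic reference data: 200 observations of 5 correlated variables.
rng = np.random.default_rng(1)
X_ref = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X_ref += 0.1 * rng.normal(size=(200, 5))

model = fit_pca_monitor(X_ref, n_components=2)
t2_ref, spe_ref = t2_and_spe(model, X_ref)
t2_limit = np.percentile(t2_ref, 99)    # empirical limit (assumption)
spe_limit = np.percentile(spe_ref, 99)  # empirical limit (assumption)

# Hypothetical fault: one variable drifts far from its reference mean
# while all others stay at their means.
fault = model["mean"].copy()
fault[0] += 20 * model["std"][0]
t2_fault, spe_fault = t2_and_spe(model, fault[None, :])
fault_flagged = bool(t2_fault[0] > t2_limit or spe_fault[0] > spe_limit)
print("Fault flagged:", fault_flagged)
```

T² tracks unusual movement within the model plane, whereas SPE responds when the usual correlation between variables breaks down, which is why both are checked.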

Researchers have monitored the performance of mammalian (Chinese hamster ovary [CHO]-based) bioprocesses using Raman spectroscopy (14). The authors used variable selection methods such as competitive adaptive reweighted sampling (COadReS) and ant colony optimization (ACO), which enhance the predictive ability of the chemometric model by removing unnecessary spectral information. In yet another application, researchers have used Raman spectroscopy and MVDA for online control of a Saccharomyces cerevisiae-based fermentation process (15). They demonstrated that MVDA of spectra can be used for effective fault detection. Similarly, application of surface-enhanced Raman scattering (SERS) spectroscopy for simple and fast analysis of cell-culture media degradation has been demonstrated (16). In this work, chemometric tools were used to rapidly monitor compositional changes in the chemically defined media, and the authors concluded that significant chemical changes in terms of cysteine/cystine concentration occur even when media are stored in the dark at 2-8 °C.

A general workflow for building and assessing MVDA regression models for the quantification of multiple analytes in bioprocesses by Fourier Transform Infrared (FTIR) spectroscopy has been recently presented (17).

The authors specifically assessed the suitability of quantification of penicillin V and phenoxyacetic acid with online high-performance liquid chromatography (HPLC) and MVDA tools such as PLS and multivariate curve resolution-alternating least squares (MCR-ALS). Other researchers have also successfully used FT-NIR spectroscopy coupled with MVDA for qualitative and quantitative analysis in solid-state fermentation (SSF) of protein feed (18). They integrated approaches such as discrete wavelet transform (to filter the raw spectra and extract latent information), PCA (to explore structures over the time course of SSF), and extreme learning machine (ELM) modeling (for model calibration). Methods such as ordinary least squares (OLS), principal component regression (PCR), and non-negative matrix factorization (NMF) have been used to extract the spectrum of a pure component from NIR spectra containing a known diluent (19). A hybrid electronic tongue system based on various potentiometric/voltammetric sensors and appropriate MVDA techniques has been used to provide correct qualitative and quantitative classification of samples collected during standard Aspergillus niger culture and culture infected with yeast.
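The idea of recovering a pure-component spectrum by least squares can be illustrated with a short Python sketch. This is not the algorithm of the cited work. It assumes a linear (Beer-Lambert-type) mixing model, a known diluent spectrum, and known analyte fractions. The function name, the synthetic bands, and the arbitrary spectral axis are all hypothetical.

```python
import numpy as np


def extract_pure_spectrum(mixtures, fractions, diluent):
    """Least-squares estimate of the analyte spectrum.

    Assumed model: mixture_i = f_i * analyte + (1 - f_i) * diluent + noise,
    with the fractions f_i and the diluent spectrum known.
    """
    fractions = np.asarray(fractions, dtype=float)
    # Subtract the known diluent contribution from every mixture spectrum.
    residual = mixtures - np.outer(1.0 - fractions, diluent)
    # Closed-form OLS solution of residual = fractions[:, None] * analyte.
    return fractions @ residual / (fractions @ fractions)


def band(axis, center, width):
    """Gaussian absorption band used to build synthetic spectra."""
    return np.exp(-((axis - center) / width) ** 2)


# Synthetic spectra on an arbitrary, unitless spectral axis.
axis = np.linspace(0.0, 1.0, 200)
analyte = band(axis, 0.3, 0.05) + 0.5 * band(axis, 0.7, 0.08)
diluent = band(axis, 0.5, 0.15)
fractions = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

rng = np.random.default_rng(2)
mixtures = (np.outer(fractions, analyte)
            + np.outer(1.0 - fractions, diluent)
            + 0.001 * rng.normal(size=(5, 200)))

estimate = extract_pure_spectrum(mixtures, fractions, diluent)
max_error = np.max(np.abs(estimate - analyte))
print("Largest absolute deviation from the true spectrum:", max_error)
```

When the fractions are not known in advance, factorization methods such as NMF or MCR-ALS are the usual alternatives, at the cost of rotational ambiguity in the solution.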

More recently, researchers have applied MVDA to the analysis of early bioprocess development data to achieve increased understanding of a PER.C6 cell cultivation process (20). The authors reported that the application of PCA identified causes for batch deviations and revealed process differences between the 2-L and 10-L batches that were previously considered comparable.

For further illustration, Figure 2 presents results from MVDA modeling of data from a cell-culture process performed at 2 L and 2000 L scales. Several input parameters (pCO2, pO2, glucose, pH, lactate, ammonium ions) and output parameters (purity, viable cell density, viability, and osmolality) were evaluated in this analysis. While loading plots and variable importance plots (VIP) were used for assessing scale-up and comparability of the cell-culture process, batch-control charts aided in fault diagnosis during routine manufacturing (not shown). Figure 2 suggests changes with respect to parameters such as osmolality and ammonia levels, indicating altered cell-culture performance upon scale-up. The change in osmolality was attributed to the buildup of CO2 as a result of less efficient gas transfer upon scale-up. It was shown that while the relative importance remained unchanged for most variables, an exception was the pO2 level, which had a more significant impact at large scale (highlighted in Figure 2).

Figure 2: Multivariate data analysis and modeling of representative data from small-scale (2 L) and large-scale (2000 L) batches of a cell-culture process. Adapted with modification from Ref. 21.

These results highlight the usefulness of MVDA in supporting key activities in the successful manufacturing of biopharmaceutical products, including scale-up, process comparability, process characterization, and fault diagnosis.

Figure 3 illustrates an example of the use of near-infrared spectroscopy/multivariate data analysis (NIR-MVDA) for screening of lots of basal medium powders based on their impact on process performance and product attributes. The supplier claimed a uniform composition for all the lots manufactured at different scales using identical process conditions. However, some variability of the raw material lots was evident from the NIR spectra in the 4000-7000 cm⁻¹ wavenumber region. Upon application of MVDA to the spectral data, different groupings of media components during the milling and blending process at different scales of operation were observed. The source of variation among different raw material lots was thereby attributed to uniformity of blending, impurity levels, chemical compatibility, and/or heat sensitivity during the milling process for batches of large-scale media powder. This approach made it possible to fingerprint the raw materials and distinguish the performance between good and poor media lots.


Figure 3: Use of near-infrared spectroscopy (NIR-MVDA) for screening of lots of basal medium powders. Legends: Group I: Chemically defined media + hydrolysate, Group II: hydrolysate, Group III: Three different chemically defined basal powder types, Group IV: Feed powder. Group A: small-scale media batch group, Group B: Large-scale media batch group. Adapted with modifications from Ref. 22.

Chemometrics in downstream operations

A cursory review of the literature indicates that MVDA applications in downstream bioprocessing are considerably fewer than those in upstream processing. The following section discusses the various applications that have been published.

Recently, researchers have described an application of MVDA to the development and optimization of a reversed-phase (RP)-HPLC method for separation of metaxalone from its hydrolytic impurities (23). That study developed a mathematical model relating the experimental variables to the response for the RP-HPLC separation of these compounds. In yet another application, chemometrics has been applied to predict column integrity and impurity clearance during reuse of the chromatographic resin (24). The authors of that study presented a methodology that uses chemometric tools to predict column underperformance at the manufacturing scale over the product lifecycle. This approach allows the operators to unpack and repack the column beforehand without risking batch loss.

Single-wavelength ultraviolet (UV) absorbance is routinely used for monitoring protein purification processes. With the aim of developing a simple, fast, and cost-effective methodology for protein quantification, researchers have applied PLS to quantify a protein mixture in chromatographic separation using multi-wavelength UV spectra (25). The proposed approach had sufficient sensitivity (relative error was 4.8%, 12.0%, and 6.8% for the three proteins monitored) and accuracy (root mean square error of prediction [RMSEP] was 0.036, 0.088, and 0.049, respectively, for the monitored proteins) vis-a-vis monitoring using single-UV absorbance for estimation of protein concentration in a mixture. This approach can be readily applied to various kinds of protein purification processes to achieve consistent process performance and product quality. In a similar application, researchers have addressed the issue of late detection of irregularities in a chromatographic process (caused by offline analytics) by using selective inline quantification of co-eluting proteins in chromatography (26). This was achieved by applying PLS to spectral data.
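The following Python sketch shows how a PLS calibration on multi-wavelength spectra and its RMSEP could be computed. It is a minimal NIPALS implementation for a single response, offered under stated assumptions rather than as the cited authors' procedure. The synthetic three-protein spectra, the 40/20 calibration/test split, and the use of mean absolute percentage error as the "relative error" are all illustrative choices, and the numbers it produces are unrelated to those reported above.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 (single response) with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)    # weight vector
        t = Xr @ w                   # scores
        tt = t @ t
        p = Xr.T @ t / tt            # X-loadings
        c = yr @ t / tt              # y-loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    """Predict the response for new spectra."""
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


# Synthetic spectra of a three-protein mixture over 50 wavelength channels.
rng = np.random.default_rng(3)
channels = np.arange(50)
pure = np.array([np.exp(-((channels - c) / 5.0) ** 2) for c in (15, 25, 35)])
conc = rng.uniform(0.2, 1.0, size=(60, 3))
spectra = conc @ pure + 0.001 * rng.normal(size=(60, 50))

X_cal, X_test = spectra[:40], spectra[40:]
y_cal, y_test = conc[:40, 0], conc[40:, 0]  # quantify the first protein

model = pls1_fit(X_cal, y_cal, n_components=3)
y_pred = pls1_predict(model, X_test)

rmsep = np.sqrt(np.mean((y_pred - y_test) ** 2))
rel_error = 100 * np.mean(np.abs(y_pred - y_test) / y_test)
print("RMSEP:", rmsep, "Relative error (%):", rel_error)
```

RMSEP is evaluated on samples that were not used for calibration, which is what distinguishes it from the calibration error.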

Chemometrics for assessing process and product comparability


A series of papers demonstrating the utility of MVDA in establishing comparability of both the process and the product have been published previously (25-28). Establishing comparability forms an integral and crucial aspect of biosimilar development. In fact, the underlying premise of biosimilar development is that if the manufacturer is able to demonstrate high similarity between the innovator product and its biosimilar version, the regulatory authorities may consider approving the drug for marketing with minimal clinical study data.

A quantitative approach has been presented that uses various chemometric algorithms to assess process comparability of two different process versions and successfully identify the unit operations where differences existed (27). The approach can be applied to make comparisons across different phases of manufacturing (i.e., Phase I vs. Phase II vs. Phase III vs. commercial) and in support of various key activities related to product commercialization (process scale-up, technology transfer, and process improvement). Specifically, application of MVDA was assessed for examination of process comparability and identification of the unit operations and parameters responsible for the variability in a process comprising nine unit operations, with data accrued from 229 batches. PLS-DA of the data exhibited clustering between the two datasets, indicating major differences between the two sets (Figure 4A). The PLS-DA VIP plot for the entire dataset, as well as that for the production scale, revealed important parameters that are deemed responsible for the separate clustering observed among the datasets (Figure 4B). Parameters with VIP >1 are considered significant.
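The VIP > 1 rule can be illustrated with a small Python sketch. It is an assumption-laden example rather than the cited analysis: PLS-DA is approximated by a NIPALS PLS1 regression on a 0/1 class label, the two synthetic "datasets" of 60 batches each are made to differ in only the first two of eight parameters, and the function names are hypothetical.

```python
import numpy as np


def pls1_components(X, y, n_components):
    """NIPALS PLS1; returns weights W, scores T, and y-loadings q."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, T, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = yr @ t / tt
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        W.append(w)
        T.append(t)
        q.append(c)
    return np.array(W).T, np.array(T).T, np.array(q)


def vip_scores(W, T, q):
    """Variable importance in projection for each X-variable."""
    n_vars = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)  # y-variance explained per component
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(n_vars * (w_sq @ ss) / ss.sum())


# Two synthetic datasets of 60 batches x 8 parameters each.
rng = np.random.default_rng(4)
n_per_set, n_vars = 60, 8
X = rng.normal(size=(2 * n_per_set, n_vars))
X[n_per_set:, :2] += 3.0              # dataset 2 differs in parameters 0 and 1
y = np.repeat([0.0, 1.0], n_per_set)  # class label: dataset 1 vs. dataset 2

X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale
W, T, q = pls1_components(X, y, n_components=2)
vip = vip_scores(W, T, q)
significant = np.where(vip > 1.0)[0]
print("VIP:", np.round(vip, 2))
print("Parameters with VIP > 1:", significant)
```

Because the squared VIP values average to one across all variables, a value above 1 marks a parameter that contributes more than the average variable to separating the two datasets.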

Figure 4: A) Typical platform process for mAb production. B) (B1) Application of multivariate data analysis (MVDA) for establishing process comparability across lab and production scale. (B2) PLS–DA VIP plot for the entire dataset (229 batches). (B3) PLS–DA score plot for the production fermenter step showing two clusters of batches; the red cluster belongs to dataset 1 and the green cluster to dataset 2. (B4) PLS–DA VIP plot showing important parameters responsible for data clustering in the PLS–DA score plot of the production fermenter. C) Final approach for assessing comparability. PLS–DA is partial least squares discriminant analysis.



With continued efforts towards implementation of QbD and PAT in the biotech industry, MVDA is expected to continue to serve as an enabler of this change. The authors see great scope for further refinements and improvements in MVDA modeling approaches to widen its applicability towards analysis of structured datasets, and perhaps in the future, for unstructured datasets as well.


1. F. Li, N. Vijayasankaran, A. (Yijuan) Shen, R. Kiss, and A. Amanullah, mAbs, 2 (5), pp. 466-479 (September 2010).

2. A. S. Rathore and R. Mhatre, Eds., Quality by Design for Biopharmaceuticals (John Wiley & Sons, Hoboken, NJ, 2009).

3. A. S. Rathore, N. Bhushan, and S. Hadpe, Biotechnol. Prog. 22 (2), pp. 307-15 (March-April 2011).

4. S. M. Mercier, B. Diepenbroek, R. H. Wijffels, and M. Streefland, Trends Biotechnol. 32 (6), pp. 329-36 (June 2014).

5. S. Charaniya, W.-S. Hu, and G. Karypis, Trends Biotechnol. 26 (12), pp. 690-9 (December 2008).

6. N. Abu-Khalaf, S. Khayat, and B. Natsheh, Science and Technology 3 (4), pp. 99-104 (2013).

7. E. Frauendorfer, A. Wolf, and W. D. Hergeth, Chemical Engineering & Technology 33 (11), pp. 1767-78 (November 2010).

8. I. Miletic et al., J. Process Control 14 (8), pp. 821-836 (December 2004).

9. A. S. Rathore, S. Mittal, M. Pathak, and A. Arora, Biotechnol. Prog. 30 (4), pp. 963-973 (April 2014).

10. E.R. Nucci, A.J.G. Cruz, and R.C. Giordano, Bioprocess Biosyst. Eng. 33 (5), pp. 557-564 (2010).

11. M. Clavaud et al., Talanta, 111, pp. 28-38 (July 2013).

12. P. W. Ryan et al., Anal. Chem., 82 (4), pp. 1311-7 (February 2010).

13. J. Gomes, V. R. Chopda, and A. S. Rathore, J. Chem. Technol. Biotechnol. (December 2014).

14. B. Li, B. H. Ray, K. J. Leister, and A. G. Ryder, Anal. Chim. Acta 796, pp. 84-91 (September 2013).

15. T. C. Avila et al., Biotechnol. Prog. 28 (6), pp. 1598-1604 (November-December 2012).

16. A. Calvet and A. G. Ryder, Anal. Chim. Acta 840, pp. 58-67 (August 2014).

17. C. Koch et al., Anal. Chim. Acta 807, pp. 103-10 (January 2014).

18. H. Jiang, G. Liu, C. Mei, and Q. Chen, Anal. Methods 5 (7), pp. 1872-1880 (2013).

19. A. K. Baikadi et al., “Extraction of pure component spectrum from mixture spectra containing a known diluent.” Preprints of the 10th IFAC International Symposium on Dynamics and Control of Process Systems, Mumbai, India, Dec. 18-20, 2013.

20. S. M. Mercier et al., J. Biotechnol., 167 (3), pp. 262-70 (September 2013).

21. A. O. Kirdar, J. S. Conner, J. Baclaski, and A. S. Rathore, Biotechnol. Prog. 23 (1), pp. 61-7 (2007).

22. A. O. Kirdar, G. Chen, J. Weidner, and A. S. Rathore, Biotechnol. Prog., 26 (2), pp. 527-31 (2010).

23. P. K. Sahu and C. S. Patro, J. Liq. Chromatogr. Relat. Technol., 37 (17), pp. 2444-2464 (May 2014).

24. A. S. Rathore, S. Mittal, S. Lute, and K. Brorson, Biotechnol. Prog., 28 (5), pp. 1308-14 (2012).

25. M.-H. Kamga, H. Woo Lee, J. Liu, and S. Yoon, Biotechnol. Prog., 29 (3), pp. 664-71 (2013).

26. N. Brestrich et al., Biotechnology and Bioengineering, 111 (7), pp. 1365-73 (July 2014).

27. A. S. Rathore, S. Mittal, M. Pathak, and V. Mahalingam, J. Chem. Technol. Biotechnol. 89 (9), pp. 1311-1316 (2014).

28. N. Bhushan, S. Hadpe, and A. S. Rathore, Biotechnol. Prog. 28 (1), pp. 121-8 (2011). 

Article Details
BioPharm International
Vol. 28, No. 6
Pages: 26–31
Citation: When referring to this article, please cite it as A. Rathore and S. Singh, “Use of Multivariate Data Analysis in Bioprocessing,” BioPharm International 28 (6) 2015.