Leveraging Data Analytics Innovations to Improve Process Outcomes

Published in: BioPharm International, November 2016, Volume 29, Issue 11
Pages: 18–22

New data analytics tools help solve complex problems in a biotherapeutic development process.

Scientists and engineers working with R&D, pilot-, and production-scale processes in the life-sciences industries need new ways to harness the potential of the data they gather to drive innovation. Crucial elements include:

  • Understanding the process

  • Identifying the appropriate process sensors and technology to acquire and store data

  • Connecting disparate data sources

  • Implementing data analytics and visualization applications to analyze and make changes for improving the quality and quantity of medicines produced.

Companies often capture the data needed to improve operations within data historians and other databases. Creating insight from this information, however, can be difficult, expensive, and time-consuming using traditional approaches such as spreadsheets. New data analytics applications can address these issues by providing:

  • A strong connection with data. Historians typically reside near the sensors and equipment. Laboratory information systems and other data stores are often spread across the business network. It is now possible to connect to all of these sources and view all relevant data in one application.

  • Automatic indexing of the sensor names or tags in the historian. Automatic indexing makes these data points easy to search, and simplifies access to related data.

  • A comprehensive connection to other data sources. The selected data analytics solution should include connectors to all main data repositories.

  • Rapid visualization by putting data in context. New search tools provide much more than simple trending: they enable the use of visual patterns, process variable limits, and other features. This functionality lets users enrich data with context from other data sources, providing more comprehensive insight.

  • Streamlined future analysis that facilitates collaboration. Work can be stored for reuse or shared with colleagues, either as a way to capture expertise, or in real time to enable distributed discussions across an organization.
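
The automatic-indexing point above can be illustrated with a toy example: building a searchable map from the tokens in historian tag names to the tags that contain them, so a partial-text search becomes a simple lookup. This is a minimal sketch; the tag names and function are hypothetical, not taken from any particular historian.

```python
def build_tag_index(tags):
    """Map each lowercase token of a historian tag name to the set of
    tags containing it, so a partial-text search becomes a dict lookup."""
    index = {}
    for tag in tags:
        # Split tag names like "BR1000_O2_FLOW" into searchable tokens
        for token in tag.lower().replace("_", " ").split():
            index.setdefault(token, set()).add(tag)
    return index

# Hypothetical sensor tags for a 1000-L and a 3-L bioreactor
tags = ["BR1000_O2_FLOW", "BR1000_PH", "BR3_O2_FLOW"]
index = build_tag_index(tags)
# index["o2"] -> {"BR1000_O2_FLOW", "BR3_O2_FLOW"}
```

Real analytics tools index tags automatically on connection; the point of the sketch is only that indexing up front is what makes later searches instantaneous.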

Comprehensive data analytics and visualization strategies, enabled by tools that can search and interact with past and present time-series data in real time, allow drug companies to make business-critical decisions with greater confidence.

In this article, the author shows how this type of data analytics solution was implemented in an upstream bioprocessing application (1).

Scale-Up Issues

Scale-up of a new upstream bioreactor process often happens across a multitude of equipment sizes. The environment for the cell culture can vary greatly, beginning with milliliter quantities in shake flasks, progressing to 2–3 L bench-scale bioreactors, then 100–1000 L pilot-scale bioreactors, and eventually reaching commercial scale. With a changing micro-environment, a new physical situation can arise, requiring a review of the process conditions needed for successful production of the desired protein.

In this example, protein degradation was observed at the production scale (1000 L) that had not been observed at the smaller bioreactor scales. A low molecular weight species was appearing over time in the production bioreactor, resulting in a low concentration of the desired protein (Figure 1, top). The viable cell density and high-performance liquid chromatography (HPLC) titer data, shown at the bottom of Figure 1, suggested the culture was successful at the 1000-L scale, while in reality the process needed modifications to achieve the desired final product concentration before scale-up could continue. In response, significant resources were quickly deployed to develop and test science-based hypotheses using multiple master cell bank vials and 3-L and 100-L bioreactors.

Figure 1. The top portion of the figure indicates evidence of a protein degradation issue at the 1000-L scale, and the upstream process data graphs in the bottom of the figure show target viable cell density and titer achieved at the 1000-L scale. [All figures courtesy of author]

With high-speed and high-throughput information comes a need for high-speed data processing, visualization, and reporting. Aggregating the additional upstream data, downstream data, and corresponding off-line analytics data is a typical challenge in scale-ups. In this case, however, the manual and laborious spreadsheet-based approach confounded the efforts of scientists and engineers to derive insight from their data.

Data Strategy

Troubleshooting begins with having a well-defined and well-understood physical situation. Without this vision, it’s not possible to select the right data for analysis. In addition to understanding what data are critical, it is equally important to have appropriate automation of sampling, electronic storage of data, and connectivity among historians and other data repositories.

Choosing the right data connectivity, aggregation, and analytics components is critical. Often, the perceived benefit of an analysis does not seem to warrant the time and effort required to pull all of the information together. Unfortunately, this lack of data and insight often leads to additional time-intensive experiments that don’t effectively leverage knowledge from the past.

The approach taken to address these issues and enable rapid troubleshooting in this case included:

  • Define the physical situation to determine the key physics involved.

  • Identify the key variables and determine all of the data streams: both on-line time-series data (such as O2 flow rates, glucose addition rates, acid or base addition rates, temperatures, and pH) and off-line contextual data (such as integrated viable cell density, titer, and media component concentrations).

  • Recognize how and where these key data are collected and stored.

  • Leverage an effective data analysis, visualization, and reporting application alongside lab-scale and pilot-scale experimentation.
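
A recurring mechanical step behind this approach is pairing each off-line sample (e.g., a viable cell density measurement) with the on-line time-series readings surrounding it. As a minimal sketch, assuming hypothetical timestamps, values, and a made-up function name, a last-reading-before-sample lookup might work like this:

```python
from bisect import bisect_right

def align_offline_sample(timeseries, sample_time):
    """Return the most recent on-line reading at or before an off-line
    sample time. `timeseries` is a list of (timestamp, value) tuples
    sorted by timestamp; timestamps here are hours into the run."""
    idx = bisect_right([t for t, _ in timeseries], sample_time) - 1
    if idx < 0:
        raise ValueError("sample precedes first on-line reading")
    return timeseries[idx]

# Hypothetical on-line O2 flow readings (hours, L/min) and an
# off-line sample drawn at t = 25.5 h
o2_flow = [(0.0, 0.10), (12.0, 0.15), (24.0, 0.22), (36.0, 0.30)]
t, flow = align_offline_sample(o2_flow, 25.5)  # -> (24.0, 0.22)
```

Commercial analytics tools perform this kind of time alignment automatically across historians and databases; the sketch only shows why connectivity among the data sources is a prerequisite for it.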

Specifically in this case study, the time-series data from a DeltaV historian (Emerson Automation Solutions) were accessed by data analytics software (Seeq). Multiple analytical devices, including the Vi-Cell cell counter (Beckman Coulter), provided integrated viable cell density data, and the BioProfile FLEX instrument (Nova Biomedical) provided key media data (Figure 2).

Figure 2. A historian (bottom right) is often used to store collected data, with data analytics software interfacing to the historian to provide insight.

Analytics Provides Answers

Through implementation of Seeq, alongside investments in the appropriate historians and databases, the company was able to avoid the manual, time-consuming data investigation and analysis typically required.

Instead of tackling data analysis in spreadsheets, a difficult and time-consuming process for the thousands of data points created in just a few bioreactor runs (contextual batch data for 6–10 bioreactors, data from five or six off-line analytical instruments with multiple data points per run, and thousands of on-line trending data points), the company’s scientists and engineers were able to assess in a matter of minutes, as opposed to several days or even weeks, what was happening at the cell level and the process level across multiple scales and operating conditions.

In cell culture, important relationships may include the glucose feed rates (or media changes) and their impact on cell growth and productivity, the acid/base addition rates for controlling pH, and the media feed strategy and its impact on final titer. The analysis yielded insights into these relationships by visualizing individual process variables and by using an internal calculation engine to determine derived quantities such as cell-specific oxygen uptake rates (Figures 3a, 3b, and 3c), an important metric when comparing the micro-environment across equipment scales.

Figure 3a. Oxygen flow-rate data plotted directly with Seeq from the DeltaV historian for three 3-L lab-scale bioreactors and the 100-L pilot-scale bioreactor.

In the example outputs shown in Figures 3a, b, and c, each of the required variables was visualized and the resulting calculations were developed, all within a single data analytics environment, while leaving the original data untouched in their original location.

Figure 3b. Oxygen flow-rate data overlaid with integrated viable cell density data from a separate data source, either a .csv file or an SQL database depending upon the specific analysis.

First, oxygen flow-rate data were displayed for three 3-L bioreactors, as well as for the 100-L single-use bioreactor. Next, integrated viable cell density data were displayed for each bioreactor. Finally, using the Seeq calculation engine, the cell-specific oxygen uptake rate was calculated for each bioreactor.
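
At its core, the final step normalizes the measured oxygen uptake by the number of viable cells in the vessel. A minimal sketch of that ratio follows, with hypothetical units and values; the article does not give the exact expression used in the Seeq calculation engine, so this is illustrative only:

```python
def cell_specific_our(o2_uptake_mmol_per_h, viable_cell_density_per_l, volume_l):
    """Cell-specific oxygen uptake rate (mmol O2 per cell per hour),
    computed as total oxygen uptake divided by total viable cells.
    Units and formula are illustrative, not the published method."""
    total_viable_cells = viable_cell_density_per_l * volume_l
    return o2_uptake_mmol_per_h / total_viable_cells

# Hypothetical values: 50 mmol O2/h uptake, 5e9 viable cells/L, 3-L vessel
q_o2 = cell_specific_our(50.0, 5e9, 3.0)  # mmol O2 per cell per hour
```

Because the ratio is per cell, it can be compared directly between a 3-L bench vessel and a 100-L pilot vessel, which is what makes it useful for judging whether the micro-environment is equivalent across scales.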

Figure 3c. Calculated cell-specific oxygen uptake rate resulting from ability to overlay data from disparate data sources and use a calculation engine, all within the same Seeq analytics environment.

Using this newly implemented data strategy, additional work was done to review biological growth and productivity data while testing the remaining science-based hypotheses. Additional process parameters were investigated along with off-line analytics data, allowing plant personnel to assess the impact of several parameters on the bioreactor process.

Conclusion

A strategy using data analytics software demonstrated that factors affecting process robustness and product quality can be rapidly identified, enabling definition of key performance indicators from development through scale-up.

Key elements in a successful data strategy include:

  • Ensuring a comprehensive connection with all data historian(s) and other important databases, including automatic indexing of the sensor names/tags in the historian to make it easy to search for and access related data.

  • Enabling rapid visualization of the time-series data over a designated period of time, in context with data obtained from off-line analytics, to support rapid problem-solving, small-scale model verification, and future process development.

When implemented, the data strategy described herein provides a positive twist on pipeline development. From a business perspective, the appropriate data strategy supports the goals of reducing rework, thus requiring fewer resources per molecule in the pipeline. Reduced experimentation, more rapid data investigation efforts, and more effective use of resources can then lead to improved cost management, and more importantly to higher production quantities of quality medicines.


Acknowledgment

The author wishes to thank T. Barreira at Merrimack Pharmaceuticals, Inc. for her outstanding support and technical contributions to this article.


Reference

1. L.J. Graham and T. Barreira, “Leveraging a Data Strategy with Seeq to Create the Optimal Biotherapeutic Development Process,” poster presentation at the Bioprocessing Summit Conference, Boston, MA, August 15–19, 2016.



When referring to this article, please cite it as L. Graham, "Leveraging Data Analytics Innovations to Improve Process Outcomes," BioPharm International 29 (11) 18–22 (November 2016).