Of course, it would be possible to create a bivariate analysis for every conceivable pair of parameters, but doing so would be
extremely inefficient. In addition, this method cannot uncover interaction effects, which could be important. Bivariate analysis
is therefore best used as a prioritization step that reduces the number of variables for further study.
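As a minimal sketch of such a prioritization step, the following Python snippet ranks candidate inputs by the squared correlation of each with yield. The data are synthetic, the variable name mix_time is hypothetical, and squared correlation is only one possible screening score; none of this is taken from the case study itself.

```python
import numpy as np

def rank_by_correlation(X, y, names):
    """Rank input variables by squared Pearson correlation with the output."""
    scores = {}
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]  # correlation of one input with yield
        scores[name] = r ** 2
    # Highest score first: the strongest candidates for further study
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic stand-in for historical batch data (assumption, not study data)
rng = np.random.default_rng(0)
names = ["RM1", "RM2", "RM3", "mix_time"]
X = rng.normal(size=(50, 4))
y = 5.0 * X[:, 0] + rng.normal(scale=1.0, size=50)  # yield driven mostly by RM1

ranking = rank_by_correlation(X, y, names)
for name, score in ranking:
    print(f"{name}: r^2 = {score:.2f}")
```

Because each variable is scored on its own, a ranking like this says nothing about interactions, which is why it serves only as a screen ahead of multivariate analysis.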
Multivariate analysis. Having identified several potentially critical variables through bivariate analysis, we applied multivariate analysis to
look simultaneously at many variables that drive yields.
Multivariate analysis can be done through many commercially available software packages. The goal is to create statistically
valid models that describe the behavior of output variables (Ys) from input variables (Xs). A word of caution: these methods
require some advanced statistical training to produce valid results.
Figure 3. Multivariate analysis of three variables.

In this case, multivariate analysis of historical data revealed three variables that accounted for 76% of the variability
in yield. Further, the analysis showed that all three variables were related to raw materials.
Figure 3 shows the results of the multivariate analysis. A perfect model, which would account for 100% of the variability
in yield, would show all of the data falling on the solid diagonal line.
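The "percentage of variability accounted for" is the R² of the fitted model, and a plot like Figure 3 compares predicted yield with actual yield. The sketch below fits an ordinary least-squares model to three inputs and computes R². It assumes a plain linear model and uses synthetic data; the commercial packages mentioned above may use other multivariate methods.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns coefficients, predictions, and R^2."""
    A = np.column_stack([np.ones(len(X)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    predicted = A @ coef
    ss_res = np.sum((y - predicted) ** 2)          # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)           # total variation in yield
    r_squared = 1.0 - ss_res / ss_tot
    return coef, predicted, r_squared

# Synthetic stand-in for historical batch records (assumption, not study data)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 80 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=2.0, size=60)

coef, predicted, r2 = fit_linear_model(X, y)
print(f"Share of yield variability explained: {r2:.0%}")
```

Plotting predicted against y would give the kind of picture shown in Figure 3: the closer R² is to 1, the more tightly the points cluster around the diagonal.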
BEYOND PURELY HISTORICAL DATA: DESIGN OF EXPERIMENTS
All of the previous steps focus on historical data. The next step—Design of Experiments (DoE)—uses the results of that historical
analysis as a foundation on which to begin building experiments that will further refine understanding of the process. However,
the three important variables uncovered by the multivariate analysis are not the end of the story. Before embarking on a DoE
study, you should make sure that you haven't overlooked other variables that might be important.
For example, in biopharmaceutical manufacturing, variables that are typically important include mixing time, temperature,
and the size of the bioreactor. Any such variable that you know to be important in a particular case should be included in
the DoE study. In addition, you should solicit the input of personnel who are thoroughly familiar with the process and the
product. For example, it might emerge from these conversations that certain variables known to be important were kept at the
same setting for all batches. As a result, those potential variables might not have shown up as important in the previous
analysis. Nevertheless, they should be included in the statistically designed experiments.
Once you've determined all of the relevant variables to be included, you then design experiments that are scaled correctly,
balanced, orthogonal, and correctly sized. You must also have a statistically valid sampling plan to make sure you're making
the right number of measurements in each batch. Then you can carefully execute the experimental trials.
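To make "balanced" and "orthogonal" concrete, the sketch below builds a two-level full factorial design in coded units (-1 for low, +1 for high) and checks both properties. The helper names are illustrative, and the checks as written apply only to two-level coded designs.

```python
import itertools
import numpy as np

def full_factorial(n_factors):
    """All combinations of low (-1) and high (+1) settings for each factor."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))

def is_balanced(design):
    """Balanced: every factor is run equally often at low and high, so each column sums to zero."""
    return bool(np.allclose(design.sum(axis=0), 0.0))

def is_orthogonal(design):
    """Orthogonal: factor columns are mutually uncorrelated, so X'X has zero off-diagonal entries."""
    gram = design.T @ design
    off_diagonal = gram - np.diag(np.diag(gram))
    return bool(np.allclose(off_diagonal, 0.0))

design = full_factorial(3)
print(design.shape, is_balanced(design), is_orthogonal(design))
```

Dropping even one run from such a design breaks both properties, which is one reason to execute the planned trials carefully and completely.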
The DoE study must include any parameters that were found to be important as a result of the bivariate and multivariate analyses.
It should also include other variables that are found to be important in similar processes. The DoE is then designed and sized
according to the number of variables to be studied and the amount of variability expected in the response variables (CQAs).
For PharmaCo, this DoE might include batches representing different levels of raw material properties: RM1, RM2, and RM3.
Because only three variables are involved, they could be arranged in a central composite design consisting of as few
as 16 trials (for example, 8 factorial points, 6 axial points, and 2 center points).
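The trial count for a three-factor central composite design can be reproduced with a short sketch. The split into 8 factorial, 6 axial, and 2 center points is an assumption that happens to total 16, as is the rotatable axial distance; an actual study might choose different values.

```python
import itertools
import numpy as np

def central_composite_design(n_factors=3, n_center=2, alpha=None):
    """Central composite design in coded units: factorial + axial + center points."""
    if alpha is None:
        alpha = (2 ** n_factors) ** 0.25   # rotatable axial distance (assumed choice)
    # Corner points: every combination of low (-1) and high (+1)
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))
    # Axial points: one factor at -alpha or +alpha, the others at the center
    axial = np.zeros((2 * n_factors, n_factors))
    for i in range(n_factors):
        axial[2 * i, i] = -alpha
        axial[2 * i + 1, i] = alpha
    # Replicated center points
    center = np.zeros((n_center, n_factors))
    return np.vstack([factorial, axial, center])

design = central_composite_design()   # columns stand for RM1, RM2, RM3
print(design.shape)                   # (16, 3)
```

Each row is one trial, and the coded values would be mapped back to real raw material property ranges before the batches are run.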
Figure 4. A contour profiler matrix plot.

Using widely available statistical software, you can then perform multiple regression analysis on the results of the DoE to
arrive at estimates of the effects of variables on the CQAs, their interactions, and their statistical significance. The model
can then be refined manually based on diagnostics such as risk probability. The final regression model for yield
should then fit the data closely, indicating that the model explains nearly all of the variation in the desired
outcome. You can then produce a multidimensional picture of the Design Space (Figure 4).
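The regression step can be sketched as a least-squares fit of a full quadratic model (main effects, two-factor interactions, and squared terms) to the DoE results, followed by prediction over a grid to obtain one panel of a contour plot. The design, the response values, and the coefficients below are synthetic assumptions, and significance testing is left to the statistical software.

```python
import itertools
import numpy as np

def quadratic_terms(X):
    """Expand factor settings into intercept, linear, interaction, and squared columns."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

def fit_response_surface(X, y):
    """Least-squares estimates of the quadratic model coefficients."""
    coef, *_ = np.linalg.lstsq(quadratic_terms(X), y, rcond=None)
    return coef

def predict(coef, X):
    return quadratic_terms(X) @ coef

# Assumed 16-run central composite design in coded units (RM1, RM2, RM3)
alpha = 8 ** 0.25
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
axial = np.vstack([sign * alpha * np.eye(3) for sign in (-1.0, 1.0)])
design = np.vstack([factorial, axial, np.zeros((2, 3))])

# Synthetic yield with an RM1 x RM2 interaction and curvature in RM3 (illustrative only)
rng = np.random.default_rng(1)
true_yield = (85 + 4 * design[:, 0] - 3 * design[:, 1]
              + 2 * design[:, 0] * design[:, 1] - 1.5 * design[:, 2] ** 2)
y = true_yield + rng.normal(scale=0.5, size=len(design))

coef = fit_response_surface(design, y)

# Predicted yield over RM1 and RM2 with RM3 held at its center value
g = np.linspace(-1.0, 1.0, 21)
rm1, rm2 = np.meshgrid(g, g)
grid = np.column_stack([rm1.ravel(), rm2.ravel(), np.zeros(rm1.size)])
surface = predict(coef, grid).reshape(rm1.shape)
```

Passing rm1, rm2, and surface to a contour plotting routine would give one panel of a matrix like Figure 4; repeating the grid for the other pairs of variables fills in the rest.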
