Figure 3

Figure 3 shows a flowchart that can be used to select an appropriate method. Its use is demonstrated
in the example application that follows.
AN EXAMPLE APPLICATION: COANALYZING BENCH AND SCALE DATA
To demonstrate the approach shown in Figure 3, we present an example for a chromatography step used in purifying a recombinant
protein.
Table 2. Distribution of data by scale

The full data set from this step consists of 76 observations distributed by scale (Table 2). EOR studies (1X OR bench) are
performed at the upper and lower ranges of the OR for a given OP. ROB studies (3X OR bench) are performed at three times the
upper and lower range of the OR.
Values for the OPs are unknown for the large-scale GMP and non-GMP runs, but known for the bench studies. Suppose we want
to construct a tolerance interval to establish a VAC for a purity PP using only the three large-scale GMP runs. Equation (1)
can be used to compute a tolerance interval that contains 99% of the population with 95% confidence using the three GMP values
in the data set. From the data, the sample mean of the three observations is Ȳ = 89.5, the standard deviation is S = 1.68, c = 3, r = 2, α = 0.05, and p = 0.99. This provides:
Z_{(p+1)/2} = Z_{(0.99+1)/2} = Z_{0.995} = 2.576
and
χ^{2}_{r,α} = χ^{2}_{2,0.05} = 0.1026
so that the computed value of k_{2} is 13.1.
The resulting tolerance interval is the following:
89.5 – 13.1(1.68) = 67.4 (lower limit) to
89.5 + 13.1(1.68) = 112 (upper limit)
This is a relatively wide interval: the small sample size (r = 2) produces a large value for k_{2}. Bench-scale process
characterization data can therefore be used to shorten this interval to a more informative width.
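The arithmetic above can be checked with a short script. This is a minimal sketch, not the authors' implementation: Equation (1) is not reproduced in this section, so the script assumes it has the Howe-type form k_{2} = sqrt[(1 + 1/c) Z^{2}_{(p+1)/2} r / χ^{2}_{r,α}], which is the form that reproduces the quoted value of 13.1 from the inputs given in the text. The function names `howe_k2` and `tolerance_interval` are hypothetical, and the chi-square quantile is passed in as the value quoted above rather than computed.

```python
from math import sqrt
from statistics import NormalDist


def howe_k2(c, r, p, chi2_lower):
    """Tolerance factor under the ASSUMED Howe-type form of Equation (1).

    c          -- effective sample size for the mean (3 GMP runs here)
    r          -- degrees of freedom of the variance estimate
    p          -- proportion of the population to be covered
    chi2_lower -- lower-alpha chi-square quantile with r degrees of freedom
    """
    z = NormalDist().inv_cdf((1 + p) / 2)  # Z_{(p+1)/2}
    return sqrt((1 + 1 / c) * z ** 2 * r / chi2_lower)


def tolerance_interval(mean, sd, k):
    """Two-sided interval: mean -/+ k * sd."""
    return mean - k * sd, mean + k * sd


# Inputs quoted in the text for the three large-scale GMP runs.
k2 = howe_k2(c=3, r=2, p=0.99, chi2_lower=0.1026)
lower, upper = tolerance_interval(mean=89.5, sd=1.68, k=k2)
print(round(k2, 1), round(lower, 1), round(upper, 1))
```

Under these assumptions the script agrees with the factor of 13.1 and the lower limit of 67.4; the upper limit differs from the quoted 112 only in the number of digits retained.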
Table 3. Regression of three operating parameters (OPs) on a performance parameter (PP)

Table 3 reports a regression model for the PP based on a set of three OPs using the entire set of 76 data points. Note that
this data set includes data from both the 1X and 3X ranges of the OPs, as well as the large-scale data. The OP values for
the large-scale runs are taken as the setpoint values. The OPs have been coded to have value zero at setpoint conditions.
A statistical test of equal means provides no evidence that the mean of the large-scale GMP runs differs from the other scales.
Statistical tests of equal variance among the different groups of data likewise show no evidence of a difference in spread.
The root mean squared error (RMSE) in Table 3 is 1.64, which is relatively close to the standard deviation of the clinical
runs (1.68). This suggests that combining the GMP data with the rest of the data set is reasonable and will provide a better
estimate of the standard deviation. By combining the data, the value of r increases from 2 to 72 and the tolerance interval is shortened accordingly.
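To see why the increase in r matters, the sketch below compares the degrees-of-freedom term sqrt(r / χ^{2}_{r,α}) at r = 2 and r = 72. It again assumes the Howe-type form of Equation (1), in which this term multiplies the rest of the factor. The value r = 72 is consistent with 76 observations less four estimated terms (an intercept plus three OP coefficients), but that breakdown is an assumption because Table 3 is not reproduced here. The chi-square quantile for r = 72 is obtained from the Wilson-Hilferty approximation, a substitute chosen to keep the example dependency-free; it is not the method used in the source. The function names are hypothetical. The script does not compute the combined-data tolerance interval, since the value of c for the combined model is not stated in this section.

```python
from math import sqrt
from statistics import NormalDist


def chi2_lower_wilson_hilferty(r, alpha):
    """Approximate lower-alpha chi-square quantile (Wilson-Hilferty).

    Reasonable for moderate-to-large r; too rough for very small r,
    so the r = 2 case below uses the value quoted in the text instead.
    """
    z = NormalDist().inv_cdf(alpha)
    h = 2.0 / (9.0 * r)
    return r * (1.0 - h + z * sqrt(h)) ** 3


def df_factor(r, chi2_lower):
    """The sqrt(r / chi-square) term of the assumed Howe-type factor."""
    return sqrt(r / chi2_lower)


# GMP runs only: r = 2, chi-square quantile as quoted in the text.
factor_gmp_only = df_factor(2, 0.1026)

# Combined data: r = 72, quantile approximated rather than tabulated.
chi2_72 = chi2_lower_wilson_hilferty(72, 0.05)
factor_combined = df_factor(72, chi2_72)

print(round(factor_gmp_only, 2), round(factor_combined, 2))
```

With these inputs the term drops from roughly 4.4 to roughly 1.2, which is the main reason the combined-data interval is so much narrower than the one based on three runs.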
