Statistical Equivalence Testing for Assessing Bench-Scale Cleanability - The two-one-sided t-test compares the equivalency of two data sets. - BioPharm International
Table 1. Upper and lower confidence limits of the difference between two groups as determined using the two-one-sided t-test (TOST)
Two protein products were cleaned using the bench-scale method. A total of 18 data points (for cleaning time) were recorded
for each product. Commercially available statistical software (JMP) was used to perform the TOST analysis (12). The one-way analysis "Fit Y by X" function was used with a set alpha level (probability of type I error) of 0.1, which corresponds to
the 90% confidence interval discussed earlier. Figure 3 shows the distribution of cleaning times for the two products. The
box and whisker plot (in red) represents the range and distribution of the data points. The box contains the middle 50% of
the data and the line across the middle of the box represents the median of the data set. The difference between the quartiles
is the interquartile range. Each box has whiskers that extend from the edge of the box to the outermost data point that falls
within the boundary defined by upper quartile + 1.5 × (interquartile range) and lower quartile − 1.5 × (interquartile range).
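The whisker rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the JMP implementation: the function name `box_plot_summary` and the sample values are hypothetical, and quartile conventions differ between software packages, so the quartiles may not match JMP's output exactly.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    # Assumption: the "inclusive" quartile method; other packages may use a different one.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the outermost data points that lie inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Hypothetical cleaning times in minutes (not data from the study).
example_times = [21, 22, 23, 24, 25, 26, 27, 28, 29, 60]
print(box_plot_summary(example_times))
```

Any point beyond the fences (here, the 60-minute value) is reported separately rather than stretching the whisker.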
Table 1 shows the output of the TOST analysis performed using JMP. The difference between two group means represents the point
estimate of the true difference between the two means. This can be calculated by subtracting the sample mean for data set
A from the sample mean for B. The standard error (SE) of the difference between two group means can be calculated by applying
the following equation:
$$ SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} $$
in which $s_A$ is the standard deviation of group A, $n_A$ is the sample size of group A, and $s_B$ and $n_B$ represent the corresponding values for group B. This value provides an estimate of the variability of the difference between
the two data sets. The statistical software (JMP) adjusts the degrees of freedom for the variability of each data set
using the Satterthwaite approximation (11). The 90% confidence interval for the difference between two means is reflected by the upper confidence limit difference of
70.36 and the lower confidence limit difference of 62.91 of the two group means. Because the equivalence limit is ±4.48, and
both the upper and lower confidence limits of the difference between the two means fall outside the set equivalence limit, it is concluded
that product A and product B are not equivalent. Based on the average cleaning time and confidence interval, product B is
considered more difficult to clean than product A.
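The calculation described in this paragraph can be sketched from summary statistics alone. The code below is an illustrative, standard-library-only approximation of the approach (unequal-variance standard error, Satterthwaite degrees of freedom, 90% confidence interval compared against the equivalence limit); it is not JMP's implementation. All function names are hypothetical, the t quantile is obtained by simple numerical integration rather than a statistics library, and the means and standard deviations in the example are invented for illustration because the article does not report them.

```python
import math

def welch_se_and_df(sd_a, n_a, sd_b, n_b):
    """Standard error of the mean difference and Satterthwaite degrees of freedom."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    se = math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return se, df

def t_pdf(x, df):
    """Density of Student's t distribution."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Student's t CDF for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def tost_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b, limit, alpha=0.1):
    """Confidence interval for (mean B - mean A) and the equivalence decision."""
    diff = mean_b - mean_a
    se, df = welch_se_and_df(sd_a, n_a, sd_b, n_b)
    # alpha = 0.1 gives a two-sided 90% interval, i.e. the 0.95 quantile.
    t_crit = t_quantile(1 - alpha / 2, df)
    lower = diff - t_crit * se
    upper = diff + t_crit * se
    return {
        "difference": diff,
        "se": se,
        "df": df,
        "lower": lower,
        "upper": upper,
        "equivalent": -limit < lower and upper < limit,
    }

# Hypothetical summary statistics (minutes); only n = 18 and the limit come from the text.
result = tost_from_summary(mean_a=30.0, sd_a=6.0, n_a=18,
                           mean_b=32.0, sd_b=6.0, n_b=18, limit=4.48)
print(result)
```

The two groups are declared equivalent only when the entire interval lies inside the equivalence limits, which mirrors the decision rule applied to Table 1.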
In this case study, the products failed to meet cleanability equivalency mainly because of the large difference (66.64 min)
in the mean cleaning times, as shown by the blue bar in Figure 2. It is also possible to fail the equivalency test when the
two group means are similar but product B has a high degree of variability, resulting in a broad confidence interval such as the
one shown by the red bar in Figure 2. In such a scenario, the variability in product B should be further evaluated, and the
cleanability ranking (B<A or B>A) can be decided based on an appropriate risk assessment and business considerations.
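The two failure modes described above can be told apart from the confidence limits alone. The sketch below is one possible way to do so, under the assumption that the interval is symmetric about the point estimate (as a t-based interval is); the function name `diagnose_equivalence` and the second example interval are hypothetical.

```python
def diagnose_equivalence(lower, upper, limit):
    """Classify a confidence interval for a mean difference against +/- limit."""
    if -limit < lower and upper < limit:
        return "equivalent"
    # For a symmetric interval, the midpoint is the estimated mean difference.
    midpoint = (lower + upper) / 2
    if abs(midpoint) > limit:
        return "not equivalent: mean difference exceeds the limit"
    return "not equivalent: interval extends beyond the limit although the mean difference is within it"

# Case Study 1 interval as reported in the text.
print(diagnose_equivalence(62.91, 70.36, 4.48))
# Hypothetical interval: similar means but a broad confidence interval.
print(diagnose_equivalence(-3.0, 6.0, 4.48))
```

The second outcome is the one that calls for a closer look at the variability before ranking the products.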
Case Study 2: Products A and Y are Equivalent
Figure 4
The TOST analysis, as described in the previous case study, was repeated for two other products. Figure 4 shows the distribution
of cleaning times for these two products: A and Y.
Table 2. Upper and lower confidence limits of the difference between two groups as determined using the two-one-sided t-test
Table 2 shows the output of the TOST analysis using JMP. The 90% confidence interval for the difference between two means
is reflected by the upper confidence limit difference of 1.5547 and the lower confidence limit difference of 0.0564 of the
two group means. Because the equivalence limit is ±4.48, the upper and lower confidence limits of the difference between two
means fall within the equivalence limit. It is therefore concluded that product A and product Y are equivalent to each other
in terms of cleanability.