Statistical Equivalence Testing for Assessing Bench-Scale Cleanability - The two-one-sided t-test compares the equivalency of two data sets. - BioPharm International

The null hypothesis of the equivalence test states that the mean cleaning times of the two products differ by an amount θ or more:

H0: |μa − μb| ≥ θ

Figure 2

in which θ is the equivalence limit and μa and μb are the means of the two groups. To test for equivalence, the 90% confidence
interval for the difference between the two group means is constructed. The null hypothesis that the groups differ by at least θ
is rejected, and comparability is demonstrated, when both bounds of the 90% confidence interval of the mean difference fall
entirely within the ±θ bounds, as shown in Figure 2. If either limit of the interval falls outside the ±θ bounds, the null
hypothesis cannot be rejected and comparability is not demonstrated.
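The decision rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a pooled-variance two-sample t interval (the article does not state which variant was used), it takes the one-sided 95% t critical value from a table rather than computing it, and the data and the function names `tost_confidence_interval` and `is_equivalent` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def tost_confidence_interval(a, b, t_crit):
    """Return (lower, upper) confidence limits for mean(a) - mean(b).

    Assumes equal variances in both groups (pooled estimate). `t_crit` is the
    one-sided 95% Student's t critical value for len(a) + len(b) - 2 degrees
    of freedom, which yields a two-sided 90% interval.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    return diff - t_crit * std_err, diff + t_crit * std_err


def is_equivalent(ci, theta):
    """Declare equivalence only if the whole interval lies inside (-theta, +theta)."""
    lower, upper = ci
    return -theta < lower and upper < theta


# Hypothetical cleaning times (arbitrary units); not data from the article.
product_a = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
product_b = [11.0, 12.4, 10.5, 11.8, 12.9, 11.6]
T_CRIT_DF10 = 1.812  # tabulated t(0.95) for 10 degrees of freedom

ci = tost_confidence_interval(product_a, product_b, T_CRIT_DF10)
print(ci, is_equivalent(ci, theta=4.48))
```

Passing the critical value in explicitly keeps the sketch free of third-party dependencies; a statistics package would normally supply it.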

Note that the confidence interval width increases with smaller sample sizes and with more variability within each data group.
The specifics of the sample size calculation are outside the scope of this article. A larger sample size, however, naturally
results in a narrower confidence interval of the mean difference and hence makes declaring comparability easier. Likewise,
although the equivalence test does not explicitly compare the variability of the individual groups, larger variance results
in wider confidence intervals, making it more difficult to declare comparability.
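The dependence on sample size and variability can be read off the interval formula. As a sketch, assuming the pooled-variance form of the two-sample t interval (the article does not specify the variant), with n_a and n_b the group sizes, s_a and s_b the sample standard deviations, and s_p the pooled standard deviation (symbols introduced here for illustration), the 90% confidence interval is:

```latex
(\bar{x}_a - \bar{x}_b) \;\pm\; t_{0.95,\; n_a + n_b - 2}\; s_p \sqrt{\frac{1}{n_a} + \frac{1}{n_b}},
\qquad
s_p^2 = \frac{(n_a - 1)\, s_a^2 + (n_b - 1)\, s_b^2}{n_a + n_b - 2}
```

The half-width is proportional to s_p, so it grows with within-group variability. It shrinks as n_a and n_b increase, both through the square-root term and through the smaller t critical value.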

The equivalence limit was computed as two times the upper 95% confidence limit of the standard deviation estimate of the
controlled data set. For the cleaning experiments, the equivalence limit was equal to 2 × [1.6 × 1.4] = 4.48, in which 1.6 was
the standard deviation of the controlled data set (product A) and 1.4 was the multiplier for the upper 95% confidence limit of
a standard deviation estimate based on a sample size of 18 (11). Using the upper confidence limit of the standard deviation
estimate accounts for the uncertainty of such an estimate at a given sample size.
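The arithmetic can be reproduced with a short Python sketch. The article takes the 1.4 multiplier from its reference 11; the chi-square formula below is one standard way to obtain such a multiplier for normally distributed data and is an assumption here, as are the function names. With n = 18 and a one-sided 95% limit it gives about 1.40, consistent with the quoted value.

```python
from math import sqrt


def sd_upper_limit_multiplier(n, chi2_lower_quantile):
    """Multiplier k such that k * s is an upper confidence limit for sigma.

    Standard chi-square result for normal data:
    k = sqrt((n - 1) / chi2_lower_quantile), where chi2_lower_quantile is the
    lower-tail quantile of chi-square with n - 1 degrees of freedom.
    """
    return sqrt((n - 1) / chi2_lower_quantile)


def equivalence_limit(sd, multiplier, factor=2.0):
    """Equivalence limit = factor * (upper confidence limit of the SD)."""
    return factor * sd * multiplier


CHI2_05_DF17 = 8.672  # tabulated 5% lower-tail quantile, 17 degrees of freedom

k = sd_upper_limit_multiplier(18, CHI2_05_DF17)
# Round the multiplier to one decimal place, as quoted in the article.
theta = equivalence_limit(1.6, round(k, 1))
print(round(k, 2), round(theta, 2))
```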

Figure 3

Therefore, the acceptance criterion for equivalency was that both the upper and lower 90% confidence limits of the difference
between the two means should lie within ±4.48. The following two case studies show the application of this statistical approach
to comparing the cleanability of different protein drug products.