Regulatory authorities expect biopharmaceutical manufacturing facilities to demonstrate that they have an effective and consistent cleaning process in place. For a multiproduct facility, bench-scale characterization offers a useful and cost-effective way to support cleaning validation by comparing the cleanability of a new product with that of a product whose cleaning process has already been validated. Because experimental variability complicates such evaluations, relative cleanability assessments should rest on a sound statistical analysis. This article describes the application of the two one-sided t-tests (TOST) method to assess the comparability of two groups of cleanability data generated in a bench-scale study.
A STATISTICAL METHOD
When comparing two or more groups of data, the common approach is to determine whether the difference in group means (a group mean is the average of all data within the group) is large enough to be declared statistically significant. The null hypothesis is that the groups are not different. Declaring the difference statistically significant means that the null hypothesis is rejected: the groups are concluded to represent different distributions of values, so their true means are not equal. In practice, given a sufficiently large sample size, even differences that are too small to be meaningful may be declared statistically significant.
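To illustrate the last point, the following Python sketch computes the two-sample t statistic for a fixed, hypothetical mean difference of 0.1 (in units where the common standard deviation is 1.0) as the number of samples per group grows. The function name `two_sample_t` and all numbers are illustrative assumptions, not values from the study. For simplicity the large-sample critical value of about 1.96 stands in for the exact t critical value, which is slightly larger for small samples and would only strengthen the conclusion.

```python
import math
from statistics import NormalDist


def two_sample_t(diff: float, sd: float, n_per_group: int) -> float:
    """t statistic for two equal-size groups sharing a common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error


# Large-sample two-sided 5% critical value (about 1.96), used here for illustration only.
z_crit = NormalDist().inv_cdf(0.975)

# The same hypothetical 0.1 difference, evaluated at increasing sample sizes.
for n in (10, 100, 1000, 10000):
    t = two_sample_t(diff=0.1, sd=1.0, n_per_group=n)
    print(f"n={n:>5}  t={t:6.2f}  significant={abs(t) > z_crit}")
```

Under these assumptions the same 0.1 difference is not significant at 10 or 100 samples per group but is declared significant at 1,000 or more, even though its practical importance has not changed.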
The converse does not hold, however: when no statistically significant difference is observed, one cannot conclude that the groups are the same. The common t-test can only reject the null hypothesis, that is, show that the groups are different; failing to reject it may simply reflect too few samples or too much variability. This is inconvenient when the goal is to demonstrate comparability between two or more groups.
TOST is an approach widely used in clinical trial statistics (for example, in bioequivalence studies) and is gaining popularity in pharmaceutical and biotech settings. It declares comparability, or equivalence, by comparing the difference between two group means, together with the confidence interval of that difference, against predetermined equivalence limits. If the entire confidence interval of the mean difference lies within the predefined equivalence limits, then the true difference can be concluded, at the chosen confidence level, to lie within those limits as well, which makes it possible to claim equivalence between the two data sets. In its usual form each of the two one-sided tests is performed at significance level α, which corresponds to checking that the 100(1 - 2α)% confidence interval (90% for α = 0.05) falls inside the limits. The key goal of the cleanability assessment is to compare the cleanability of the two products with such an equivalence test.
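A minimal sketch of TOST in Python, using only the standard library, is shown below. It assumes two independent groups with approximately equal variances (a pooled-variance t-test; Welch's unequal-variance version is a common alternative), a symmetric equivalence limit of ±`margin`, and hypothetical cleanability values in arbitrary units. The Student t distribution is evaluated by numerical integration so the block has no dependencies; in practice a statistics library would normally be used. All function names and data are hypothetical and do not come from the source.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF via Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p: float, df: int) -> float:
    """Quantile for 0.5 <= p < 1 by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tost(test, reference, margin, alpha=0.05):
    """Pooled-variance TOST of H0: |mean(test) - mean(reference)| >= margin."""
    n1, n2 = len(test), len(reference)
    df = n1 + n2 - 2
    diff = mean(test) - mean(reference)
    pooled_var = ((n1 - 1) * stdev(test) ** 2 + (n2 - 1) * stdev(reference) ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # One-sided test 1, H0: diff <= -margin (rejected when the t statistic is large).
    p_lower = 1 - t_cdf((diff + margin) / se, df)
    # One-sided test 2, H0: diff >= +margin (rejected when the t statistic is small).
    p_upper = t_cdf((diff - margin) / se, df)
    t_crit = t_ppf(1 - alpha, df)
    ci = (diff - t_crit * se, diff + t_crit * se)  # 100(1 - 2*alpha)% interval
    p_tost = max(p_lower, p_upper)
    return diff, ci, p_tost, p_tost < alpha


# Hypothetical cleanability data (arbitrary units), six runs per product.
validated = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
new_product = [10.4, 10.0, 10.6, 10.2, 10.1, 10.5]

diff, ci, p, equivalent = tost(new_product, validated, margin=1.0)
print(f"difference = {diff:.3f}, 90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"TOST p = {p:.2g}, equivalent within +/-1.0: {equivalent}")
```

For these hypothetical data the 90% confidence interval of the difference is about (-0.09, 0.43), inside ±1.0, so equivalence would be declared; with a margin of 0.3 the same data would not support it. Requiring both one-sided p-values to fall below α gives the same decision as requiring the 90% interval to lie inside the limits.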
Experimental data generated during a cleaning characterization study using the bench-scale model showed some inherent variability arising from the nature of the cleaning process. Analyst and experimental error add further variability. To establish the equivalence limit appropriately, each component contributing to variability should be considered. If the equivalence limit is set too wide, the resolution of the method is reduced, because products whose cleanability genuinely differs may still be declared equivalent. If it is set too narrow, products that are truly equivalent may fail the test simply because of experimental noise. For the scale-down cleaning model, an evaluation of the different components of experimental variability showed that an equivalence limit of two times the upper 95% confidence limit of the standard deviation estimated from a controlled data set is adequate to differentiate between the cleanability of two products. Variability in a controlled data set is only one of many possible justifications for an equivalence limit. Often, when specifications or acceptance criteria are available, the maximum differences that still ensure those criteria can be met may be used as equivalence limits.
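The variability-based equivalence limit can be sketched as follows. Assuming normally distributed data, the upper confidence limit of a standard deviation comes from the chi-square distribution, σ_U = s·sqrt((n - 1)/χ²_{α, n-1}), where χ²_{α, n-1} is the lower α quantile with n - 1 degrees of freedom. The article does not state whether its upper 95% limit is one-sided (α = 0.05) or the upper end of a two-sided interval (α = 0.025), so `alpha` is a parameter here and the one-sided value is only an assumed default. The control data, the function names, and the series-based chi-square routine are hypothetical illustrations.

```python
import math
from statistics import stdev


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_cdf(x: float, df: int) -> float:
    """Chi-square CDF expressed through the regularized lower incomplete gamma."""
    return reg_lower_gamma(df / 2, x / 2)


def chi2_ppf(p: float, df: int) -> float:
    """Chi-square quantile by bisection on the CDF."""
    lo, hi = 0.0, df + 10 * math.sqrt(2 * df) + 10
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def upper_sd_limit(data, alpha=0.05):
    """Upper confidence limit of the standard deviation (one-sided if alpha=0.05)."""
    n = len(data)
    return stdev(data) * math.sqrt((n - 1) / chi2_ppf(alpha, n - 1))


def equivalence_limit(control_data, alpha=0.05, multiplier=2.0):
    """Two times the upper confidence limit of the SD, as described in the text."""
    return multiplier * upper_sd_limit(control_data, alpha)


# Hypothetical replicate results from a controlled data set (arbitrary units).
control = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
print(f"s = {stdev(control):.4f}")
print(f"upper 95% limit of SD = {upper_sd_limit(control):.4f}")
print(f"equivalence limit = +/-{equivalence_limit(control):.4f}")
```

With these hypothetical control data (s ≈ 0.258, n = 6), the one-sided 95% upper limit of the standard deviation is about 0.54, giving an equivalence limit of roughly ±1.08. Using the two-sided convention (α = 0.025) would widen the limit.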