Demonstrating Comparability of Stability Profiles Using Statistical Equivalence Testing

The authors present an approach for testing statistical equivalence of two stability profiles.
Mar 01, 2011
Volume 24, Issue 3


Statistical comparisons are helpful in objectively assessing comparability between a historical (prechange) and new (postchange) manufacturing process, site, formulation, or delivery device. When the objective of the comparison is to demonstrate that the stability profiles (i.e., slopes of a performance attribute over time) of two processes are highly similar, an equivalence approach is recommended. The authors present an approach for testing statistical equivalence of two stability profiles. The authors discuss concepts, selection of an equivalence acceptance criterion, sampling design considerations, and data analysis.

Regulatory bodies recognize and accept change as a normal part of manufacturing in a cGMP environment. Changes in scale, site of manufacture, manufacturing process, formulation, and delivery device are common as products progress through development, to commercialization, and finally, to commercial sustainability. These changes are often made to improve efficiency, strengthen process control, or meet product-supply demands and patient needs. Because change is recognized as a necessary aspect of a product's life cycle, regulatory guidance has been developed to ensure that changes, when implemented, have no adverse impact on the product's safety and efficacy.

The guidance in the International Conference on Harmonization Q5E guideline acknowledges that pre- and post-change conditions do not have to be identical, but there must be no negative impact on product safety and efficacy (1). Specifically, the guidance document states the following:
"The demonstration of comparability does not mean that the quality attributes of the pre-change and post-change product are identical, but they are highly similar and that the existing knowledge is sufficiently predictive to ensure that any difference in quality attributes have no adverse impact upon safety or efficacy of drug product."

It has been argued in recent years that statistical tests of equivalence provide the strongest evidence of process comparability. Two articles in the biopharmaceutical literature recommending equivalence have been published recently (2, 3). These papers and others have focused on demonstrating average equivalence of two process means. Briefly, the test of average equivalence compares the means of two processes (i.e., historical and new). After collecting the data, the difference between the historical and new means is estimated using upper and lower 95% one-sided confidence bounds (i.e., a 90% two-sided confidence interval). Equivalence of the two processes is demonstrated if the resulting confidence interval falls within the range predefined to demonstrate comparable performance of the two processes. The range, based on scientific understanding, is symmetrically centered around a parameter difference of zero and runs from –EAC to +EAC, where EAC is an acronym for Equivalence Acceptance Criterion. The EAC is defined as the largest acceptable difference between the historical and new process means. Figure 1 graphically represents three result scenarios of an equivalence test along with the EAC range and the upper and lower confidence bounds on the difference between the two parameters.

Figure 1. Result scenarios for an equivalence test. EAC is the equivalence acceptance criterion, and X is the estimated difference between the averages of the historical process and the new process. (All figures are courtesy of the authors)
For each of the three scenarios in Figure 1, an X represents the estimated difference between the averages of the historical and new processes. The horizontal line through the X represents the confidence interval around that difference. If the interval falls entirely outside the range from –EAC to +EAC, the test of equivalence fails; in fact, a condition of nonequivalence has been demonstrated (see Scenario A). When the interval straddles an EAC boundary, the test is considered inconclusive (i.e., statistical equivalence has neither been demonstrated nor ruled out), as shown in Scenario B (4). The test of equivalence passes when the confidence interval around the difference falls completely within the range from –EAC to +EAC. In this case, statistical equivalence has been demonstrated with a type I error rate of 5% (see Scenario C).
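The decision rule behind the three scenarios can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' procedure: the function name `equivalence_test` and the sample data are hypothetical, and a normal approximation stands in for the t-quantile that would typically be used with small samples.

```python
# Illustrative sketch (not the authors' method) of an average-equivalence
# test: compare the 90% two-sided CI on the difference in means with the
# range (-EAC, +EAC). A normal approximation is used for the critical
# value; small samples would normally use a t-quantile instead.
from statistics import NormalDist, mean, stdev
from math import sqrt

def equivalence_test(historical, new, eac):
    """Classify the result as 'pass', 'fail', or 'inconclusive',
    mirroring Scenarios C, A, and B in Figure 1."""
    n1, n2 = len(historical), len(new)
    diff = mean(historical) - mean(new)
    # Standard error of the difference in means
    se = sqrt(stdev(historical) ** 2 / n1 + stdev(new) ** 2 / n2)
    z = NormalDist().inv_cdf(0.95)          # two one-sided 5% tests
    lower, upper = diff - z * se, diff + z * se
    if -eac < lower and upper < eac:
        return "pass"          # Scenario C: equivalence demonstrated
    if upper < -eac or lower > eac:
        return "fail"          # Scenario A: nonequivalence demonstrated
    return "inconclusive"      # Scenario B: interval straddles an EAC bound

# Hypothetical assay values (e.g., % label claim) for the two processes
historical = [99.8, 100.1, 100.3, 99.9, 100.0, 100.2]
new = [99.9, 100.0, 100.2, 100.1, 99.8, 100.1]
print(equivalence_test(historical, new, eac=0.5))   # wide EAC
print(equivalence_test(historical, new, eac=0.15))  # tight EAC
```

With the same data, a wide EAC yields a pass while a tight EAC leaves the interval straddling a boundary, illustrating that the conclusion depends on the predefined acceptance criterion as much as on the data.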

In the case where the test result is inconclusive (i.e., Scenario B), additional data would further enhance process understanding. From a statistical perspective, the additional data would narrow the confidence interval with minimal impact on the estimated difference, X, between the two process parameters. Ultimately, increasing the sample size would produce a conclusive result (i.e., Scenario A or Scenario C).
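The narrowing of the interval with sample size can be made concrete. In this sketch (an assumption-laden illustration, not taken from the article), both groups are assumed to share a common standard deviation s, and the normal approximation is used, so the half-width of the 90% interval on the difference in means scales as 1/sqrt(n):

```python
# Illustrative sketch: the half-width of the 90% CI on a difference in
# means shrinks in proportion to 1/sqrt(n) as the per-group sample size
# n grows (equal standard deviation s assumed in both groups).
from statistics import NormalDist
from math import sqrt

def ci_half_width(s, n, confidence=0.90):
    """Half-width of the two-sided CI on the difference of two means,
    each based on n observations with standard deviation s
    (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s * sqrt(2.0 / n)

for n in (3, 6, 12, 24):
    print(n, round(ci_half_width(s=0.2, n=n), 4))
```

Quadrupling the per-group sample size halves the interval's width, which is why adding data eventually pushes an inconclusive Scenario B result into either Scenario A or Scenario C.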
