The Power of Hypothesis

June 1, 2008
BioPharm International, BioPharm International-06-01-2008, Volume 21, Issue 6
Page Number: 40–45

How to use hypothesis testing correctly, and how the one-sample t-test, the two-sample t-test, and the z-test differ.

This article is the second in a four-part series on essential statistical techniques for any scientist or engineer working in the biotechnology field. This installment presents statistical methods for comparing sample means, including how to establish the correct sample size for testing these differences. The differences among the one-sample t-test, the two-sample t-test, and the z-test are also explored.

Steven Walfish

HYPOTHESIS TESTING

In hypothesis testing, we must state the assumed value of the population parameter, called the null hypothesis. The goal of hypothesis testing is to determine whether the sample data are consistent with the population of interest. You either have sufficient evidence to reject the null hypothesis or you fail to reject it; you never prove it. The p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. Statisticians usually use a significance level of 0.05 as the cutoff for statistical significance. In other words, a p-value less than 0.05 is sufficient evidence to reject the null hypothesis. Typically, the null hypothesis is a statement about the value of the population parameter. For example, μ = 100 versus μ ≠ 100. A one-sided test means the alternative hypothesis points in one direction only: either less than or greater than the hypothesized value. A two-sided test means the alternative hypothesis covers both directions: less than or greater than.
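To make the one-sided and two-sided cases concrete, the hypotheses for the μ = 100 example can be written out as follows. The H₀ (null) and H₁ (alternative) labels are conventional notation assumed here, not symbols used elsewhere in this article.

```latex
\begin{align*}
\text{Two-sided:} \quad & H_0: \mu = 100 \quad \text{versus} \quad H_1: \mu \neq 100 \\
\text{One-sided (upper):} \quad & H_0: \mu \le 100 \quad \text{versus} \quad H_1: \mu > 100 \\
\text{One-sided (lower):} \quad & H_0: \mu \ge 100 \quad \text{versus} \quad H_1: \mu < 100
\end{align*}
```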

ONE-SAMPLE T-TEST

The one-sample t-test is used to compare a sample mean to a hypothesized population mean. The hypothesis can be either a one-sided or two-sided test. Usually, the population variance is unknown, requiring use of the t-distribution, which takes into account the uncertainty in estimating the variance from the sample. The t-distribution is tabled by confidence level and degrees of freedom. For the one-sample t-test, the degrees of freedom are the number of observations used to estimate the sample standard deviation minus one. The formula for the one-sample t-test is as follows:

$$t^* = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

in which X̄ is the sample mean, μ is the theoretical population mean, s is the sample standard deviation, and n is the sample size used to estimate the mean and standard deviation.

If the absolute value of t* is greater than the tabled value from the t-distribution, the sample mean is statistically different from the population mean (μ). An example of a one-sample t-test would be comparing protein concentration for a particular batch to a theoretical protein concentration. Table 1 shows an example of a two-sided one-sample t-test for protein concentration. The hypothesis is that the lot is not statistically different from 30 (μ = 30). The mean of the six vials was not statistically different from the theoretical value of 30 (p = 0.223). The t* of 1.355 did not exceed the tabled value for a 95% confidence level with five degrees of freedom of 2.571.
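The comparison described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, assuming the usual one-sample statistic t* = (sample mean − μ) / (s / √n). The six vial values are hypothetical and are not the Table 1 data; the critical value 2.571 is the tabled two-sided 95% value for five degrees of freedom quoted in the text.

```python
import math
import statistics


def one_sample_t(values, mu):
    """Return (t*, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    t_star = (mean - mu) / (s / math.sqrt(n))
    return t_star, n - 1


# Hypothetical protein concentrations for six vials (NOT the Table 1 data).
vials = [30.2, 29.8, 30.5, 30.9, 29.9, 30.6]
t_star, df = one_sample_t(vials, mu=30.0)

# Tabled two-sided 95% value for 5 degrees of freedom, as quoted in the text.
T_CRIT_95_DF5 = 2.571
different = abs(t_star) > T_CRIT_95_DF5

print(f"t* = {t_star:.3f}, df = {df}, different from 30: {different}")
```

For a two-sided test the absolute value of t* is compared with the tabled value, so a sample mean that is too low is flagged in the same way as one that is too high.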

Table 1. An example of a two-sided one-sample t-test for protein concentration. The hypothesis is that the lot is not statistically different from 30 (μ = 30).

TWO-SAMPLE T-TEST

The two-sample t-test is used to compare two different sample means. Similar to the one-sample t-test, the hypothesis can be either a one-sided or two-sided test. The t-distribution is required here because the population variances are unknown and must be estimated from the two samples being compared. For the two-sample t-test, the degrees of freedom are the number of observations for sample 1 (n1) plus the number of observations for sample 2 (n2) minus two. The formula for the two-sample t-test is shown in the following equation:

$$t^* = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

in which X̄1 and X̄2 are the two means being compared, s1 and s2 are the standard deviations of the two samples, and n1 and n2 are their respective sample sizes.

If the value of t* is greater than the tabled value from the t-distribution, the two sample means are statistically different. An example of a two-sample t-test would be comparing the existing lot to a previous lot of material. Table 2 shows an example of a one-sided two-sample t-test for purity. The hypothesis is that the new lot is not less pure than the old lot (New ≥ Old). The mean of the new lot is statistically less pure than the mean of the old lot (p = 0.003). The t* of 3.49 exceeds the tabled value for a 95% confidence level with 10 degrees of freedom of 1.812. Notice that the tabled value for a one-sided 95% confidence level is the same as the tabled value for a two-sided 90% confidence level.
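A minimal Python sketch of a one-sided two-sample comparison follows. It assumes the statistic is the difference in means divided by √(s1²/n1 + s2²/n2), with n1 + n2 − 2 degrees of freedom as described in the text; with equal sample sizes, as here, this gives the same value as the pooled-variance form. The purity values are hypothetical and are not the Table 2 data; 1.812 is the tabled one-sided 95% value for 10 degrees of freedom quoted in the text.

```python
import math
import statistics


def two_sample_t(sample1, sample2):
    """Return (t*, degrees of freedom) for a two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    std_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / std_error, n1 + n2 - 2


# Hypothetical purity results in percent (NOT the Table 2 data).
old_lot = [98.1, 97.9, 98.4, 98.0, 98.3, 98.2]
new_lot = [97.6, 97.9, 97.5, 97.8, 97.4, 97.7]

# Old minus new: a large positive t* is evidence that the new lot is less pure.
t_star, df = two_sample_t(old_lot, new_lot)

# Tabled one-sided 95% value for 10 degrees of freedom, as quoted in the text.
T_CRIT_ONE_SIDED_95_DF10 = 1.812
new_is_less_pure = t_star > T_CRIT_ONE_SIDED_95_DF10

print(f"t* = {t_star:.2f}, df = {df}, new lot less pure: {new_is_less_pure}")
```

Because the test is one-sided, the sign of t* matters: only a difference in the direction of the alternative (old lot purer than new lot) can lead to rejection.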

Table 2. An example of a one-sided two-sample t-test for purity. The hypothesis is that the new lot is not less pure than the old lot (New ≥ Old).

Z-TEST

The z-test is probably the most misused and misunderstood statistical test in biopharmaceutical organizations. The misconception is that if the sample size is large enough (n > 30), then the z-test is appropriate. The z-test is restricted to situations in which the population variance is known. Because it is practically impossible to know the population variance, and computers can calculate the t-distribution for any sample size, the t-test is preferred. If it is truly necessary to use the z-test, the formula is:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

in which X̄ is the sample mean, μ is the theoretical population mean, σ is the population standard deviation, and n is the sample size used to estimate the mean.

Notice that the formula for the z-test is similar to the formula for the one-sample t-test. The only difference between the two equations is the estimation of the standard deviation (square root of the variance). The t-test uses the sample standard deviation and the z-test uses the population standard deviation. The normal distribution values do not require a sample size because the standard deviation is known and, therefore, the tabled value is the same regardless of sample size. Table 3 shows a comparison of the z-values to the t-values for sample sizes of 30, 60, and 120. The t-distribution, though close to the normal distribution, does not become practically indistinguishable from it until the sample size exceeds 120.
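For the rare case in which σ really is known, a z-test can be sketched with Python's standard-library `statistics.NormalDist`. This is an illustrative sketch, assuming z = (sample mean − μ) / (σ / √n); the numbers passed in are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu, sigma, n):
    """Return (z, two-sided p-value) when the population sigma is known."""
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# The two-sided 95% critical value is the same for every sample size.
z_crit = NormalDist().inv_cdf(0.975)  # about 1.960

# Hypothetical inputs: six vials, with sigma assumed known to be 0.5.
z, p = z_test(sample_mean=30.3, mu=30.0, sigma=0.5, n=6)

print(f"z = {z:.3f}, p = {p:.3f}, critical value = {z_crit:.3f}")
```

Note that the critical value is computed without reference to n, which is the point made in the text: the sample size enters the z-test only through the standard error, not through the tabled value.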

Table 3. A comparison of the z-values to the t-values for sample sizes of 30, 60, and 120.

POWER OF A HYPOTHESIS

Most people use a p-value equal to or less than 0.05 as the criterion for rejecting the null hypothesis. Rejecting the null hypothesis when it is true is called a Type I error; its probability is α. The more critical error is the Type II error, in which you fail to reject the null hypothesis when it is actually false; its probability is β, and the power of the test is 1 − β. Because hypothesis testing in the biopharmaceutical industry sees its greatest use in comparing a previous lot to a new lot or comparing a sample to a known value, accepting the wrong answer can be detrimental. Luckily, there are ways to minimize the risks of making a wrong decision in hypothesis testing.

For example, increasing the sample size can minimize the risk. Alternatively, increasing the difference needed to be considered statistically different will reduce the risk. The normal distribution can be used to get a rough estimate of the correct sample size. Software such as Minitab and JMP use a noncentral t-distribution to calculate the sample size. The following equation gives the normal approximation to the sample size calculation:

$$n = \frac{(Z_\alpha + Z_\beta)^2 \, S^2}{\Delta^2}$$

in which n is the number of samples to be calculated, S is the sample standard deviation, Δ is the difference to detect, Zα is the Z-value for the α error (1.645 for a one-sided α of 0.05, or 1.960 for a two-sided α of 0.05), and Zβ is the Z-value for the β error (1.282 for a β of 0.10).

The α risk is the probability of rejecting a good lot; this is sometimes called the producer risk. The β risk is the probability of accepting a bad lot; this is sometimes called the consumer risk.
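The normal approximation to the sample size can be sketched as below. This assumes the one-sample form n = (Zα + Zβ)² S² / Δ², rounded up to a whole number of samples. The function name, its defaults, and the values of S and Δ are illustrative assumptions, and the result is only a rough estimate compared with the noncentral t-distribution used by statistical software.

```python
import math
from statistics import NormalDist


def sample_size(s, delta, alpha=0.05, beta=0.10, two_sided=True):
    """Normal-approximation sample size for detecting a difference delta."""
    std_normal = NormalDist()
    # A two-sided test splits alpha between the two tails.
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_beta = std_normal.inv_cdf(1 - beta)  # beta is always a single tail
    n = (z_alpha + z_beta) ** 2 * s**2 / delta**2
    return math.ceil(n)


# Hypothetical inputs: standard deviation of 2.0, difference to detect of 1.5.
n_two_sided = sample_size(s=2.0, delta=1.5)
n_one_sided = sample_size(s=2.0, delta=1.5, two_sided=False)

print(f"two-sided: n = {n_two_sided}, one-sided: n = {n_one_sided}")
```

Halving Δ roughly quadruples n, and lowering β (raising power) increases Zβ and therefore n, which is how the sample size trades off against the consumer risk.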

RISKS OF HYPOTHESIS TESTING

If the sample size gets too large or the variability is too small, a hypothesis test might declare a statistically significant difference even though the difference observed is not clinically relevant.

One proposed method that combines both the statistical rigor of hypothesis testing and the appropriateness of meaningful differences is to set a minimum difference that must be obtained to be considered different. The correct selection of the minimum difference value is still being debated in the statistical and scientific community.
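One simple reading of this idea is sketched below: a result counts as different only if it is both statistically significant and at least as large as a prespecified minimum difference. The function name, the decision rule, and the numbers are assumptions made for illustration; as noted above, how to choose the minimum difference is still debated.

```python
def meaningfully_different(observed_diff, t_star, t_crit, min_diff):
    """Require both statistical significance and a minimum difference."""
    statistically_different = abs(t_star) > t_crit
    large_enough = abs(observed_diff) >= min_diff
    return statistically_different and large_enough


# Hypothetical large study: tiny difference, but highly significant.
small_but_significant = meaningfully_different(
    observed_diff=0.05, t_star=4.2, t_crit=1.98, min_diff=0.5
)

# Hypothetical study in which the difference is significant and large enough.
large_and_significant = meaningfully_different(
    observed_diff=0.8, t_star=4.2, t_crit=1.98, min_diff=0.5
)

print(small_but_significant, large_and_significant)
```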

SUMMARY

Although a powerful statistical method, hypothesis testing can lead to false conclusions if applied incorrectly. Whenever possible, use the t-distribution over the z-test and normal distribution because the population standard deviation is never known for sure. Using the correct sample size and power analysis can lead to robust comparisons, especially if a minimum difference is prespecified.

Steven Walfish is the president of Statistical Outsourcing Services, Olney, MD, 301.325.3129, steven@statisticaloutsourcingservices.com