Statistical Essentials—Part 4: Regression and Design of Experiments - A well-designed experiment can make it easier to understand the sources of variation. - BioPharm International



BioPharm International
Volume 21, Issue 11

Regression analysis has several hazards: extrapolating beyond the range of the x values; letting influential observations or outliers produce a misleading model; and assuming that the regression of y on x is the same as the regression of x on y, which it is not.
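The last hazard can be checked numerically. The sketch below is only an illustration: the six data points are made up for this example and do not come from the article. It fits y on x and then x on y by ordinary least squares and compares the slope of the first fit with the slope implied by rearranging the second.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Least-squares fit of y on x; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 2.6, 4.8, 4.1, 6.3]

b_yx, a_yx = ols_slope_intercept(x, y)  # model: y = a_yx + b_yx * x
b_xy, a_xy = ols_slope_intercept(y, x)  # model: x = a_xy + b_xy * y

# Solving the x-on-y line for y gives a slope of 1 / b_xy.
implied_slope = 1.0 / b_xy

# The product of the two slopes equals the squared correlation.
r_squared = b_yx * b_xy

print(f"y on x slope:           {b_yx:.4f}")
print(f"x on y, solved for y:   {implied_slope:.4f}")
print(f"product of slopes (r^2): {r_squared:.4f}")
```

The two slopes agree only when the points fall exactly on a line; otherwise the choice of which variable is treated as the response changes the fitted model.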

A very popular statistic used to assess the "goodness" of the model is R². R² can be defined as the proportion of the variation in y that is explained by the model, often expressed as a percentage. Unfortunately, R² is very sensitive to the distribution and spread of the data, so a high R² does not necessarily mean a good fit.

Anscombe created a data set that highlighted the importance of graphing data first, before applying any statistical test to the data set.1 In Figure 1, you can see how four different data sets, each with the same mean and standard deviation, can have very different data distributions and yet produce nearly identical regression coefficients, which is misleading. Only the graph of x1 versus y1 is appropriately described by the regression line. The other graphs show a curvilinear relationship (x2 versus y2), an outlier (x3 versus y3), or a single influential point (x4 versus y4), and fitting a straight line to them leads to inappropriate conclusions.
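The point can be reproduced with a few lines of code. The sketch below uses the published Anscombe quartet values and a small hand-written least-squares helper (the function name `fit` is chosen for this example). It prints the summary statistics that are nearly the same across all four data sets; only a plot reveals how different they are.

```python
from statistics import mean, stdev


def fit(x, y):
    """Least-squares line of y on x; returns (slope, intercept, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy ** 2 / (sxx * syy)


# Anscombe's quartet: the first three data sets share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "x1 vs y1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                        7.24, 4.26, 10.84, 4.82, 5.68]),
    "x2 vs y2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                        6.13, 3.10, 9.13, 7.26, 4.74]),
    "x3 vs y3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                        6.08, 5.39, 8.15, 6.42, 5.73]),
    "x4 vs y4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                 [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                  5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    slope, intercept, r2 = fit(x, y)
    summary[name] = (slope, intercept, r2)
    print(f"{name}: mean(y)={mean(y):.2f} sd(y)={stdev(y):.2f} "
          f"slope={slope:.3f} intercept={intercept:.2f} R^2={r2:.3f}")
```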


Figure 1
We always treat the validity of the model assumptions as doubtful and analyze the residuals to examine the adequacy of the model, because violations of the assumptions cannot be detected from summary statistics alone. A residual is the difference between the observed value and the value predicted by the model. Standardized residuals have approximately zero mean and unit variance (like a standard normal variable), so a standardized residual outside ±3 can be considered an outlier.
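As a rough illustration of the ±3 rule, the sketch below divides each residual by the residual standard deviation, s = sqrt(SSE / (n - 2)). This is one simple definition of a standardized residual and is an assumption of this example; statistical packages often also adjust for the leverage of each point. The data are invented, with one deliberately shifted observation.

```python
from math import sqrt
from statistics import mean


def standardized_residuals(x, y):
    """Residuals of a least-squares line, each divided by sqrt(SSE / (n - 2))."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e * e for e in residuals) / (n - 2))
    return [e / s for e in residuals]


# Hypothetical data: a straight line with small alternating noise.
x = [float(i) for i in range(1, 21)]
y = [1.0 + 2.0 * xi + (0.3 if i % 2 == 0 else -0.3) for i, xi in enumerate(x)]
y[9] += 8.0  # one deliberately shifted observation

standardized = standardized_residuals(x, y)

# Flag observations whose standardized residual falls outside +/-3.
flagged = [i for i, z in enumerate(standardized) if abs(z) > 3.0]
print("flagged observation indices:", flagged)
```

Note that with very few observations a single residual cannot exceed 3 under this definition, so the rule is more useful for moderately sized data sets.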


Figure 2
Figures 2, 3, and 4 show an example of how to apply regression analysis to assay data. A potency assay is to be validated for the range 50–150% potency. The protocol tested five different theoretical potency samples, three times on each of three days (nine total observations per potency level, for a total of 45 data points). To compare the results, we have done the analysis three different ways: with three data points per day per potency level (Figure 2); with one data point per day per potency level (Figure 3); and with one data point per potency level (Figure 4).


Figure 3
The same data summarized by day or by concentration can give different R² values while leaving the parameter estimates unchanged, because averaging removes the variation within each group before the model is fitted. Always use the raw data rather than summary data, unless the precision of the data (i.e., poor assay repeatability) warrants the use of the means.
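The effect of summarizing can be sketched with a small balanced data set laid out like the protocol above: five potency levels, three days, three replicates per day. All numbers below (the per-level bias, day shifts, and replicate noise) are hypothetical and chosen only to make the pattern visible; they are not the article's assay data.

```python
from statistics import mean


def fit(x, y):
    """Least-squares line of y on x; returns (slope, intercept, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy ** 2 / (sxx * syy)


# Hypothetical effects, invented for illustration only.
level_bias = {50: 1.0, 75: -1.5, 100: 0.5, 125: 1.0, 150: -1.0}
day_shift = [-2.0, 0.5, 1.5]
rep_noise = [-1.0, 0.2, 0.8]

# 5 levels x 3 days x 3 replicates = 45 rows of (level, day, measured).
raw = [(lvl, day, lvl + level_bias[lvl] + day_shift[day] + noise)
       for lvl in level_bias
       for day in range(3)
       for noise in rep_noise]


def averaged(rows, key):
    """Average the measured values within each group defined by key(level, day)."""
    groups = {}
    for lvl, day, value in rows:
        groups.setdefault(key(lvl, day), []).append(value)
    return [(k[0], mean(v)) for k, v in groups.items()]


day_means = averaged(raw, lambda lvl, day: (lvl, day))  # 15 points
level_means = averaged(raw, lambda lvl, day: (lvl,))    # 5 points

fit_raw = fit([r[0] for r in raw], [r[2] for r in raw])
fit_day = fit([p[0] for p in day_means], [p[1] for p in day_means])
fit_level = fit([p[0] for p in level_means], [p[1] for p in level_means])

for label, (slope, intercept, r2) in [("raw (45)", fit_raw),
                                      ("day means (15)", fit_day),
                                      ("level means (5)", fit_level)]:
    print(f"{label:16s} slope={slope:.4f} intercept={intercept:.4f} R^2={r2:.5f}")
```

Because the design is balanced, the slope and intercept are the same in all three fits, while R² rises as more of the within-group variation is averaged away. With unbalanced data the parameter estimates would generally differ as well.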

