QUANTITATIVE DATA
Quantitative, or continuous, data are preferred because they estimate and predict the true population values efficiently.
Qualitative data can also be used to estimate and predict the true population values, but they typically require larger sample
sizes to accomplish the task. Several summary statistics are used with quantitative data. The most common is
the mean, or average, of the data. Another estimate of central tendency is the median, or 50th percentile of the data. Although
the mean is the most widely used, it is not appropriate for highly skewed distributions and is less efficient than
other measures of central tendency when extreme scores are possible. The median is useful because its meaning is clear and
it is more efficient than the mean in highly skewed distributions. The geometric mean is another good estimate of central
tendency when all the values are positive and the distribution has a positive skew. The geometric mean is computed by taking the
average of the logarithms of all the values and raising the base of the logarithm to the resulting average. If the distribution
is skewed positively, the mean will typically be larger than the median; if it is skewed negatively, the mean will typically be
smaller than the median. When a distribution is symmetrical, the mean and the median are equal.
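The relationship between these three measures can be checked numerically. The following is a minimal Python sketch, not a prescribed procedure; the sample values are hypothetical and were chosen only to give a positive skew.

```python
import math
import statistics

# Hypothetical, positively skewed sample (illustrative values only).
values = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 40.0]

mean = statistics.mean(values)
median = statistics.median(values)

# Geometric mean as described in the text: average the logarithms,
# then raise the base of the logarithm (here e) to that average.
log_average = sum(math.log(v) for v in values) / len(values)
geometric_mean = math.exp(log_average)

print(f"mean = {mean:.2f}")
print(f"median = {median:.2f}")
print(f"geometric mean = {geometric_mean:.2f}")
```

With a single large value in the sample, the mean is pulled well above the median, while the geometric mean stays much closer to the median.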
The standard deviation, which is the square root of the variance, is by far the most widely used measure of spread. The variance is
the average squared deviation from the mean of the data. A key point to remember is that variances can be averaged but
standard deviations cannot. To combine standard deviations from groups of equal size, average the variances and then take the
square root of the result.
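A short Python sketch illustrates why the averaging is done on the variances. It assumes two hypothetical lots of equal sample size and uses the sample variance (n - 1 denominator); with unequal sample sizes the variances would need to be weighted, which is not shown here.

```python
import math
import statistics

# Hypothetical results from two lots with equal sample sizes.
lot_a = [9.8, 10.1, 10.0, 10.3, 9.9]
lot_b = [9.0, 11.2, 10.4, 9.5, 10.9]

var_a = statistics.variance(lot_a)  # sample variance, n - 1 denominator
var_b = statistics.variance(lot_b)

# Appropriate: average the variances, then take the square root.
pooled_sd = math.sqrt((var_a + var_b) / 2)

# Not appropriate: averaging the standard deviations directly.
naive_sd = (math.sqrt(var_a) + math.sqrt(var_b)) / 2

print(f"pooled SD = {pooled_sd:.4f}")
print(f"naive average of SDs = {naive_sd:.4f}")
```

Whenever the two standard deviations differ, the naive average is smaller than the pooled value, so it understates the combined spread.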
The range is another estimate of the dispersion of the data, but it takes into account only two scores, the maximum and minimum
values. A very handy method for comparing variability is the coefficient of variation (CV), sometimes called the relative standard
deviation (RSD). The coefficient of variation is the standard deviation divided by the mean, usually expressed as a percentage.
It measures variability in relation to the mean and is used to compare the relative dispersion in one type of data with the
relative dispersion in another type of data. The data to be compared may be in the same units, in different units, with the
same mean, or with different means.
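As an illustration, the sketch below computes the CV for two hypothetical data sets in different units. The function name `coefficient_of_variation` and the data are assumptions made for this example, and the sample standard deviation is used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV as a percentage: 100 * standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * statistics.stdev(values) / mean


# Hypothetical measurements in different units with different means.
tablet_weight_mg = [248.0, 251.0, 250.0, 252.0, 249.0]
assay_percent = [98.0, 101.0, 100.0, 102.0, 99.0]

cv_weight = coefficient_of_variation(tablet_weight_mg)
cv_assay = coefficient_of_variation(assay_percent)

print(f"CV of tablet weight = {cv_weight:.2f}%")
print(f"CV of assay = {cv_assay:.2f}%")
```

Both data sets have the same standard deviation, but because their means differ, the assay results show the larger relative dispersion.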
There are several methods to display quantitative data graphically. The most common include the line plot, the box-and-whisker
plot, and the histogram.
A PICTURE IS WORTH A THOUSAND WORDS
 Figure 3. An example of a line plot. The horizontal line is the mean of the 30 lots.
Graphing data makes it easier to see patterns in the data and to confirm assumptions about the distribution of the results.
A line plot is a two-dimensional plot of data, usually over time, used to detect trends in the data. Line plots are used
in conjunction with other statistical techniques such as control charts for process control. A control chart is a line plot
with statistical limits set at ±3 standard deviations from the mean. If the data follow a normal distribution, about 99.7% of
the values should fall within these limits. Figure 3 is an example of a line plot. The horizontal line is the mean of the 30 lots.
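The ±3 standard deviation limits can be sketched in a few lines of Python. This is a simplified illustration: the helper names `control_limits` and `out_of_limits` and the lot values are hypothetical, and the limits are computed from the overall sample standard deviation of a baseline set of lots. Formal control charts often estimate the standard deviation in other ways.

```python
import statistics


def control_limits(values, k=3.0):
    """Return (lower limit, center line, upper limit) at mean +/- k standard deviations."""
    center = statistics.mean(values)
    sd = statistics.stdev(values)
    return center - k * sd, center, center + k * sd


def out_of_limits(values, lower, upper):
    """Return (index, value) pairs for points that fall outside the limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical baseline lots used to set the limits.
baseline = [100.2, 99.8, 100.1, 99.9, 100.0, 100.3, 99.7, 100.1, 99.9, 100.0]
lower, center, upper = control_limits(baseline)

# Hypothetical new lots checked against those limits.
new_lots = [100.1, 99.6, 100.9, 100.0]
flagged = out_of_limits(new_lots, lower, upper)

print(f"limits: {lower:.3f} to {upper:.3f}, center line {center:.3f}")
print("flagged lots:", flagged)
```

Setting the limits from a baseline period and then checking new lots against them keeps a single unusual lot from widening its own limits.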
 Figure 4. An example of a box plot
A box-and-whisker plot, or simply a box plot, is a graphical representation of the dispersion of the data. Figure 4 shows the lower
quartile (Q1), upper quartile (Q3), and median. The box includes the range of scores falling into the middle 50% of the distribution.
The whiskers, i.e., the vertical lines extending from the box, usually extend to the most extreme values that lie within 1.5 times
the interquartile range (Q3–Q1) of the box edges. Points that are outside of the whiskers are usually candidates for outlier
analysis. The box plot also can be used to compare different lots or batches. A t-test would be used to statistically compare
two different lots. If you have more than two lots to compare, a one-way analysis of variance (ANOVA) would be used.
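The 1.5 × interquartile range rule for the whiskers can be sketched as follows. The function names and data are hypothetical, and the quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`; other software may use a different quartile convention and give slightly different limits.

```python
import statistics


def whisker_limits(values, k=1.5):
    """Return (lower, upper) limits at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_candidates(values, k=1.5):
    """Return the values that fall outside the whisker limits."""
    lower, upper = whisker_limits(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical results; the last value is deliberately extreme.
results = [98, 99, 100, 100, 101, 101, 102, 103, 115]

print("whisker limits:", whisker_limits(results))
print("outlier candidates:", outlier_candidates(results))
```

A value flagged this way is only a candidate for outlier analysis; the rule identifies points worth investigating and does not by itself justify excluding them.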