OR WAIT 15 SECS
CUSUM charts are easy to create in Excel and can reveal when a change occurred.
Cumulative sum (CUSUM) charts are a valuable tool in problem solving because they can reveal when a change occurred. This information then can be used to confirm or eliminate potential causes by showing that the problem appeared at the same or a different time. This article focuses on retrospective analysis for problem solving and not the use of CUSUM for controlling a process. The methods for converting a trend chart into a CUSUM chart in Excel are explained in depth. Because interpreting the charts is not entirely intuitive, a guide is presented, which shows how the chart should be read. One case study is used to provide a running example through the article.
A manufacturer of excipients had process problems which arose during the course of a year. The major problem was that assay measurements of one characteristic of the product, the concentration of a contaminant, had increased over several months. As always with problem solving, many theories were cited about the likely cause. A favored one was that the assay standard had changed and the production measurements were consequently distorted but the material itself was still in specification.
A trend chart (Figure 1) shows how the measurements deteriorated and eventually exceeded the limit of 10 ppm but does not show clearly when the change occurred.
Figure 1. Contaminant concentration in production batches. The trend chart shows how the detected quantities increased and eventually exceeded the limit of 10 ppm.
In this situation, it is common to plot a moving average, especially because it is easy to do so using the trend line option in Excel and selecting "Moving Average." The second chart (Figure 2) shows the moving average, but even after examining it, one cannot be certain when the increase occurred. Interpreting this chart is difficult because the moving average is slow to respond to a change and always lags behind the raw data. For example, this chart shows that the line for the moving average was displaced to the right from the data points when the average rose during September.
Figure 2. Contaminant concentration in production batches with moving average. It is still not certain when the increase occurred.
Another chart for the assay standard (Figure 3) shows a similar pattern where an increase in the second half of the year could be taken as evidence that the production measurements were mirroring the assay standard.
Figure 3. The chart for the assay standard shows an increase in the second half of the year.
A simple check to see if the production and reference results are related, is to calculate and plot the ratio of production value to reference value for each batch. It is easy to do this in Excel: if the ratio is constant, then the cause of the problem is probably in the assay reference, whereas if the ratio changes in step with the production results, then the cause is somewhere in the production process. The chart for ratio (Figure 4) is unfortunately inconclusive and shows some increase at about the right time but not enough to account for the change in production results.
Figure 4. Ratio of production measurement to assay reference
Cumulative Sum (CUSUM) charts have a range of applications, but they are most useful for identifying when an event—typically a change in average—took place. The principle behind a CUSUM chart is that it displays a running total of the differences from an average or target value. One common use of the CUSUM approach is in golf where a scorecard records if a player is below par or ahead of par for each hole, and the cumulative difference during the course of a game.
Interpreting the chart depends on an examination of the slopes of the lines. The vertical position and the vertical scale are irrelevant here. If a process is running on target, the CUSUM plot is a horizontal line; if the process average is high, then the line is rising; if the process average is low, then the line has a downward slope. It is easy for us to see a change in slope of a line and thus recognize when an event occurred. The other advantage of a CUSUM chart is that it smoothes a response so that trends are easier to see.
Software programs are available for CUSUM plotting but it is not difficult to perform the calculations in Excel. Table 1 shows an extract from an Excel spreadsheet that was set up to calculate the CUSUM values for these data.
Table 1. Extract from an Excel spreadsheet that was set up to calculate the CUSUM values.
Table 1 shows that the CUSUM score is decreasing steadily. This happens because all the data values are below the average. Similarly, a golfer's cumulative par score would get lower and lower if he were playing below par at each hole. Figure 5 shows the raw data and the CUSUM scores for concentration, in which the downward trend in the CUSUM line at the start of the year is quite obvious. Starting in August, the raw data become higher and are mostly above the average; on the CUSUM plot this is shown as a sharp change in the slope of the line, which is now rising.
Figure 5. Combined raw data and CUSUM for batch leakage
As mentioned above, a line which slopes downwards indicates that the results are below the average or the target value that was selected. The steeper the line, the bigger the difference. A line that slopes upwards indicates that the selected values are above average.
Changes in Slope
The human eye is a wonderful tool for seeing patterns and it is easy to imagine that every slight variation in a line is significant. In this example, the extreme change in slope at the end of August is an important event but the others are debatable. For example, at the end of July, the slope of the line is upwards, which indicates that the raw data values are above the overall average, but not as much above average here as in September and October. This change may be significant or it could be a random variation, it is difficult to tell from the data alone. If the date of change coincides with a known event, such as a change of material or a process adjustment, then the data would support an assertion that something had affected the output of the process. Below we describe some statistical significance tests that give an objective method for assessing a change in the average.
Gaps in the Data
Gaps in the data, such as the times during March and April when there was no production, put a horizontal step into the CUSUM plot. It is usually possible to ignore such gaps by eye and make a visual estimate of the slope of the line to either side. If there are many gaps and the plot becomes confusing, it is simple in Excel to change the horizontal axis so that it shows batch number or successive measurements in sequence. A related problem occurs when a batch exists but there is no measurement for it. In this situation, Excel interprets a blank cell as zero and the CUSUM score can be sent very high or low, creating a vertical step. The solution is to either manually adjust the formulae so that they ignore the empty cell or to put in an IF function to do it automatically.
Changes in Vertical Height
The actual vertical position of the CUSUM plot is not relevant and can be changed very easily by adjusting the value of the first point in the sequence. In this example, the first point in the CUSUM is equal to the first measured value but it could be set to zero, or the average, or some arbitrary value, which makes plotting convenient.
Individual extreme values put a step into the line but the slope remains the same to either side. If there are only a few of these values, they can be ignored when looking at the charts. Alternately, the Excel formulae can be changed to exclude those values.
Average Value or Target Value
For problem solving, it is simple and convenient to use the overall average as the fixed, offset value in the CUSUM calculations. If a process target value is used instead, then the chart gives a clear indication if the process is running high or low. If it continues to run high or low, then the line will eventually fall outside the scale on the chart and a reset will need to be put in.
Find the Change Point
A graphical approach for finding the time when the average shifted is simple to use but does require a little judgment. Figure 6 shows the CUSUM for concentration of contaminant and two straight lines have been drawn through the points near the change point. The judgment is in deciding how many of the data points to use for the straight lines. Sometimes, it is helpful to mask the points nearest to the change time so that the lines are not distracted by a few points. Where the lines cross over gives the time when the average changed and it is usually much clearer than an estimate from a simple trend chart. In this example, the average went up starting on August 29.
Figure 6. An example of how a CUSUM plot can be used to estimate the change point.
The CUSUM chart for production data shows a definite change on August 28 or 29 and possible changes at a few other dates. The problem-solving process uses this information to confirm or eliminate potential causes. Figure 7 shows the CUSUM plots for production data and for the assay reference standard, which may be a cause.
Figure 7. A CUSUM plot for production and assay reference data. The lack of correspondence between the plots gives reason to eliminate the reference assay as the cause of the problem.
The two plots have a similar shape but the key change date for the production measurements does not have a complementary one for the assay reference. An obvious change point for the reference standard on August 28 (or on a slightly earlier date) would have confirmed that it was worthwhile to investigate this potential cause in more depth. The reference does rise (the slope is upwards) starting on August 19 but similar short periods of high values in late June and late July and very high values in October do not have a consistent effect on the production measurements. The differences between the two plots give reason to eliminate the reference assay as the cause of the problem.
The best interpretation of the shape of the CUSUM plot for the reference assay is that it was on target up to May, was low during June, and then it was slightly high. The other twists and turns in the line are probably just random variation and serve to illustrate that it is possible to read a lot into a chart when nothing has really happened. If you think you might be falling into this trap, then plot the raw data behind the CUSUM (as in Figure 5) because it is harder to imagine a pattern or shape in the conventional trend chart.
The tests described here can be used to support the interpretation of a CUSUM chart, though it should be remembered that the key purpose of the chart is to identify retrospectively when a change occurred. Judgment, knowledge of the process, and other events around the process are key to interpretation. The results of statistical tests in these situations are not a core requirement as in a clinical trial; they are an aid to problem solving. It is often enough to trust one's eyes, especially if a CUSUM plot looks like two straight lines as has happened with these data.
T test to Compare Two Periods of Data
The t test is used to check for a difference in averages between two sets of data. Figure 8 shows how this is done in Excel using the TTEST function:
Figure 8. A t test for difference in averages
The t test estimates the random, background variation from the spread of results in each array. It then compares the difference between the average of the first and second array with this variation and calculates the likelihood that it is just random. The answer given by the test is the probability that a difference this big would have occurred by chance. Values greater than 0.1 are generally considered statistically insignificant. If you get an answer of 0.05 and you believe that the difference is real, then you carry a one in 20 risk that you are mistaken, an equivalent statement is that you are 95% confident that the difference is real. Small values of 0.01 or less suggest that the difference is real or that you have been unlucky.
The increase in average seen by comparing data up to August 28 with data from August 29 onwards is almost certainly real because the chance that it would have occurred through random variation is very low (the result is 7.76 x 10–18).
There is a major risk in applying the t test to a historical analysis, such as the one here, in which we have chosen a change point that already looks important, and are then attempting to show that it is statistically significant. The answer in such a situation is often one which confirms our subjective judgment. If we select a period when the average is high and compare it with a period when the average is low then a statistical test will almost always say that the difference is real, particularly if the number of samples in each average is high. It would be a serious mistake to perform t tests on many groupings of data until one is found to be significant; a significance level of one in 20 will be found in about one in 20 random data sets. As stated above, the visual inspection of the CUSUM plot is primary and the statistical test is secondary. A negative result (probability or p-value of more than about 0.1) does, however, give a strong indication that the observed, potential difference is just a random variation.
An alternative approach is to continually apply a test to each new data point and check if it is significantly different from its predecessors. This is the principle behind statistical process control (SPC). The advantage of SPC over the t test is that the data indicate when there has been a significant change and the issue of false positive results does not arise.
The chart for individual values in Figure 9 has control limits that have been calculated using standard methods from SPC that are based on the overall average and the moving range or difference between each sequential pair of values. The point for August 29 and many of those that follow are above the upper control limit, confirming that the average has risen.
Figure 9. Statistical process control limits for individual values
One risk with these charts is that the other points, which are outside control limits, can confuse the interpretation of the plot by suggesting that there are other key events which should be considered. This becomes a serious concern if the process is not stable and under statistical control, as is the case here.
A weakness of this approach using a chart for individuals is that it lacks the statistical power to identify a small shift in the average. Visual examination of a CUSUM plot may have identified a change point but the chart for individuals with its control limits might only provide the same indication somewhat later, limiting its value as a statistical test. Other rules from SPC, such as a run of points, can be applied if the change is not so pronounced, but the only thing such rules will confirm is that the average has changed, not when it changed. A chart of moving average, like the one in Figure 2, is a more sensitive detector of a small change if appropriate control limits are added but it also has the limitation of not identifying when the change occurred.
It is possible to apply control rules directly to a CUSUM chart but the methods are not as simple or intuitive as with conventional SPC. For the occasional user who is trying to solve a problem, they present a steep learning curve; for a statistician or experienced practitioner in SPC, they are worth pursuing. This article does not cover those control rules.
CUSUM charts work well with counted data, such as the number of particles in a sample, or the number of rejects per day. If the batch size or lot size varies, then the step size on the horizontal scale should mimic this so that a percentage reject rate from a large batch has a greater visual weight than the same percentage from a small batch. There are different ways to do this in Excel, but an X–Y plot on which the X-axis has a running total from all the batches is straightforward to implement.
CUSUM Chart in a Report
Interpretation of the charts is not intuitive. People often look at the vertical scale and try to find the significance in positive or negative values. Only the slope is important, however, so this should be emphasized if a CUSUM chart is included in a report. It is best to superimpose a plot for raw data with a CUSUM, as in Figure 5, along with an explanation if the intended readers of the report are not familiar with the technique.
Statistical Process Control
The equivalents of control limits can be set up for CUSUM charts and are valuable if a process average can drift, needs tight control, or cannot be evaluated using large subgroup sizes. This can occur for process variables, such as concentration or batch weight, where repeated measurements are not possible or will give the same result. However, it is not easy to run the charts and interpreting them is not intuitive for people who are only accustomed to conventional trend charts.
The case study example shows how CUSUM was used to eliminate one potential cause of a problem and thereby save time and a lot of unnecessary work.
CUSUM techniques are an effective method for identifying a change point and are especially valuable in problem solving when a trend chart shows that an average has shifted. They are simple to implement in Excel and will often give a precise answer. One risk with CUSUM charts is that a casual reader may not understand that the slope of the lines is important and consequently make errors in interpretation. A second risk is that small variations might be thought to be important because visually they appear to be significant. However, these problems are minor in comparison to the benefits that the cumulative sum method brings when it is applied in the appropriate situations.
George R. Bandurek, PhD, is the director of GRB Solutions, Ltd., West Sussex, UK, +44 1903 215175, firstname.lastname@example.org