Misconceptions of Maintenance and Reliability: A Biopharmaceutical Industry Survival Guide

June 1, 2013
Paul Boles

Paul Boles is senior technical manager GMP manufacturing at Genentech.

Rob Christman

Rob Christman is associate director global reliability engineering at Genzyme.

Gerald Clarke

Gerald Clarke is reliability engineer at Pfizer.

James Baillargeon

James Baillargeon is instruments and control manager at MedImmune.

Steve Jones

Steve Jones is director at BioPhorum Operations Group.

BioPharm International, Volume 26, Issue 6

The authors provide common misconceptions and key concepts behind reliability engineering.

Maintenance excellence and reliability have become prominent on the corporate agenda. This is particularly true of the biopharmaceutical industry, where such concepts are becoming more widely adopted in attempts to reduce risk and costs. Many techniques, however, are still in their infancy, and while leaders are pressing for wider adoption, organizations are often slow to take them up because many of the new concepts are counter-cultural.

This article focuses on common misconceptions that can be encountered by those spearheading the change and introduces some of the key concepts behind reliability engineering. It also summarizes a detailed white paper created by the authors (1).


Failure is an unfortunate fact of life. Systems have a natural tendency to break and wear out, and the components of any asset are subject to the effects of wear and tear. Eventually, components fail. It is a common misconception that simply because preventive maintenance is employed, the risk of failure can be eliminated. While preventive maintenance can reduce the risk of failure, so long as the failure mode exists, the risk of failure remains.

Reliability is about managing the probability of failure over time. Contrary to popular belief, only a small percentage of equipment ages or wears out at the end of its expected life (see Sidebar). In practice, most failures occur either in early life (infant mortality) or at random at any point in the equipment's life (2).

Sidebar: The reality of failure (2)

By understanding the failure mode, appropriate maintenance strategies can be established to help detect, prevent, or mitigate failure and improve the reliability of a component. Nevertheless, 100% reliability can never be guaranteed so long as the failure mode still exists.
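The early-life, random, and wear-out patterns mentioned above are often modelled with a Weibull distribution, in which a single shape parameter determines whether the failure rate falls, stays constant, or rises with age. The following is a minimal sketch only: the choice of the Weibull model, the function names `weibull_hazard` and `failure_pattern`, and any parameter values are assumptions made for illustration and are not taken from the article or its references.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1).

    t     -- age of the component (any consistent time unit, must be > 0)
    shape -- Weibull shape parameter (illustrative assumption)
    scale -- Weibull characteristic life, same unit as t
    """
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def failure_pattern(shape):
    """Name the failure pattern implied by the Weibull shape parameter."""
    if shape < 1:
        return "infant mortality"  # failure rate decreases with age
    if shape == 1:
        return "random"            # failure rate is constant over time
    return "wear-out"              # failure rate increases with age
```

Comparing `weibull_hazard` at two ages for a given shape shows the direction of the trend: with a shape below 1 the rate is higher early in life, with a shape of exactly 1 it does not change, and with a shape above 1 it grows with age.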


Historically, the biopharmaceutical industry has adopted mainly time-based maintenance but, in fact, other more effective strategies can often be used. Increasingly, the industry is adopting predictive and condition-based techniques to anticipate failure ahead of time. These techniques enable repairs to be planned and scheduled in a controlled manner, well before failure. Preventive maintenance can be divided into three categories:

  • Time-based or age-related. This type of preventive maintenance applies where the failure rate increases over time. This pattern applies only to a small percentage of failures in the real world. Clearly, it does not make sense for this to be a primary approach to preventive maintenance.

  • Run-based or usage-related. This type of preventive maintenance is a development of the time-based approach and applies where the failure rate increases with usage, for example, a valve diaphragm deteriorating through thermal cycles.

  • Predictive or condition-based. This type of preventive maintenance applies to situations in which failures occur randomly, where neither time nor usage provides a good early indicator of failure. This is the most common pattern of failure and, to be truly effective, preventive maintenance programs should reflect this fact.
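The three categories above amount to a simple mapping from the observed failure pattern to a type of preventive maintenance. One way to express that mapping is sketched below; the names `FailurePattern` and `select_pm_category` are hypothetical and chosen only for this illustration.

```python
from enum import Enum


class FailurePattern(Enum):
    """Failure patterns as described in the three categories (names are illustrative)."""
    AGE_RELATED = "failure rate increases over time"
    USAGE_RELATED = "failure rate increases with usage"
    RANDOM = "failures occur randomly"


def select_pm_category(pattern):
    """Return the preventive maintenance category that matches a failure pattern."""
    mapping = {
        FailurePattern.AGE_RELATED: "time-based",
        FailurePattern.USAGE_RELATED: "run-based",
        FailurePattern.RANDOM: "condition-based",
    }
    return mapping[pattern]
```

In practice the difficult step is identifying the failure pattern for each failure mode; once that is known, the choice of category follows directly.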

In the biopharmaceutical industry, vibration monitoring of bearings, motors, and gearboxes in plant and equipment is increasingly common practice; an increase in detected vibration can be used to indicate the onset of failure. Such systems provide a step increase in reliability compared to invasive time-based replacement. Similarly, thermography can be used to monitor the condition of electrical controls and signal the early onset of failure. On the manufacturing floor, visual inspections carried out by operators provide early signals as part of a structured total productive maintenance system.
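A condition-based trigger of the kind described for vibration monitoring can be as simple as comparing the latest reading with a baseline taken while the equipment was known to be healthy. The sketch below assumes a list of periodic vibration readings in a consistent unit; the function name `vibration_alert`, the baseline window of five readings, and the 1.5x alert ratio are hypothetical values chosen for illustration, not recommended limits.

```python
from statistics import mean


def vibration_alert(readings, baseline_count=5, ratio_limit=1.5):
    """Return True when the latest reading exceeds the baseline mean by ratio_limit.

    readings       -- vibration readings, oldest first (consistent unit)
    baseline_count -- number of initial readings treated as the healthy baseline
    ratio_limit    -- multiple of the baseline that raises an alert (assumed value)
    """
    if len(readings) <= baseline_count:
        return False  # not enough history to compare against a baseline
    baseline = mean(readings[:baseline_count])
    return readings[-1] > ratio_limit * baseline


healthy = [2.0, 2.1, 1.9, 2.0, 2.0, 2.1, 2.2]
degrading = [2.0, 2.1, 1.9, 2.0, 2.0, 2.6, 3.4]
print(vibration_alert(healthy))    # False
print(vibration_alert(degrading))  # True
```

An alert of this kind does not mean the asset has failed; it gives time to plan and schedule a repair before failure occurs.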


It is a fallacy that increasing the frequency of invasive maintenance leads to better reliability. In many situations, opening up a system to perform invasive maintenance may actually increase the chances of failure through the introduction of iatrogenic (technician-caused) failures.

Unfortunately, many preventive maintenance (PM) programs set maintenance frequencies using generic industry practices without consideration of the asset and the operating environment. Worse still, time-based intervals are often arbitrarily tightened in a knee-jerk response to failures and deviations.

Such actions can, in fact, worsen the situation by inadvertently introducing premature failure. A far more effective approach is to understand the failure modes and develop specific strategies to address them, such as less invasive condition-based techniques.


Asset failure could signal a failing in the maintenance strategy, but not necessarily. Further analysis and investigation are required before a maintenance strategy is deemed to be ineffective.

The effectiveness of a maintenance strategy should be evaluated against targets such as quality, health and safety, environmental integrity, production output, operating costs, etc. A preventive maintenance strategy cannot completely eliminate the risk of failure. Failure with a low probability of occurrence may still occur, even under the most robust maintenance strategy.

This is not to say that we give up trying to improve reliability; on the contrary, periodic maintenance effectiveness reviews are used to identify root causes of recurring failures and drive continuous improvements in reliability that are quantifiable to the business.

An effective maintenance strategy manages asset failure to a tolerable risk, aligned with the business objectives. If you are meeting your objectives, then the asset maintenance strategy is effective with respect to your business objectives.


If failure impacts product quality, then maintenance is critical, but if it doesn't have product impact, then it need not be. In practice, only a small percentage of maintenance tasks are critical to product quality, the rest being there for business reasons.

The ISPE Good Practice Guide on Maintenance states, "The maintenance program should help to ensure that the equipment is continually maintained in a qualified state and is suitable for intended use" (3). The primary goal of maintenance in the biopharmaceutical industry is to reduce the risk of a failure that may impact drug product quality. Not all functional failures of an asset, however, impact drug quality. Differentiating between those failure modes that do and those that do not enables effort to be focused where it is needed most.

Having a maintenance strategy of run-to-failure is perfectly acceptable when a failure mode cannot be detected and the equipment is deemed to be non-critical. Conversely, monitoring the condition of critical equipment provides constant assurance that the equipment is safely operating in its qualified state, while providing early signals of wear that may lead to a failure that affects product quality.
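The two cases described here can be written as a small decision rule. The sketch below covers only the combinations the text addresses and leaves the others for case-by-case review; the function name `maintenance_approach` and its return strings are hypothetical, and "critical" is taken to mean that failure would impact product quality.

```python
def maintenance_approach(impacts_product_quality, failure_detectable):
    """Suggest an approach for one failure mode (illustrative decision rule only).

    impacts_product_quality -- True if the failure would affect product quality
    failure_detectable      -- True if the onset of failure can be detected
    """
    if impacts_product_quality and failure_detectable:
        # Critical equipment: monitor condition for early signals of wear.
        return "condition monitoring"
    if not impacts_product_quality and not failure_detectable:
        # Non-critical and undetectable: run-to-failure is acceptable.
        return "run-to-failure"
    # Combinations not addressed in the text are left to engineering judgment.
    return "review case by case"
```

Applying such a rule per failure mode, rather than per asset, reflects the point that not every functional failure of an asset affects quality.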


The misconception that any deviation from a PM schedule will lead to equipment not fit for use is perhaps the most dangerous. Performing critical maintenance outside the optimum time interval may increase the risk of a functional failure that impacts the qualified state. Execution of PM outside of the optimum interval, however, does not in itself cause the asset to be no longer qualified or suitable for intended use, unless the qualified state or suitability for use is dependent upon the execution of the PM task at a specific point. In the majority of circumstances, this condition does not apply.

So, apart from a very small number of specific exceptions, deviation from a PM schedule increases risk, but does not directly cause the asset to be no longer fit for use. This is not to say that PM tasks are unimportant; they are important because they reduce risk and save money.

If an organization falls behind with its maintenance schedule, it is important to prioritize work so that the bigger risks are still addressed and slippage is allowed only on the lower risk items. Schedule adherence at an aggregate level, therefore, provides a leading indicator of the risks that the business is running. When organizations fall behind, the most important priority is to clear the backlog to get back on track.
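Both ideas in this paragraph, aggregate schedule adherence and risk-based prioritization of the backlog, are straightforward to compute from a list of PM tasks. The sketch below is one possible way to do so; the `PMTask` record, its `risk_score` scale (higher means greater risk), and the function names are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class PMTask:
    name: str
    risk_score: int          # hypothetical scale: higher means greater risk
    completed_on_time: bool


def schedule_adherence(tasks):
    """Fraction of PM tasks completed on time (an aggregate leading indicator)."""
    if not tasks:
        raise ValueError("no tasks supplied")
    return sum(t.completed_on_time for t in tasks) / len(tasks)


def prioritize_backlog(tasks):
    """Return overdue tasks ordered from highest to lowest risk."""
    overdue = [t for t in tasks if not t.completed_on_time]
    return sorted(overdue, key=lambda t: t.risk_score, reverse=True)
```

Working through the list returned by `prioritize_backlog` from the top addresses the bigger risks first, so that any remaining slippage falls on the lower-risk items.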

Gerard Clarke is reliability engineer at Pfizer; James Baillargeon is instruments and control manager at MedImmune; Paul Boles is senior technical manager GMP manufacturing at Genentech; Rob Christman is associate director global reliability engineering at Genzyme; and Steve Jones is director at BioPhorum Operations Group.


1. BPOG, Misconception of Maintenance and Reliability, www.biophorum.com/page/96/Misconceptions-Paper-Form.htm.

2. J. Moubray, Reliability Centered Maintenance, 2nd edition (Industrial Press Inc., 1997).

3. ISPE, Good Practice Guide on Maintenance (ISPE, March 2009).