Mathematical Programming for the Design and Analysis of a Biologics Facility-Part 2

March 1, 2010

An algorithmic approach to fine tune facility design and predict capacity.

ABSTRACT

In this paper, we describe the use of mathematical programming methods for automated schedule generation. In part 1 of the article, we described the process and results of a study using large-scale mathematical programming in the design and analysis of a biologics facility. In part 2, we summarize the formulation, compare our approach to discrete event simulation, and discuss the algorithmic methods used to produce high quality production schedules.

In part 1 of this paper, we presented the results of a study using large-scale mathematical programming in the design and analysis of a biologics facility.1 Here, in part 2, we describe the solution methodology used in that study. First we define a basic modeling framework that describes process physics in terms of materials, equipment, and tasks. Then we provide a detailed description of how this framework is translated into a mathematical formulation that models the overall scheduling problem. Our solution uses a core solver with a customized outer layer, specifically designed to handle biologics processes. We discuss the importance of explicit consideration of intermediate material storage and also the challenges that this presents in solving scheduling problems in the process industry as opposed to discrete parts manufacturing. Finally, we contrast our approach with traditional methods of discrete event simulation, discussing the differences and relative strengths of both.

Bristol-Myers Squibb, Inc.

THE RESOURCE TASK NETWORK

The mathematical programming approach described in this article uses process-specific information in two ways. A resource task network (RTN) description is used to provide a structured description of process details.2 The solution algorithm also is customized to accelerate the search of combinatorial alternatives and to address nuanced preferences concerning selection among degenerate solutions. A customized solution algorithm can select degenerate solutions that are most natural to accommodate the unmodeled constraints and adhere to desired patterns. For example, operators may prefer solutions that use the same equipment over and over for the same activity, even though many other patterns are mathematically equivalent. Under exceptional circumstances, operators will forgo preferences to achieve business objectives. A customized solution algorithm can address this reality. The RTN framework used here explicitly considers materials, equipment, and renewable resources, instead of using the more abstract versions that treat equipment as a renewable resource.3 This approach simplifies algorithm engineering and makes the development of theoretical properties for pruning the search space more straightforward. In intuitive terms, the RTN captures recipe information about a process. Developing the RTN as part of a prospective design analysis provides a way for development personnel to specify the important aspects of a process and helps translate research and development information to practice. In the approach considered here, an RTN instance is translated into a formulation automatically. This automatic translation formulates all necessary variable and constraint objects and constructs the supporting algorithm data structures.

A MATHEMATICAL PROGRAMMING APPROACH TO FACILITY MODELING

We developed a high fidelity model of an industrial biologics facility using the VirtECS version 7.1 software system (Advanced Process Combinatorics, Inc., West Lafayette, IN), which consists of a core mathematical programming solver designed around a uniform discretization model (UDM) and a customized outer layer that is specifically tailored to address biologics process behavior.3,4 The system uses built-in virtual methods that allow an algorithm engineer to control core algorithm behavior to improve performance and resolve degeneracy. Algorithm heuristics affect solution speed, but the precise mathematical programming framework permits the efficient exploration of decision implications throughout the timeline. By overriding virtual methods, algorithm engineers can develop an approach whose performance is adapted to match the needs of the process and which extrapolates to match model evolution because the preponderance of process physics is captured in equations addressed by the core solver. The core solver, which solves a UDM mixed integer linear program (MILP), is used by the outer algorithm in the same way that a base linear program (LP) solver is used as a relaxation in traditional branch and bound algorithms. The core VirtECS solver is a mathematical programming–based system that finds high quality feasible schedules for a UDM formulation derived from the RTN framework described above. The equation solving, branching rules, backtracking, cut generation, and search-tree enumeration strategy of VirtECS version 7.1 are beyond the scope of this paper. This core solver can be used in a stand-alone fashion to determine schedules for any process that can be described using the RTN. In practice, however, processes frequently possess individual features, special constraints, or degeneracy preferences that require accelerated or engineered solution logic. Examples of these include rapid convergence to solutions that imply periodic repacking of chromatography columns and the need to dispose of batches. The customized outer algorithm also is required to address stochastic variability by choosing among mutually exclusive UDM terms that reflect sampling of distributions of uncertain parameters such as titer.

Benefits of a Layered Approach

The layered algorithmic approach derives its strength from the fact that the customized outer layer typically contains much less than one percent of the overall code base. The outer algorithm is able to use the core solver in a number of ways—to solve the entire UDM formulation or only portions of the UDM formulation consistent with a particular solution strategy. The core solver, therefore, accounts for all of the constraints present in the UDM formulation. The function of the outer algorithm is to restrict the core algorithm search space to solutions that satisfy special constraints or resolve the degeneracy, especially for the process in question. Solution degeneracy is of particular importance in determining high quality schedules for real-world processes. Often, there are a variety of scheduling choices that satisfy the physical constraints of a process. For example, in the absence of inter-product transitions, it is perfectly acceptable to run the following sequence of tasks: ABAB. However, most people would prefer the following sequence: AABB. Mathematically, these two solutions may be equally desirable, but the second choice normally will be preferred by plant operators. The customized outer layer of the algorithm allows us to control solution degeneracy more directly than would the use of penalty functions and other such contrivances. The custom algorithm layer also makes it relatively easy to handle implicit constraints that are difficult to formulate explicitly. Examples include clean-in-place (CIP) requirements and periodic column recharging. Using the RTN described above, the scheduling problem is translated into a UDM-adapted form.3 As the name suggests, this formulation discretizes the timeline into a set of uniformly spaced buckets.

We define the following variables:

Wijt is a binary decision variable indicating that task i starts on equipment j at time t

Ibj is set of tasks that can run on batch unit j

pij is the process time for task i on equipment j

Fsjt is the amount of material s stored in equipment j at time t

Fs is the set of all equipment that can store material s

Zsjt is 1 if material s is stored in equipment j at time t

Kbi is the set of equipment that can process task i

Bijt is the batch size for task i on equipment j at time t

Fmaxsj is the maximum storage capacity for material s in equipment j

Ist is the amount of material s in storage at time t

T means is the set of tasks producing material s

Ts is the set of tasks consuming material s

Pst is the amount of material s purchased at time t

Dst is the amount of material s delivered to satisfy demands at time t

ρ meanisj is the amount of material s produced by task i in equipment j

ρisj is the amount of material s consumed by task i in equipment j.

The unit allocation constraints on equipment may be written as:

Equation 1 ensures that at most one task may begin at any time on any one piece of equipment. Equation 2 ensures that successive tasks do not overlap on any piece of equipment.

Equation 3 ensures that equipment j can store at most one material at any one time.

Because some equipment can be used both for processing and storage, eq 4 is required to ensure that batch processing and storage activities do not overlap.

The material balance constraints for material s at time t may be written as:

The UDM approach is easily extended to address other physical considerations.3 One advantage of being able to control the solution algorithm and UDM formulation is that an effective approach to stochastic optimization can be implemented. Namely, multiple literal versions of a task with stochastic parameters can be incorporated and the UDM formulation guarantees that only one of these versions is executed—just as if there were multiple alternate activities to accomplish a certain process step. To be precise, consider a task TR that depends on stochastic parameters, for example, titer. The UDM formulation and customized solution algorithm can easily accommodate introducing literal task and probability tuples (TR1, p1), (TR2, p2),..., (TRk, pk), where the tuple probabilities for TR must sum to one. The solution algorithm can randomly choose among these versions, according to the probabilities, to achieve a Monte Carlo sampling effect. The random selection of a particular literal version of a stochastic task, TRi—one containing specific values for one or more stochastic parameters—forces the algorithm to accommodate the consequences as the solution is completed. For example, a longer task instance may induce longer storage time or a high titer batch may require a solution that uses multiple chromatographic separation cycles. Unlike discrete event simulation approaches, Monte Carlo treatment of uncertainty can evolve over the timeline in arbitrary directions and not just left to right. However, to be realistic, the solution algorithm cannot use information about the future behavior of a stochastic parameter, like cycle time, to improve process behavior. This imposes constraints on the solution algorithm, hence the control of algorithm details with full awareness of process physics through the RTN is necessary to implementing this approach.

Automatic Schedule Generation

The core VirtECS solver is able to automatically generate schedules that satisfy all of the above constraints and can be controlled to solve subsets of constraints or variables in several ways to support algorithmic strategies implied by application physics. The paper by Pekny describes the features required of the core solver in more detail and references earlier algorithm engineering work underpinning the development of the core solver.5 For this paper, the performance of the core solver may be taken as a given, and we focus on the novel aspects of solving the problem for a biologics facility that resides in the development of the outer algorithm. We will therefore treat the core solver much as papers on branch and bound algorithms for MILP problems treat the underlying LP solver—we take its performance as a given.

The purpose of the process-specific outer algorithm is two-fold. First, it is necessary to define the parameters within which the core solver must work. This can mean specifically authorizing a restricted set of task-equipment-pairs (teps) and material-equipment-pairs (meps) that the core solver is allowed to use for a particular demand. This allows the algorithm engineer to maintain tight control over the search space of the core solver and still backtrack to relax suggestions that prove incorrect. The outer algorithm must focus attention effectively for solution acceleration to be dramatic. A well-designed outer algorithm speeds up the solution time because the core solver must search a smaller space and allows the user to ensure the solutions generated will conform to the special constraints and degeneracy preferences of the process in question. In selecting which equipment set to use for a particular demand, the outer algorithm weighs, for example, the relative advantages of load balancing on parallel equipment versus avoiding transitions. This is one place where the underlying logic of experienced plant personnel may be captured and codified to produce a high quality automated algorithm for a particular process. Second, the outer algorithm must examine the results of the solution generated for a subset of the UDM formulation. This is essential to determine whether remedial action must be taken because of the violation of some specialized process constraint that the core solver does not explicitly address, but which is implied by the UDM formulation knowing the details of the process under study. In addition, the outer algorithm has the opportunity to apply wholesale solution transformation by sliding task instances or adding additional tasks to meet operational preferences in a way that exploits the power of the underlying UDM formulation. Examples of tasks that may be treated in this exceptional fashion include CIP and column repacks, which are highly specific to biologics.

EXPLICIT MANAGEMENT OF INTERMEDIATE STORAGE

Chemical and pharmaceutical processes are more difficult to schedule than processes that make discrete parts. Much of this difficulty arises from the challenge of storing intermediate materials. From a material balance standpoint, one can consider a schedule feasible so long as inventories of materials do not go negative. The absence of negative inventories ensures that material is produced before it is consumed. In a plant that makes discrete parts (e.g., engine blocks), this level of modeling may be sufficient. Many parts often can be stacked on pallets while awaiting the next stage of processing. The pallets can be set on the factory floor or put in a warehouse; all that is required is floor space. Therefore, in discrete parts manufacturing, the management of intermediate storage often is ignored.

Storage Constraints in Process Industries

Process industries, however, are not so simple. Physics dictates that intermediate inventory must actually physically reside somewhere. If you are dealing with liquids or unpackaged solids, you don't have feasible schedules until you know where every pound of material is stored at every time over the schedule horizon. For example, what do you do with a batch of bioreactor product when the downstream equipment is still busy? Real-world constraints dictate that unless you have a tank, empty and cleaned for this material, it must remain in the reactor, where its presence will block the initiation of subsequent batches. A scheduling system that does not account for this delay is virtually useless in the process industry. In part 1, we described many subtle and nonintuitive interactions between successive batches.1 This inter-batch coupling plays a dominant role in determining the operating behavior and capacity of the process. Moreover, this coupling is a direct result of the rigorous enforcement of intermediate storage constraints. Thus, a scheduling solver that does not handle intermediate storage properly cannot accurately predict or describe process behavior.

As an example, consider a simple two-stage process making two products, P1 and P2. In the first stage, a reactor makes an intermediate material and in the second stage, the material is packaged into final product. There is a single reactor, but two packaging lines, with products P1 and P2 packaged on Line 1 and Line 2, respectively. No storage tanks are available for the reactor product, so this material must remain in the reactor until the packaging line consumes the entire batch. Only when the reactor has emptied all of one batch may another batch be started. Figure 1 illustrates a schedule for one batch of each product, but ignores the limitations of intermediate storage. Note that the reactor begins a second task immediately after finishing the first task. This schedule satisfies material balance constraints; no material inventories are negative, all intermediate materials are produced before they are consumed. Yet, the schedule is physically infeasible. Where does the P1 intermediate reside between 1 AM when it is produced and 4 AM when the last of this batch is fed to Line 1? The only vessel available to store this material is the reactor, but this schedule shows the reactor starting a second task at 1 AM. What is the reactor doing between 1 and 4 AM? Is it running the second batch or storing the material from the first batch? Physically, the reactor cannot do both. Furthermore, if the second reactor batch must be delayed, where does the material come from to feed the packing tasks on Line 2? These tasks are scheduled to begin at 2 AM, but as we will see when we examine a rigorous schedule below, the task that makes the material to feed these packing tasks will not even be started until 4 AM.

Figure 1

When we enforce the storage constraints, we get the schedule shown in Figure 2. Here, the initiation of the second reactor batch is delayed until the entire contents of the first batch are processed by the packing line. For both reactor batches, the reactor is occupied as a storage vessel until downstream equipment can consume its product. This schedule is physically feasible because we account for every pound of material, in both amount and location, at all times over the scheduling horizon. Note the effect of rigorous modeling on plant capacity. In Figure 1, production for both demands is completed in only 5 h. Figure 2 shows the time required to produce these products in the real world is 8 h. Even in this trivial example, failing to account for intermediate storage constraints results in a schedule that is physically meaningless and that seriously overestimates plant capacity.

Figure 2

One natural question that arises is: Why not just place upper limits on the amount of inventory that may be held for each material? It is tempting to believe that this relatively simple treatment would suffice in the real world, but it will not. The fault with this approach lies in the assumption that storage capacity for a particular material is independent of the amount of storage used for other materials. In most cases, this is an incorrect assumption.

Many processes have dozens of intermediate materials that are storable in multiple tanks. What then is the maximum allowable inventory for a material? Is it the sum of the capacities of all tanks that can store that material? This approach can lead to infeasible schedules because even a pound of material A stored in a tank renders that tank unavailable for storage of material B. In effect, introducing material A into a tank reduces the available storage capacity for all other materials that the tank could hold. Thus, if you have 10 intermediates storable in six tanks, then certainly no more than six of those intermediates may have positive inventory at any time over the scheduling horizon. This interaction is captured by the rigorous mathematical programming formulation described here (see eq 3). Simplistic methods such as spreadsheet analysis or scheduling algorithms that do not explicitly track material storage will ignore these physical constraints and hence produce schedules that are not executable in the real world.

DIFFERENCES IN SOLUTION STRATEGY BETWEEN SIMULATION AND MATHEMATICAL PROGRAMMING

Beyond simplistic averaging methods that can be performed in a spreadsheet, there are only two accurate approaches to biologics manufacturing analysis—discrete event simulation and mathematical programming. Here, we compare and contrast these approaches. The mathematical programming paradigm presented here represents a marked departure from traditional discrete event simulation methods.

Discrete Event Simulation

Discrete event simulation has been around for many years and a number of commercial simulator systems are available. These simulators work by initializing the conditions of the simulation, e.g., starting with all vessels idle and empty, and then defining events that alter this simulation state. The state is defined at a particular time—the simulation or current time. The simulation maintains the state only at the current value of tc. This state includes all of the material inventories at tc, and a list of events that are posted to be processed at times t > tc. When the next (earliest) event is handled, the state of the simulation may change and tc is advanced. The handling of an event will generate new events for future processing. For example, an event that initiates a batch task will create the event marking the end of that task as well as events that consume the task ingredients and produce the products at their respective times. Events must be processed in a chronological sequence, from earliest to latest. Because the state of the simulation exists only at the current value of tc, it is necessary for the simulation to produce a record of the time-dependent data as it progresses, so that a Gantt chart and material inventory traces will be available. Thus, the simulation can only see the current time; it cannot, for example, foresee the inventory levels of materials that will exist at future times. Events and data lying to the left of the current time t < tc, are part of the history and cannot be changed. In addition, new events may only be created for t > tc. As we discuss below, this is a serious limitation.

Mathematical Programming

The mathematical programming approach used here operates in a completely different fashion from discrete event simulation. The state of the model refers to and represents the model data at all times between the model start time and the horizon. The solver data structures maintain the information that describe all activities present in the solution, on all equipment over the entire timeline. These data, of course, represent the familiar Gantt chart. However, the solver data structures also maintain the material inventories and location for all materials over the entire timeline. These data represent a complete time history of inventory for each material. They also imply blocks on the Gantt chart that represent storage activities for vessels that have non-zero inventories over the timeline. This contrasts to simulation-based methods, and it affords the ability to handle more complex problems with improved computational speed and flexibility.

Simulation versus Mathematical Methods

Now that we understand the differences in the way data are stored and processed in these two different approaches, we can contrast how they behave in practice. Two major problems arise when applying simulation methods to a process such as the one discussed in this paper. The first is the sheer level of complexity of the process. Writing rules for dealing with hundreds of pieces of equipment and tasks is both tedious and error prone.

The second problem, however, is more severe. Because events must be processed in chronological order, if an event at time t has been processed, then no events at any earlier time may be considered. Consider what happens when a demand is scheduled in this process. The culture is initialized in the inoculation area, then moves sequentially through the plant to the bioreactors and purification systems, emerging as a final product some days later. It is quite natural to view the progress of an individual seed batch as it moves through the plant, much as a graduating class might ideally progress en bloc through a university. Our mathematical programming method addresses the entire timeline at all points in the solution algorithm execution. This means that in considering the scheduling of a batch, we have placed tasks on the timeline that are days or even weeks later than when the inoculation took place for that batch. When we consider an adjacent batch on the timeline, the solver is able to explore, going back on the timeline and placing new inoculation activities at times much earlier than the final tasks of the leftward batch. We can do this because our approach has no concept of present time. We are free to range over the entire timeline and insert activities anywhere they are feasible. This is possible because our solution represents not just the state of the plant at a particular time, but at all times. We represent and can account for all material inventories and locations over the entire horizon of the schedule. Of course, if events that happen on day one change inventory levels and alter the value of inventory available on subsequent days, the system must account for this causality to avoid generating infeasible solutions. This functionality is provided by the core VirtECS solver that obeys the UDM formulation given above, much as a commercial LP solver ensures that linear constraints are satisfied. Because we can range over the entire timeline, care must be taken to honor the time-phased discovery of stochastic information. If the intent is to model actual plant behavior, the algorithm cannot make decisions based on data that would not yet be known at a given time point. For example, the titer of a bioreactor batch is not known until it finishes. If a column repack fails, this information only becomes available at the point of failure. The algorithm described here takes care not to cheat the physics by using data to alter events that occurred on the timeline before the knowledge of that data was available.

Considering a demand within a mathematical programming framework is a simpler operation than doing so within a simulation. The outer algorithm contains logic that examines the availability times on the equipment and selects an appropriate equipment set for processing and storage. Then, a single call to the core solver can cause the demand to be scheduled, honoring all of the constraints in the UDM formulation. Such a demand-by-demand solution strategy ranges over the timeline from left to right and then back again until a complete solution is generated. For a single-product facility, such an approach works well. After completion of the UDM subproblem solution method in the core solver, control returns to the outer algorithm where the results can be evaluated and additional specialized tasks inserted if need be, before the next demand is considered. This is the point at which the history of the timeline is examined and process-specific constraints are enforced. For example, the cumulative loading on each column is examined by computing the weighted total of all tasks since the last column repack. If this value exceeds the specified limit, a repack task is scheduled on that column before subsequent demands are considered.

Another process-specific constraint handled in this way is the requirement that bioreactor batches enter the purification system within the required time after completion. If the most recent demand violates this condition, the core solver backtracks to the point before the demand was scheduled, and a demand is inserted for a discarded material. When this demand is considered, the core solver will manufacture the product up through the bioreactors, then use that material to feed to a disposal task, creating the demanded discarded material. This allows the solver to reproduce the effect of a bioreactor batch that is produced, but held too long and then discarded. Similar logic is used to accomplish the effect of batch failures that occur at a probability that may be specified by the user. Although some of these operations are possible using discrete event simulation, simulators are bound to the present time, so all decisions must be based on the state of the plant at tc. This makes developing logic difficult and the decisions tend to be myopic. There is no ability to view the entire timeline and make decisions based on the overall past, current, and a predicted future plan of activities. Also, programming the logic into a simulator devolves into a process of forming policies of the type: given state s, take action A. The programmer is left with the tedious exercise of trying to anticipate all possible relevant states and then decide on appropriate courses of action for each, using only the current state, or predictions of future state, as a decision basis. The simulation programmer takes on the responsibility that is fulfilled by the core solver in our approach.

A major advantage that our approach holds over discrete event simulation is that the process-specific outer algorithm is much less brittle. This means that the operational envelope over which the algorithm may be expected to work without alteration or human intervention is larger. The outer algorithm used here considers the process to be a sequence of stages, each of which has properties like preparation and process time. Each stage also has a list of parallel equipment on which its task can run. Adding more parallel equipment at a stage often is as easy as listing the new equipment in the input RTN file. Indeed, entire new stages may be added to the design without changing the solver customization. This reduces the time required for process engineers to explore major design changes.

Donald L. Miller is the co-founder and Derrick Schertz is a senior project engineer at Advanced Process Combinatorics, Inc., West Lafayette, IN. Christopher Stevens is an associate director of process support at Bristol-Myers Squibb, Inc., Devens, MA, 978.784.6413, christopher.stevens@bms.comJoseph F. Pekny is a professor of chemical engineering and interim head of industrial engineering at Purdue University, West Lafayette, IN.

REFERENCES

1. Miller D, Schertz D, Stevens C, Pekny JF. Results of a practical case study of the use of large scale mathematical programming for the design and analysis of a biologics facility. BioPharm Int. 2010;23(2):26–38.

2. Pantelides CC. Unified frameworks for the optimal process planning and scheduling. Proceedings of the Second Conference on Foundations of Computer aided Operations. 1994; New York.

3. Elkamel A. PhD thesis [dissertation]. Scheduling of process operations using mathematical programming techniques: towards a prototype decision support system. West Lafayette (IN): Purdue University; 1993.

4. Floudas CA, Lin X. Mixed integer linear programming in process scheduling: modeling, algorithms, and applications. Ann Oper Res. 2005:139:131–62.

5. Pekny JF. Algorithms architectures to support large-scale process systems engineering applications involving combinatorics, uncertainty, and risk management. Computers Chem Eng. 2002:26:239–67.