Simulation and optimization of dynamic flux balance analysis models using an interior point method reformulation

Felipe Scott, Pamela Wilson, Raúl Conejeros, Vassilios S. Vassiliadis
2018 Computers and Chemical Engineering  
This work presents a novel, differentiable, way of solving dynamic Flux Balance Analysis (dFBA) problems by embedding flux balance analysis of metabolic network models within lumped bulk kinetics for biochemical processes. The proposed methodology utilizes transformation of the bounds of the embedded linear programming problem of flux balance analysis via a logarithmic barrier (interior point) approach. By exploiting the first-order optimality conditions of the interior-point problem, and with
more » ... urther transformations, the approach results in a system of implicit ordinary differential equations. Results from four case studies, show that the CPU and wall-times obtained using the proposed method are competitive with existing state-of-the art approaches for solving dFBA simulations, for problem sizes up to genome-scale. The differentiability of the proposed approach allows, using existing commercial packages, its application to the optimal control of dFBA problems at a genome-scale size, thus outperforming existing formulations as shown by two dynamic optimization case studies. Keywords: Dynamic flux balance analysis, Ordinary differential equations with embedded optimization, Linear programming, Genome-scale metabolic network Genome-scale metabolic models provide a reliable representation of metabolism based on available in-2 formation of cellular systems [18]. These models enable the mathematical representation of the metabolic 3 processes occurring within the organism and may be analyzed further using available toolboxes based on 4 mathematical optimization methods [42]. 5 The central optimization task in metabolic networks is flux balance analysis (FBA; Orth et al. [27], Savinell 6 and Palsson [33]). The most attractive feature of FBA is its ability to make quantitative predictions about 7 a metabolic network without any need for detailed kinetic descriptions and given only the stoichiometry 8 of the reactions, thus the number of published reconstructed genome-scale metabolic models has increased 9 rapidly in recent years [40]. The only necessary inputs for FBA are the metabolic model (i.e., the network 10 stoichiometry), a biologically meaningful objective and the growth and environmental conditions defining the 11 substrates uptake rates. The fundamental assumption underlying FBA is that the system is at steady-state. 12 The steady-state mass balance equation for each metabolite and environmental and growth conditions are 13 mathematically described in the form of constraints for the optimization problem. Given that the system of 14 equations describing the steady-state mass balances is under-determined (i.e., more reactions than metabolites 15 exist), an infinite feasible solution set exists. To obtain a solution, a maximization principle is used as a 16 surrogate for the true (and always unknown) totality of interactions. Typically this objective function is the 17 maximization of the flux through the biomass formation reaction [27]. This results in a linear programming (LP) formulation that can be solved readily using existing tools, such as GAMS or MATLAB™, or metabolic 19 modeling frameworks such as the constrained-based modeling and analysis (COBRA) toolbox [34]. 20 Dynamic Flux Balance Analysis (dFBA) is an extension of FBA enabling the simulation of the cellular 21 dynamics of a culture system by assuming that cells reach an intracellular steady state rapidly in response to 22 changes in the extracellular environment. In this way, the rates of product and biomass formation predicted 23 by FBA are used to update the extracellular concentration in the environment. In turn, the changes in 24 the environment produce variations in the uptake rates of substrates required for growth and metabolites 25 production. In this way, the kinetics of the extracellular concentrations of substrates and products, often 26 modeled as Ordinary Differential Equations (ODEs), are coupled to an FBA model, i.e, an LP problem. Sev-27 eral strategies have been developed to simulate dFBA models; and have been classified by Höffner et al. [15], 28 who also offer a list of applications, as the Static Optimization Approach (SOA), the dynamic Optimization 29 Approach (DOA) and the Direct Approach (DA). 30 The SOA approach uses the forward Euler's method to integrate the upper level ODE system and at each 31 time step the embedded LP problem is solved using a suitable solver. As recognized by Gomez et al. [11], since 32 most dFBA models are stiff, small time steps are required to ensure convergence, thus a large number of LP 33 problems need to be solved to calculate the trajectory of the system, making this approach computationally 34 expensive. The DOA approach is an attempt to use collocation methods to avoid the embedded nature of 35 the dFBA problem. This approach discretizes the time horizon and transforms the problem to a nonlinear 36 programming (NLP) problem. Mahadevan et al. [22] analyzed a network of 54 metabolites and 85 reactions 37 using both the SOA and DOA approaches, concluding that the large number of constraints and variables 38 introduced in the DOA approach limits its applicability to larger metabolic networks. 39 The DA approach includes the LP solver in the right hand side evaluator for the ordinary differential 40 equations. Although this requires obtaining a solution of the LP problem at every evaluation of the right 41 hand side, this approach can be implemented within implicit ODE integrators with adaptive step size for error 42 control, thus reducing the number of integration steps compared to the use of SOA. In this regard, Gomez 43 et al. [11] presented a DA implementation in MATLAB™, DFBAlab, that incorporates an LP feasibility 44 problem and lexicographic linear optimization problems to generate an extended dynamic system for which 45 the LP always has a solution. Lexicographic optimization augments the original LP problem, where typically 46 the specific growth rate is maximized, by adding constraints from a user-predefined list of fluxes to deal with 47 the (possible) existence of multiple flux distributions resulting in the same objective function. In this way, 48 the LP problem is first solved by optimizing, for example, the specific growth rate. Next, a constraint is 49 added specifying that the biomass flux should be equal or higher than the obtained optimum value and the 50 next objective function in the predefined list is used. This idea has been used recently by Harwood et al. [12] 51 and extended by exploiting the fact that, during an integration period, the optimal basis of the LP could 52 remain unchanged thus transforming the dFBA problem in a system of semi-explicit index-1 differential 53 algebraic equations. They also devised methods for detecting a change in the optimal basis of the LP and to 54 update it. In this way, obtaining the solution of a dFBA simulation problem reduces to the integration of a 55 semi-explicit index-1 system of equations until a change in the basis is detected. After updating the optimal 56 basis, the integration can continue until the end of the integration horizon is attained. Theoretically this 57 is the most elegant way for solving embedded LP problems within an ODE system and provides the most 58 accurate solution without any approximation error. However, the active set method proposed by Höffner 59 et al. [15] and Harwood et al. [12], is entirely equivalent to a basis identification method, such as the one 60 implemented in DFBAlab. As such, it leads to a dynamic simulation that requires continuous monitoring 61 and identification of any active set changes. This in turn constitutes a dynamic simulation involving discrete 62 events (hybrid system). 63 Finally, Zhao et al. [41] propose a solution approach for dFBA problems with nonlinear objective functions, 64 such as the maximization of the biomass yield or the maximization of the ATP yield per flux unit. In 65 this approach, the Karush-Kuhn-Tucker (KKT) conditions of the LP are embedded resulting in a quasi 66 differential-algebraic system of equations. Since the active set may change during the simulation, they use an 67 extreme-ray-based transformation to update the active set. The largest problem solved using this approach 68 consists of 45 intracellular reactions and took nearly 20 seconds. Considering that genome-scale models 69 include thousands of reactions, new methods for solving large dFBA problems are required. 70 The system of differential equations with an embedded linear optimization problem can be reformulated 71 as an index-1 differential and algebraic (DAE) system of equations by using the KKT conditions of the 72 73 relationship between variables where at least one of the variables must be at its bound [2], see Eqs. (9e) to 74 (9h) for a typical set of complementarity constraints. These constraints are linearly dependent, which within 75 the context of the aforementioned DAE system renders it unsolvable as it will have a linearly dependent 76 Jacobian, as noticed by Zhao et al. [41]. 77 Increasing demands for the sustainable and economically optimized synthesis of bioproducts, energy 78 requirements and environmental concerns and demands for microbial strains that can produce valuable bio-79 chemicals led to efforts to improve the yield and productivity of fermentation processes by optimizing batch 80 or fed-batch operation of bioreactors. Optimal control and parameter estimation applications of genome-scale 81 dFBA models have been severely limited due to the computational burden of embedding a dFBA model into 82 an optimal control problem for large models. In this regard, the solution of the bilevel optimization problem 83 has been approached by resorting to its reformulation as a mathematical program with complementarity 84 constraints (MPCC) [2]. In this approach the optimality conditions of the inner (FBA) optimization problem 85 are imposed as constraints on the outer problem and the differential equations are discretized using different 86 collocation strategies [4] leading to a NLP problem. This approach requires handling the complementary 87 constraints by one of several regularization techniques to avoid the non-uniqueness of the constraint multipli-88 ers. Regularization approaches include the relaxation of the right hand side of the complementary constraint 89 by a small positive value whose value decreases as the optimization proceeds, or the inclusion of a penalized 90 sum of the complementary constraints in the objective function (see Section 3 in Baumrucker et al. [2]). 91 Finally, MPCC solvers have been developed such as CONOPT-C [30] and also automatic reformulation tools 92 of MPCCs are available such as the NLPEC meta-solver in GAMS, which allows using standard NLP solvers 93 in GAMS. 94 The MPCC reformulation approach for solving optimization problems with embedded dFBA problems 95 has been previously reported in the literature. Hjersted and Henson [13] studied the fed-batch optimization 96 of a bioreactor with a small-scale model of S. cerevisiae metabolism. The approach used was to discretize the 97 state variables, to model the feed stream as piecewise constant control inputs in time and to replace the LP 98 problem by its KKT conditions, with this resulting in a nonlinear problem whose solution is limited by the 99 size of the network. One year later, Hjersted et al. [14] presented a genome-scale analysis of the production 100 of ethanol by S. cerevisiae in fed-batch culture, where in this work no attempts were made to optimize the 101 performance of the fed-batch culture using optimal control, presumably because the metabolic model was 102 too large to be handled by the current solution methods. 103 Kaplan et al. [17] proposed a parameter estimation formulation using a dFBA model of a yeast (42 metabo-104 lites and 48 reactions) handling the complementary constraints obtained by including the KKT conditions of 105 the inner LP by using a Fischer-Burmeister smoothing function [2]. Raghunathan et al. [31] used variational 106 inequalities to model switches in the objective function and in the uptake rates of substrates in a dFBA 107 model of S. cerevisiae with 39 reactions. The model was embedded in a parameter estimation problem aimed 108 to obtain the biomass composition in terms of macromolecular fractions of proteins, carbohydrates, nucleic 109 acids and lipids. The optimization problem was reformulated as an MPCC and solved using CONOPT-C. 110 Although the metabolic network analyzed was small, the resulting MPCC contains 33066 variables and 26192 111 constraints. Recently, Emenike et al. [9] applied a similar MPCC-collocation approach to the in-silico op-112 timization of the production of recombinant proteins in Pichia pastoris. The metabolic network consists of 113 37 metabolites and 47 reactions, far from the available genome-scale metabolic models P. pastoris, such as 114 iPP668, composed of 1.361 reactions and 1.177 metabolites [5]. 115 Thereby, new methods are required in dFBA so as to be able to address three key points; (a) produce 116 a differentiable simulation of dBFA so that it can be embedded in an optimal control solver, (b) be faster 117 computationally than existing methods and able to handle genome-scale metabolic networks, and, (c) be 118 able to deal with non-linear objective functions, also under the proviso that the weak Slater's condition [3] is 119 satisfied for the resulting non-linear FBA problem (so that strong duality holds, as is the case with the LP 120 formulation in this work). 121 In this work, a new approach for the solution of dFBA models is presented. The method relies on a trans-122 formation of the dFBA model to an implicit system of ordinary differential equations. This transformation is 123 accomplished by using a logarithmic barrier approach (an Interior Point approach) for the inner LP problem. 124 This approach is advantageous since it does not require the detection of a feasible set or an optimal basis, 125 neither requires the repeated solution of LP problems. Moreover, our approach can be applied directly to 126 3 solve dynamic optimization problems with an embedded dFBA model. Hence, all three points presented in 127 the previous paragraph are addressed with the contributions put forward with our present work, with actual 128 implementation of non-linear objectives being a case that will be addressed in a future publication. 129 This paper is organized as follows: section 2 introduced the FBA and dFBA models as well as some 130 useful properties of the interior point methods for the solution of LP problems, section 3 presents the interior 131 point based formulation to solve dFBA models as implicit systems of ODEs. Finally, section 4 presents seven 132 examples covering the simulation and dynamic optimization of dFBA models. 133 2. Theoretical background and problem formulation 134 2.1. Dynamic Flux Balance Analysis, dFBA 135 In the context of Flux Balance Analysis models, a mass balance for every identified metabolite is used to 136 derive the stoichiometry of a set of the biochemical reactions taking place inside a cell and the transport of 137 metabolites across the cell membrane, resulting in a set of linear equations. Let v be a vector of n fluxes, 138 formed by each reaction rate expressed in mmol per hour per gram of dry biomass (mmol(gDWh) −1 ). The flux 139 of biomass is expressed as the specific growth rate (gDW(gDWh) −1 ) to match the units of the experimental 140 measurements. In FBA it is assumed that the rate of the internal reactions and transport rates of the m 141 metabolites are faster when compared to the dynamics of the fermentation, resulting in a quasi-steady state 142 mass balance [37] that can be expressed as Nv = 0, where N is an m × n matrix with rank(N) = m. 143 Generally, the resulting system of linear equations cannot be solved as the number of variables is larger 144 than the number of equations. Thereby, it is assumed that a cellular objective exists, such as the maximization 145 of the specific cell growth under the prevailing external conditions. Hence, the following linear programming 146 157 such as lexicographic optimization [12]. 158 A second shortcoming of FBA is that it may highlight parts of the biochemical reaction pathway as active 159 when in reality there is no way of being certain regarding their activity without further experimental confir-160 mation. This is the result of the problem being incomplete in terms of having more variables than equations 161 to obtain a solution which defines the state of the network (the full set of independent fluxes). This second 162 problem is an inevitability regardless of the method chosen to solve the FBA LP problem. 163 164 the Direct Approach, no event detection was used and the integration was forced to continue during the 595 step-wise change in the oxygen concentration. This results in 484 LP problems solved during the integration 596 in the Direct Approach using ODE15s compared to 22 LPs solved by DFBAlab. 597
doi:10.1016/j.compchemeng.2018.08.041 fatcat:fgwydxbtencqfbse7nyc3g5jli