Surrogate-based uncertainty and sensitivity analysis for bacterial invasion in multi-species biofilm modeling

A. Trucchia, M.R. Mattei, V. Luongo, L. Frunzo, M.C. Rochoux
2019 Communications in nonlinear science & numerical simulation  
In this work, we present a probabilistic analysis of a detailed one-dimensional biofilm model that explicitly accounts for planktonic bacterial invasion in a multi-species biofilm. The objective is (1) to quantify and understand how the uncertainty in the parameters of the invasion submodel impacts the biofilm model predictions (here the microbial species volume fractions); and (2) to spot which parameters are the most important factors enhancing the biofilm model response. An emulator (or
more » ... ogate") of the biofilm model is trained using a limited experimental design of size N = 216 and corresponding to a Halton's low-discrepancy sequence in order to optimally cover the uncertain space of dimension d = 3 (corresponding to the three scalar parameters newly introduced in the invasion submodel). A comparison of different types of emulator (generalized Polynomial Chaos expansion -gPC, Gaussian process model -GP) is carried out; results show that the best performance (measured in terms of the Q 2 predictive coefficient) is obtained using a Least-Angle Regression (LAR) gPC-type expansion, where a sparse Preprint submitted to Communications in Nonlinear Science and Numerical Simulation polynomial basis is constructed to reduce the problem size and where the basis coordinates are computed using a regularized least-square minimization. The resulting LAR gPC-expansion is found to capture the growth in complexity of the biofilm structure due to niche formation. Sobol' sensitivity indices show the relative prevalence of the maximum colonization rate of autotrophic bacteria on biofilm composition in the invasion submodel. They provide guidelines for orienting future sensitivity analysis including more sources of variability, as well as further biofilm model developments. Nomenclature 1 2 Recent experimental activity has highlighted that both in natural and 3 artificial environments, microorganisms preferentially exist in the form of 4 self-organized assemblages termed "biofilms", consisting of surface-associated 5 communities embedded in an exopolysaccharide matrix and organized into 6 microcolonies [1, 2]. The exopolysaccharide matrix corresponds to extracel-7 lular polymeric substances that are secreted by microorganisms into their 8 environment and that play an important role in the cell attachment to a 9 given surface and therefore in the biofilm formation. Bacteria in biofilms 10 differ substantially from free-living bacterial cells through a set of emerging 11 properties, including the formation of physical and social interactions, the 12 enhanced rate of gene exchange and the increased tolerance to antimicro-13 bials [1]. Such complex microbial communities drive biogeochemical cycling 14 processes of most elements in water, soil, sediment and subsurface environ-15 ments. They have been extensively used in biotechnological applications 16 such as waste-water and solid waste treatment, drinking water filtration, 17 biofuel production. Conversely, biofilms can cause persistent infections and 18 contamination of medical devices and implants; they are also responsible for 19 biofouling and process water contamination, quality deterioration of drinking 20 water and microbially influenced corrosion. 21 Many biofilm models have been proposed in the literature over the last 22 decades [3, 4]. Some of them have been derived in the framework of con-23 tinuum mechanics and formulated as differential equations based on (mass, 24 volume, momentum, energy) conservation principles [5, 6, 7, 8, 9, 10]. Oth-25 ers have been introduced as bottom-up models and assume biofilms to be 26 4 inherently stochastic living systems [11, 12, 13, 14, 15]. Still, biofilm model-27 ing remains a challenge, in particular since the biological processes involved 28 in biofilm formation and growth are highly nonlinear and since there is no 29 agreed-upon methodology to guide the user in the selection of the most ap-30 propriate model(s) and in the choice of the input parameters. For instance, 31 no reference values have been defined for these inputs [16], while they may 32 affect the nonlinear system in unpredictable ways. 33 In this context, studying the sensitivity of the biofilm model predictions 34 to the variability in the inputs provides a way to better understand the 35 response of the model to an arbitrary choice of parameters and to highlight 36 new insights into the underlying biological processes. To this aim, for each 37 set of input parameters θ = {θ 1 . . . θ d }, the output of the model is codified 38 into a set of quantities of interest y = {y 1 . . . y n }, leading to the definition of 39 the functional relation F 40 (1) In the framework of uncertainty quantification [17, 18], the set of input pa-41 rameters θ is considered uncertain and the objective is to propagate the 42 input uncertainties through the numerical model and to estimate the sub-43 sequent uncertainties in the quantities of interest y. In complement, global 44 sensitivity analysis methods [19, 20] provide valuable ways to characterize 45 the input-output model dependency F: they are helpful to derive a rele-46 vant screening of the input parameters, spot unimportant parameters and 47 focus the attention on the most relevant ones. These methods can be classi-48 fied in at least three categories: variance-based sensitivity analysis [21, 22], 49 5 derivative-based sensitivity analysis [23, 24, 25, 26], and moment-independent 50 sensitivity measures [27, 28, 29]. 51 Although the parameters involved in biofilm models may vary consider-52 ably and interact with each other to determine the model output, only few 53 attempts have been made in the past years to apply uncertainty quantifi-54 cation [30, 31] and sensitivity analysis to biofilm models at both local and 55 global levels [32, 33, 34, 35, 36, 37, 38]. Most of these studies refer to an ap-56 plication of the original Wanner-Gujer model [5], which is currently the most 57 widely used biofilm model in engineering applications. This model has been 58 integrated in AQUASIM [39], a computer program designed for simulating 59 aquatic systems and also for performing parameter estimation and sensitivity 60 analysis, see Refs. [33, 35, 36] related to global sensitivity analysis: Ref. [33] 61 presents a comparison between the qualitative Morris screening method and 62 the quantitative variance-based Fourier amplitude sensitivity test for a two-63 step nitrification biofilm model; Ref. [35] presents variance-based sensitivity 64 analysis applied to a one-dimensional biofilm model for ammonium and ni-65 trite oxidation for varying biofilm reactor geometry; and Ref. [36] calculates 66 sensitivity by performing model output linear regression for a complete au-67 totrophic nitrogen removal biofilm. 68 However, Wanner-Gujer-type biofilm modeling is not detailed enough to 69 study bacterial invasion mechanisms, which frequently occur and are crucial 70 in most of engineering applications. To overcome this modeling limitation, 71 a new class of continuum models for multi-species biofilm formation and 72 growth, which explicitly accounts for invasion mechanisms, has been recently 73 introduced [40, 41]. The novelty in such biofilm modeling class relates to the 74 6 introduction of a new state variable, which describes the concentration of 75 planktonic species within the biofilm. In this framework, the diffusion of the 76 free cells from the bulk liquid into the biofilm and inversely is described by a 77 diffusion-reaction equation; the growth processes are governed by a system of 78 nonlinear hyperbolic partial differential equations; and substrate dynamics 79 are governed by a system of semi-linear parabolic partial differential equa-80 tions. All equations are mutually connected so that the resulting system of 81 differential equations corresponds to a free boundary value problem, where 82 the free boundary is represented by the biofilm thickness. This model for-83 mulation aims at reproducing the colonization of new species diffusing from 84 bulk liquid to biofilm and the development of latent microbial species within 85 the biofilm, without explicitly prescribing boundary conditions for the invad-86 ing species at the free boundary. Such boundary conditions are determined 87 self-consistently by the model, instead of being set arbitrarily [42]. 88 This new class of continuum models can handle any number of microbial 89 species, both in sessile and planktonic states, as well as dissolved substrates. 90 One difficulty is that this type of model involves parameters related to species 91 invasion that are rather new in the literature and whose reference values are 92 not obvious to specify. To overcome this issue, we present in this study, a 93 variance-based sensitivity analysis approach that makes use of the well known 94 Sobol' indices [21, 43] to identify the most important parameters related to 95 bacterial invasion mechanisms. These Sobol' indices derived from variance 96 decomposition quantify the contribution of each uncertain parameter to the 97 variance of the quantities of interest. One non-intrusive way to compute 98 them is to build a Monte Carlo random sample of inputs and simulated 99 7 outputs [44]. While this approach is generic and robust, it is computationally 100 expensive due to a slow rate of convergence with respect to the sample size. 101 Due to the complexity of the biofilm model, this would require the order of 102 10 4 -10 5 biofilm model simulations and this is therefore far out of the available 103 computational budget. An alternative is to derive (or "train") an emulator of 104 the biofilm model using a limited sample of inputs and simulated outputs (or 105 "training set") and taking advantage of the regularity of the model response 106 F. Stated differently, the objective is to fit the emulator (or "surrogate") 107 over a dataset of biofilm model simulations and then to mimic in an accurate 108 and efficient way, the model response F for any set of parameters θ without 109 solving the original system of differential equations. Statistical information 110 on the quantities of interest and Sobol' indices can then be computed using 111 the emulator. Emulating can be regarded as a supervised learning procedure 112 and belongs to the field of machine learning [45]. 113 In this study, the objective is to build a surrogate that accurately repre-114 sents bacterial invasion as described by a recent multi-species biofilm model 115 and use it to perform uncertainty quantification and global sensitivity analy-116 sis. In order to provide results that are not algorithm-dependent, we compare 117 two families of popular surrogate models, namely generalized Polynomial 118 Chaos (gPC) [46, 47, 48, 49, 50, 51] and Gaussian Processes (GP) [52, 53, 119 54, 55, 56]. Comparison of gPC-expansion and GP-model have been reported 120 in the literature [57, 58, 59, 60]; Ref. [59] highlights that one approach does 121 not systematically outclass the other in terms of surrogate accuracy and com-122 putational efficiency, the best surrogate being application-dependent. It is 123 therefore of interest to compare gPC and GP approaches for biofilm appli-124 8 cations. The training step of the surrogate requires a sampling of the uncer-125 tain input space. The GP approach is known to be more accurate for less 126 structured design than tensor grid when performing sensitivity analysis [61]. 127 Consistently, the sampling is performed here using a low-discrepancy Halton's 128 sequence with a given budget N = 216. Due to the nonlinearities of the bi-129 ological processes involved, we investigate the impact of different choices of 130 the gPC polynomial basis (full or sparse) on the surrogate performance for 131 a fixed sample size N . Using a sparse polynomial basis may reduce the size 132 of the stochastic problem by only selecting the most significant basis compo-133 nents, and help to better capture a complex model response to variations in 134 the input parameters [62]. We consider here the least-angle regression (LAR) 135 approach to build a sparse gPC basis [63, 64], which was found to provide 136 the best performance among several sparse methods in Ref. [62]. 137 The biofilm model we use folds into the category of hyperbolic partial dif-138 ferential equations, meaning that the quantities of interest may feature sharp 139 variations, possibly discontinuities, for certain part of the input stochas-140 tic space. In this situation, building an accurate surrogate that covers the 141 whole input space when dealing with model nonlinearities remains a chal-142 lenge [47, 49, 50, 65]. One way to overcome this issue is to partition the 143 input space, to build local surrogates and combine them into a mixture-of-144 experts model [66]. It is thus of primary interest to investigate if building 145 a global surrogate is feasible for biofilm applications before moving to more 146 advanced settings such as mixture of experts. 147 In this work, the target problem represents a typical microbial interaction 148 occurring in waste-water treatment plants. Initially, the biofilm is only made 149 9 157 tainty quantification mostly deal with scalar outputs, while the biofilm model 158 output here is functional with spatial and temporal discretizations, n > 1 in 159 Eq. (1). Our approach consists here in building a surrogate at each time step 160 of interest, over the spatial grid associated to the model output [67, 68, 20]. 161 The paper is organized as follows. The biofilm model is described in Sec-162 tion 2. Section 3 presents the uncertain input parameters, the quantities 163 of interest, the stochastic framework and the experimental design to build 164 the training set. Section 4 presents the key ideas of the gPC and GP surro-165 gates. Uncertainty quantification and global sensitivity analysis results are 166 presented in Section 5. Conclusions and perspectives are outlined in Sec-167 tion 6. 168 2. Biofilm model 169 We present the recent continuum model [40] describing in a quantitative 170 and deterministic way, the bacterial invasion in multi-species biofilms [3]. 171 This model essentially consists of a modified Wanner-Gujer formulation ac-172 counting for the dynamics of the invading planktonic species as well as 173 10 substrate diffusion, attachment, detachment, microbial growth and biomass 174 spreading. Note that this model has been derived in one dimension and then 175 generalized to three dimensions [4]. In the present study, we consider the 176 one-dimensional model. 177 2.1. Free boundary value problem 178 The invasion model is formulated as a free boundary value problem for the 179 three state variables: (1) the concentration of microbial species in sessile form 180 X i (z, t), i = 1, . . . , N s , X = X 1 , . . . , X Ns ; (2) the concentration of planktonic 181 species ψ i (z, t), i = 1, . . . , N s , ψ = ψ 1 , . . . , ψ Ns ; and (3) the concentration 182 of dissolved substrates S j (z, t), j = 1, . . . , N m , S = S 1 , . . . , S Nm , including 183 the substrates provided by the bulk liquid and the metabolic waste products 184 related to microbial metabolism. Note that the state variables are functions 185 of time t and space z, with z denoting the one-dimensional spatial coordinate 186 assumed perpendicular to the substratum surface located at z = 0. Note also 187 that for generality, both the microbial species in sessile and planktonic states 188 are in number of N s , although in most of applications N s denotes the number 189 of all particulate components, such as extracellular polymeric substance, inert 190
doi:10.1016/j.cnsns.2019.02.024 fatcat:i2ylyaveibg35ddylwn3edalsq