Modelling salinity in river systems using hybrid process and data-driven models
Hydrology and Earth System Sciences Discussions
Salinity modelling in river systems is complicated by a number of processes, including in-stream salt transport and various mechanisms of saline accession that vary dynamically as a function of water level and flow, often at different temporal scales. Traditionally, salinity models in rivers have been either process-driven or data-driven. The primary problem with process-based models is that, in many instances, not all of the underlying processes are fully understood or can be represented mathematically, and that there are often insufficient historical data to support model development. The major limitation of data-driven models, such as artificial neural networks (ANNs), is that they provide limited system understanding and generally cannot be used to inform management decisions targeting specific processes, because the different processes are modelled implicitly. To overcome these limitations, a hybrid modelling approach is introduced and applied in this paper. As part of the approach, the most suitable sub-model is developed for each sub-process affecting salinity at the location of interest, based on consideration of model purpose, degree of process understanding and data availability; these sub-models are then combined to form the hybrid model. The approach is applied to a 46 km reach of the River Murray in South Australia, which is affected by high levels of salinity. In this reach, the major processes affecting salinity include in-stream salt transport, accession of saline groundwater along the length of the reach and the flushing of three waterbodies in the floodplain during overbank flows of various magnitudes. Based on trade-offs between the degree of process understanding and data availability, a process-driven model is developed for in-stream salt transport, an ANN model is used to model saline groundwater accession and three linear regression models are used to account for the flushing of the different floodplain storages. The resulting hybrid model performs very well on approximately three years of daily validation data, with a Nash-Sutcliffe efficiency (NSE) of 0.89 and a root mean squared error (RMSE) of 12.62 mg L⁻¹ (over a range from approximately 50 to 250 mg L⁻¹). Each component of the hybrid model results in a noticeable improvement in model performance over the range of flows for which it is developed.
The predictive performance of the hybrid model is significantly better than that of a benchmark process-driven model (NSE = −0.14, RMSE = 41.10 mg L⁻¹) and slightly better than that of a benchmark data-driven (ANN) model (NSE = 0.83, RMSE = 15.93 mg L⁻¹). In addition to improved predictive performance, the hybrid model has advantages over the ANN benchmark model in terms of increased capacity for improving system understanding and greater ability to support management decisions.
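
For readers unfamiliar with the two performance measures quoted above, the following is a minimal sketch of how NSE and RMSE are conventionally computed from paired observed and simulated series. It uses the standard textbook definitions; the function names and the four-value sample series are illustrative assumptions and are not taken from the study's data or code.

```python
from math import sqrt


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared model
    error to the squared deviation of the observations from their mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    squared_deviation = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / squared_deviation


def rmse(observed, simulated):
    """Root mean squared error, in the same units as the inputs."""
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return sqrt(squared_error / len(observed))


# Hypothetical salinity values in mg/L, invented purely for illustration.
observed = [100.0, 150.0, 200.0, 250.0]
simulated = [110.0, 140.0, 210.0, 240.0]

print(f"NSE  = {nse(observed, simulated):.3f}")
print(f"RMSE = {rmse(observed, simulated):.2f} mg/L")
```

Under these definitions, an NSE of 1 indicates a perfect fit, an NSE of 0 indicates a model that is no better than predicting the observed mean at every time step, and a negative NSE (as reported for the process-driven benchmark) indicates a model that performs worse than that mean predictor.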