Machine Learning Regression Approaches for Colored Dissolved Organic Matter (CDOM) Retrieval with S2-MSI and S3-OLCI Simulated Data

Ana Ruescas, Martin Hieronymi, Gonzalo Mateo-Garcia, Sampsa Koponen, Kari Kallio, Gustau Camps-Valls
2018 Remote Sensing  
The colored dissolved organic matter (CDOM) variable is the standard measure of humic substance in water optics. CDOM is optically characterized by its spectral absorption coefficient, a_CDOM, at a reference wavelength (e.g., ≈440 nm). Retrieval of CDOM is traditionally done using bio-optical models. As an alternative, this paper presents a comparison of five machine learning methods applied to Sentinel-2 and Sentinel-3 simulated reflectance (R_rs) data for the retrieval of CDOM: regularized linear regression (RLR), random forest regression (RFR), kernel ridge regression (KRR), Gaussian process regression (GPR) and support vector regression (SVR). Two different datasets of radiative transfer simulations are used for the development and training of the machine learning regression approaches. Statistical comparison with well-established polynomial regression algorithms shows promising results for all models and band combinations, highlighting the good performance of the methods, especially the GPR approach, when all bands are used as input. Application to an atmospherically corrected OLCI image using the reflectance derived from the alternative neural network (Case 2 Regional) is also shown. Python scripts and notebooks are provided to interested users.
Remote Sens. 2018, 10, 786
(CDOM), concentration of chlorophyll-a (Chl-a), occurrence of surface accumulating algal blooms, concentration of phycocyanin, and Secchi depth, e.g., [5–10]. The development of algorithms that do not require extensive in situ sampling for training is a central aim in remote sensing of water quality [11]. Atmospheric correction, a prerequisite step for deriving water-leaving reflectance, has turned out to be demanding, particularly for non-oligotrophic waters [12,13], and especially complicated for darker CDOM-rich waters, the so-called Case-2 absorbing (C2A) or extreme absorbing (C2AX) waters [14]. The water-leaving signal is very low and the proportion of atmospheric noise can be very high (up to 95% of the signal [15]). This atmospheric correction issue is, however, not part of the developments of this research, even though it is a factor to take into account. C2A and C2AX waters dominated by dissolved organic matter (DOM) are the focus of this research.
DOM in the ocean plays a relevant role in the global carbon cycle, in particular the colored component of DOM, whose light absorption decreases exponentially from the ultraviolet (UV) to the visible part of the spectrum. The estimation of CDOM from remote sensing data, as a proxy for dissolved organic carbon (DOC), requires accurate algorithms [16]. In boreal temperate and cold regions like Finland, Sweden and Estonia, humic waters in lakes and some coastal zones are abundant. These waters typically have fairly low TSM and Chl-a concentrations, even though some cases of "black lakes" with high Chl-a and TSM values have been reported too [17]. In these cases, the reflectance is negligible in the visible, and only in the red to near-infrared is it sometimes possible to detect the Chl-a signal. Within ESA's C2X project, extreme absorbing waters were characterized by CDOM absorption a_CDOM(440) > 1 m⁻¹, which results in very low reflectance, typically with a maximum below 0.005 sr⁻¹ [14]. In Finnish lakes, for instance, the median absorption coefficient of CDOM at 443 nm is around 3.7 m⁻¹ [18,19]. In Finland, the humic matter concentration of lakes correlates with the share of peatland in the drainage area [20]. Humic lakes can also originate from peat dredging, e.g., in the Netherlands. Information on humic substances is used in the application of official directives, lake management and climate change studies. The a_CDOM parameter is considered to be a measure of dissolved organic carbon, which could help to estimate CO₂ efflux and to assess the carbon pool in carbon budget studies [19]. An accurate measurement of the a_CDOM parameter from remote sensing seems crucial in these types of water. However, it is known that CDOM is one of the most critical and uncertain ocean color (OC) products [21,22].
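The exponential decrease of CDOM absorption with wavelength described above is commonly written as a_CDOM(λ) = a_CDOM(λ₀) · exp(−S (λ − λ₀)). The following is a minimal sketch of that relation, not code from the paper. The function name, the spectral slope S = 0.017 nm⁻¹ and the chosen wavelengths are illustrative assumptions; only the 3.7 m⁻¹ value at 443 nm comes from the text.

```python
import math


def a_cdom(wavelength_nm, a_ref, ref_nm=440.0, slope=0.017):
    """CDOM absorption (m^-1) at wavelength_nm from a single-exponential model.

    a_ref  : absorption at the reference wavelength ref_nm (m^-1)
    slope  : spectral slope S (nm^-1); 0.017 is an assumed, illustrative value
    """
    return a_ref * math.exp(-slope * (wavelength_nm - ref_nm))


# Median Finnish-lake value quoted in the text: about 3.7 m^-1 at 443 nm.
# The wavelengths below are arbitrary sample points, not sensor band centers.
for wl in (400, 443, 560, 665):
    print(wl, round(a_cdom(wl, a_ref=3.7, ref_nm=443.0), 3))
```

With any positive slope, the modeled absorption is highest in the blue and falls off toward the red, which is consistent with the very low visible reflectance reported for strongly absorbing waters.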
In the work presented here, we focus on CDOM estimation, showing results of the application of several machine learning (ML) algorithms in 'typical boreal waters', with medium to extreme CDOM absorption. The main objective is the retrieval of the CDOM variable using the sensors developed by the European Space Agency (ESA) as part of the Copernicus Earth Observation Programme: the Sentinel-2 Multi-Spectral Instrument (S2-MSI) and the Sentinel-3 Ocean and Land Colour Instrument (S3-OLCI). The S2-MSI sensor and other high-spatial-resolution instruments have the drawback that, since they were designed initially for terrestrial applications, their spectral resolution, measurement frequency, and radiometric characteristics are not optimized for water quality mapping. Even so, S2-MSI gives more accurate water quality estimates than other high-resolution satellites like Landsat, thanks to its enhanced channel configuration and better temporal resolution (see Section 2 for details). S3-OLCI is designed to measure ocean color over ocean and coastal zones with a spatial resolution of 300 m (see Section 2 for specifications). With a very good signal-to-noise ratio, good radiometric stability, mitigation of sun-glint contamination and excellent coverage of the global ocean, S3-OLCI images the spectral distribution of the radiance at top-of-atmosphere. After atmospheric correction, the upwelling radiance just above the sea surface (the water-leaving radiance) is retrieved and used to estimate a number of geophysical parameters through the application of specific bio-optical algorithms. S3-OLCI also provides information on the atmosphere, especially on the aerosol characterization necessary for the atmospheric correction process. Concerning the retrieval of CDOM absorption, several band ratios have been proposed as predictive models for estimating CDOM from spectral data [11,23,24].
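As an illustration of what a band-ratio predictive model can look like, the sketch below assumes a power-law form, a_CDOM = A · (R_rs(λ₁) / R_rs(λ₂))^B, and fits A and B by least squares in log-log space. This form, the function names, and the numbers are assumptions made for the example; they are not taken from [11,23,24] or from the paper's scripts, and the data are synthetic and noise-free.

```python
import math


def band_ratio_cdom(rrs_num, rrs_den, coef_a, coef_b):
    """Assumed power-law band-ratio model: a_CDOM = A * (rrs_num / rrs_den) ** B."""
    return coef_a * (rrs_num / rrs_den) ** coef_b


def fit_band_ratio(ratios, a_cdom_values):
    """Least-squares fit of log(a_CDOM) = log(A) + B * log(ratio)."""
    xs = [math.log(r) for r in ratios]
    ys = [math.log(a) for a in a_cdom_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    coef_b = sxy / sxx
    coef_a = math.exp(mean_y - coef_b * mean_x)
    return coef_a, coef_b


# Synthetic illustration only: the ratios and the "true" coefficients are made up.
true_a, true_b = 2.0, 1.5
ratios = [0.5, 1.0, 2.0, 4.0]
targets = [band_ratio_cdom(r, 1.0, true_a, true_b) for r in ratios]

fit_a, fit_b = fit_band_ratio(ratios, targets)
print(fit_a, fit_b)
```

A model of this kind uses only two bands per ratio, so it has just two free parameters, whereas the regression methods compared in the paper can take all available bands as input.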
These parametric approaches only take into account a few spectral bands and thus they disregard the information contained in
doi:10.3390/rs10050786 fatcat:bkdjc5s4nzflvcfteulsebqtgm