Filling the gaps of in situ hourly PM2.5 concentration data with the aid of empirical orthogonal function analysis constrained by diurnal cycles
Atmospheric Measurement Techniques
Abstract. Data gaps in surface air quality measurements significantly impair the data quality and the exploration of these valuable data sources. In this study, a novel yet practical method called diurnal-cycle-constrained empirical orthogonal function (DCCEOF) was developed to fill in data gaps present in data records with evident temporal variability. The hourly PM2.5 concentration data retrieved from the national ambient air quality monitoring network in China were used as a demonstration.
... a demonstration. The DCCEOF method aims to reconstruct the diurnal cycle of PM2.5 concentration from its discrete neighborhood field in space and time firstly and then predict the missing values by calibrating the reconstructed diurnal cycle to the level of valid PM2.5 concentrations observed at adjacent times. The statistical results indicate a high frequency of data gaps in our retrieved hourly PM2.5 concentration record, with PM2.5 concentration measured on about 40 % of the days suffering from data gaps. Further sensitivity analysis results reveal that data gaps in the hourly PM2.5 concentration record may introduce significant bias to its daily averages, especially during clean episodes at which PM2.5 daily averages are observed to be subject to larger uncertainties compared to the polluted days (even in the presence of the same amount of missingness). The cross-validation results indicate that our suggested DCCEOF method has a good prediction accuracy, particularly in predicting daily peaks and/or minima that cannot be restored by conventional interpolation approaches, thus confirming the effectiveness of the consideration of the local diurnal variation pattern in gap filling. By applying the DCCEOF method to the hourly PM2.5 concentration record measured in China from 2014 to 2019, the data completeness ratio was substantially improved while the frequency of days with gapped PM2.5 records reduced from 42.6 % to 5.7 %. In general, our DCCEOF method provides a practical yet effective approach to handle data gaps in time series of geophysical parameters with significant diurnal variability, and this method is also transferable to other data sets with similar barriers because of its self-consistent capability.