Asymmetric Tobit Analysis for Correlation Estimation from Censored Data

HongYuan CAO, Tsuyoshi KATO
IEICE Transactions on Information and Systems, 2021
Contamination of water resources with pathogenic microorganisms excreted in human feces is a worldwide public health concern. Surveillance of fecal contamination is commonly performed by routine monitoring for a single type or a few types of microorganism(s). To design a feasible routine for periodic monitoring and to control risks of exposure to pathogens, reliable statistical algorithms for inferring correlations between concentrations of microorganisms in water need to be established.
However, because pathogens are often present in low concentrations, some concentrations are likely to fall below the detection limit. This yields a pairwise left-censored dataset and complicates the computation of correlation coefficients. Correlation estimation errors can be reduced if undetected values are imputed more accurately. To obtain better imputations, we utilize side information and develop a new technique, the asymmetric Tobit model. It extends the Tobit model so that domain knowledge can be exploited effectively when fitting the model to a censored dataset. The empirical results demonstrate that imputation with domain knowledge is effective for this task.

Key words: censored data, Tobit analysis, asymmetric normal distribution, EM algorithm, non-negative least squares

Fig. 1: Water resources and uses. Fecal contamination in water resources leads to microbial risk of exposure to waterborne pathogens through various water uses, including drinking, recreation, agriculture, and industry.
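The abstract does not spell out the model, so the following is only a point of reference: a minimal sketch of the standard (symmetric) Tobit baseline that the asymmetric model extends. It fits a normal distribution to one left-censored variable with the EM algorithm, imputes each undetected value by its conditional mean below the detection limit, and then computes a Pearson correlation from the imputed columns. This is not the authors' asymmetric Tobit model. It uses no side information or domain knowledge, treats each variable independently, and assumes a single known detection limit per variable and approximately normal (for example, log-transformed) concentrations. The names `truncated_moments`, `fit_tobit_em`, `pearson`, and `simulate_pairs` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

_STD_NORMAL = NormalDist()  # N(0, 1), used for pdf/cdf at the truncation point


def truncated_moments(mu, sigma, limit):
    """Return E[X] and E[X^2] for X ~ N(mu, sigma^2) conditioned on X < limit."""
    beta = (limit - mu) / sigma
    # Inverse Mills ratio phi(beta) / Phi(beta); the floor avoids division by zero.
    lam = _STD_NORMAL.pdf(beta) / max(_STD_NORMAL.cdf(beta), 1e-300)
    mean = mu - sigma * lam
    var = sigma * sigma * (1.0 - beta * lam - lam * lam)
    return mean, var + mean * mean


def fit_tobit_em(values, censored, limit, n_iter=500, tol=1e-10):
    """Fit N(mu, sigma^2) to left-censored data with EM (hypothetical helper).

    values[i] is used only when censored[i] is False; for censored entries
    only "below `limit`" is known. Returns (mu, sigma, imputed), where each
    censored entry is replaced by E[X | X < limit] under the fitted model.
    """
    n = len(values)
    detected = [x for x, c in zip(values, censored) if not c]
    n_cens = n - len(detected)
    # Crude start: substitute the detection limit for every censored value.
    start = detected + [limit] * n_cens
    mu, sigma = fmean(start), max(pstdev(start), 1e-6)
    sum_det = sum(detected)
    sumsq_det = sum(x * x for x in detected)
    for _ in range(n_iter):
        # E-step: expected sufficient statistics of the censored entries.
        m1, m2 = truncated_moments(mu, sigma, limit)
        # M-step: closed-form normal MLE from the completed statistics.
        new_mu = (sum_det + n_cens * m1) / n
        new_var = (sumsq_det + n_cens * m2) / n - new_mu * new_mu
        new_sigma = math.sqrt(max(new_var, 1e-12))
        done = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    fill, _ = truncated_moments(mu, sigma, limit)
    imputed = [fill if c else x for x, c in zip(values, censored)]
    return mu, sigma, imputed


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pairs(n, rho, seed=0):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return xs, ys


if __name__ == "__main__":
    limit = -0.5  # assumed detection limit on the (log) scale
    xs, ys = simulate_pairs(5000, rho=0.8)
    cx = [x < limit for x in xs]  # True means "not detected"
    cy = [y < limit for y in ys]
    _, _, x_imp = fit_tobit_em(xs, cx, limit)
    _, _, y_imp = fit_tobit_em(ys, cy, limit)
    print("correlation after Tobit imputation:", round(pearson(x_imp, y_imp), 3))
```

Because every undetected value in a column receives the same conditional mean, the imputed columns have reduced spread, and the resulting correlation is typically attenuated relative to the true one. This imputation error is the kind that the abstract proposes to reduce by using side information and domain knowledge.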
doi:10.1587/transinf.2021edp7022