Similarity Search and Correlation-Based Exploratory Analysis in EHRs: A Case Study with COVID-19 Databases

Mirela T. Cazzolato, Lucas S. Rodrigues, Marcela X. Ribeiro, Marco A. Gutierrez, Caetano Traina Jr., Agma J. M. Traina
2021 Anais do XXXVI Simpósio Brasileiro de Banco de Dados (SBBD 2021)   unpublished
With the COVID-19 pandemic, many hospitals have collected Electronic Health Records (EHRs) from patients and shared them publicly. EHRs include heterogeneous attribute types, such as image exams and numerical, textual, and categorical information. Simply posing similarity queries over EHRs can underestimate the semantics and potential information of particular attributes, so such queries are best supported by exploratory data analysis methods. We therefore propose the Sketch method for comparing EHRs by similarity to provide a tool for a correlation-based exploratory analysis over different attributes. Sketch computes the overall data correlation considering the distance space of every attribute. Further, it employs both ANOVA and association rules with lift correlations to study the relationship between variables, allowing in-depth data analysis. As a case study, we employed two open databases of COVID-19 cases, showing that specialists can benefit from the inference modules of Sketch to analyze EHRs. Sketch found strong correlations among tuples and attributes, with statistically significant results. The exploratory analysis was shown to complement the similarity search task, identifying and evaluating patterns discovered from heterogeneous attributes.
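
The abstract mentions association rules evaluated with lift. As a rough illustration of that measure only, and not of the Sketch implementation itself, the following minimal Python sketch computes the lift of a rule A -> C as support(A and C) / (support(A) * support(C)). The function names `support` and `lift` and the toy records are hypothetical, invented here for illustration; they do not come from the paper or its databases.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def lift(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Lift of the rule antecedent -> consequent.

    Values above 1 suggest positive association, values below 1 suggest
    negative association, and 1 suggests independence.
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    if s_a == 0 or s_c == 0:
        return float("nan")  # lift is undefined when either side never occurs
    return support(transactions, a | c) / (s_a * s_c)


# Toy records, invented for illustration (not data from the case study).
records = [
    frozenset({"fever", "cough", "positive"}),
    frozenset({"fever", "positive"}),
    frozenset({"cough"}),
    frozenset({"fever", "cough", "positive"}),
]

fever_positive = lift(records, {"fever"}, {"positive"})
cough_positive = lift(records, {"cough"}, {"positive"})
```

On these toy records, the rule fever -> positive has a lift above 1 and cough -> positive has a lift below 1, which is the kind of contrast a lift-based rule ranking would surface.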
doi:10.5753/sbbd.2021.17863 fatcat:b6tjjecexngaplb4apccqcy4zu