EVALUATING THE ACCURACY OF USING CROSS DATASETS TO PREDICT NEW LOCAL HEART DISEASE CASES
Journal of Theoretical and Applied Information Technology
Extracting hidden knowledge from healthcare datasets is important for the health sector. Health organizations commonly rely on their local data to build prediction models that predict and identify common diseases, and heart disease is no exception. A main challenge facing health organizations around the world is how to generalize a prediction model to cases collected in different places. Prediction models are typically built on data collected from a specific community, but there is little evidence on whether such a model can be applied to data collected from other communities. In this paper we focus on the heart disease problem and empirically examine the prediction accuracy of different classification algorithms when different medical datasets are used for training and testing. Specifically, we conducted three studies to determine how well a model built on a dataset from one health organization generalizes to new cases from another. In the first study, we built and tested classification models on each individual dataset. In the second study, we built classification models on one dataset and tested them on another. In the last study, we merged the datasets, then built and tested a classification model on the merged dataset. The results confirm that a classification model built from one dataset and used to predict cases from another is generally reasonable and accurate. They also confirm that merging heart disease datasets that share the same structure is useful for identifying potential cases.
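The three evaluation protocols described above (within-dataset, cross-dataset, and merged-dataset) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its classifiers or datasets, so a simple nearest-centroid classifier stands in for "a classification algorithm", the 70/30 hold-out split is assumed, and `make_toy_site` generates hypothetical synthetic sites that share one feature structure. All function names are illustrative.

```python
import random
from typing import Dict, List, Tuple

# One labelled example: (feature vector, class label), e.g. 1 = disease present.
Row = Tuple[List[float], int]
Model = Dict[int, List[float]]


def fit_nearest_centroid(rows: List[Row]) -> Model:
    """Stand-in classifier (assumption): the mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in rows:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model: Model, x: List[float]) -> int:
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(model, key=lambda y: sum((c - v) ** 2 for c, v in zip(model[y], x)))


def accuracy(model: Model, rows: List[Row]) -> float:
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


def holdout_split(rows: List[Row], test_fraction: float = 0.3, seed: int = 0):
    """Shuffle deterministically and return (train, test); split ratio is assumed."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def study1_within(datasets: Dict[str, List[Row]], seed: int = 0) -> Dict[str, float]:
    """Study 1: build and test a model on each individual dataset."""
    results = {}
    for name, rows in datasets.items():
        train, test = holdout_split(rows, seed=seed)
        results[name] = accuracy(fit_nearest_centroid(train), test)
    return results


def study2_cross(datasets: Dict[str, List[Row]]) -> Dict[Tuple[str, str], float]:
    """Study 2: train on one whole dataset, test on every other whole dataset."""
    results = {}
    for src, train in datasets.items():
        model = fit_nearest_centroid(train)
        for dst, test in datasets.items():
            if src != dst:
                results[(src, dst)] = accuracy(model, test)
    return results


def study3_merged(datasets: Dict[str, List[Row]], seed: int = 0) -> float:
    """Study 3: merge same-structure datasets, then build and test one model."""
    merged = [row for rows in datasets.values() for row in rows]
    train, test = holdout_split(merged, seed=seed)
    return accuracy(fit_nearest_centroid(train), test)


def make_toy_site(n: int, shift: float, seed: int) -> List[Row]:
    """Hypothetical synthetic site: 3 features; `shift` mimics a population difference."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append(([rng.gauss(2.0 * y + shift, 1.0) for _ in range(3)], y))
    return rows


if __name__ == "__main__":
    sites = {"site_A": make_toy_site(100, 0.0, 1), "site_B": make_toy_site(100, 0.5, 2)}
    print("Study 1:", study1_within(sites))
    print("Study 2:", study2_cross(sites))
    print("Study 3:", study3_merged(sites))
```

In this sketch, study 2 uses each full dataset as the training set and every other full dataset as the test set, so no hold-out split is needed there. Trying a different algorithm only requires replacing `fit_nearest_centroid` and `predict`; the merged study assumes every dataset lists the same features in the same order.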