Two-Step Heterogeneous Finite Mixture Model Clustering for Mining Healthcare Databases

Ahmed Najjar, Christian Gagne, Daniel Reinharz
2015 IEEE International Conference on Data Mining
Real-life databases often contain heterogeneous sets of variables. In this paper, we propose a methodology for exploring and analyzing such databases, with an application to healthcare data analytics. Specifically, we propose a two-step heterogeneous finite mixture model. The first step uses a joint mixture of Gaussian and multinomial distributions to handle numerical variables (i.e., real and integer numbers) and categorical variables (i.e., discrete values). The second step uses a mixture of hidden Markov models to handle sequences of categorical values (e.g., series of events). The approach is evaluated on a real-world application, the clustering of administrative healthcare databases from Québec, with results illustrating the good performance of the proposed method.
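The abstract does not give the model equations, so the following is only a rough illustration of how the two steps assign records to clusters: it computes the cluster posteriors (the E-step of EM) for each step. It rests on assumptions not stated in the abstract: numerical variables follow a diagonal-covariance Gaussian, each categorical variable is an independent single-draw multinomial, and both are conditionally independent given the cluster; each sequence cluster is a discrete-emission HMM scored with the forward algorithm. All function names and parameters are hypothetical, and the M-step (parameter re-estimation) is omitted.

```python
import math

# Hypothetical sketch: cluster posteriors for a joint Gaussian/multinomial
# mixture (step 1) and a mixture of discrete HMMs (step 2). Assumes all
# probabilities passed in are strictly positive.

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_gaussian_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance (assumption)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))

def log_categorical(codes, probs):
    """Sum of log P(category) over independent categorical variables."""
    return sum(math.log(p[k]) for k, p in zip(codes, probs))

def joint_posteriors(x, codes, weights, means, variances, cat_probs):
    """Step 1 E-step: P(cluster | numeric vector x, categorical codes)."""
    logs = [math.log(w) + log_gaussian_diag(x, mu, var) + log_categorical(codes, cp)
            for w, mu, var, cp in zip(weights, means, variances, cat_probs)]
    z = logsumexp(logs)
    return [math.exp(l - z) for l in logs]

def hmm_log_likelihood(seq, start, trans, emit):
    """Forward algorithm in log space for an HMM with discrete emissions."""
    n = len(start)
    alpha = [math.log(start[s]) + math.log(emit[s][seq[0]]) for s in range(n)]
    for o in seq[1:]:
        alpha = [logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(n)])
                 + math.log(emit[s][o]) for s in range(n)]
    return logsumexp(alpha)

def sequence_posteriors(seq, weights, hmms):
    """Step 2 E-step: P(cluster | event sequence) under a mixture of HMMs."""
    logs = [math.log(w) + hmm_log_likelihood(seq, *h) for w, h in zip(weights, hmms)]
    z = logsumexp(logs)
    return [math.exp(l - z) for l in logs]

if __name__ == "__main__":
    # Made-up parameters: two numeric variables, one 3-level categorical variable.
    print(joint_posteriors([70.0, 12.0], [2], [0.5, 0.5],
                           means=[[35.0, 2.0], [72.0, 10.0]],
                           variances=[[100.0, 4.0], [100.0, 9.0]],
                           cat_probs=[[[0.6, 0.3, 0.1]], [[0.1, 0.3, 0.6]]]))
    # Made-up 2-state HMMs over 2 event codes: (start, transition, emission).
    hmm_a = ([0.9, 0.1], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.2, 0.8]])
    hmm_b = ([0.1, 0.9], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.1, 0.9]])
    print(sequence_posteriors([0, 0, 0, 0], [0.5, 0.5], [hmm_a, hmm_b]))
```

Working in log space with `logsumexp` avoids underflow when many variables or long event sequences are multiplied together; a full implementation would alternate these posteriors with weighted parameter updates.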
doi:10.1109/icdm.2015.70 dblp:conf/icdm/NajjarGR15 fatcat:gq7vigydazeflcinehgmov55qe