An Automated Report Generation Tool for the Data Understanding Phase [chapter]

Juha Vesanto, Jaakko Hollmén
2004 Studies in Fuzziness and Soft Computing  
To prepare and model data successfully, the data miner needs to be aware of the properties of the data manifold. In this paper, the outline of a tool for automatically generating data survey reports for this purpose is described. The report combines linguistic descriptions (rules) and statistical measures with visualizations. Together these provide both quantitative and qualitative information and help the user to form a mental model of the data. The main focus is on describing the cluster structure and the contents of the clusters. The data is clustered using a novel algorithm based on the Self-Organizing Map. The rules describing the clusters are selected using a significance measure based on the confidence in their characterizing and discriminating properties. Copyright 2002 Springer-Verlag.

[Figure 1 diagram labels: processing, data preparation, cluster analysis, variable analysis, visualizations, descriptions, summary tables, REPORT, interactive analysis, reporting, models, lists]
Figure 1. Data understanding as an iterative process. The data is prepared and fed into the analysis system, which generates the data survey report. Based on the findings in the report, and possibly on further insights from interactive investigation, the data miner may either proceed with the next data mining phase or prepare the data set better and, with a push of a button, make a new report. The area within the dashed box corresponds to the implemented system.
doi:10.1007/978-3-540-39615-4_8 fatcat:riej6d4fwvgubf7ync7ip7xu6m