Combining complex networks and data mining: why and how
The increasing power of computer technology does not dispense with the need to extract meaningful in- formation out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of
... ing information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems too. In the face of that, a surprisingly low number of researchers turn out to resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, some fundamental concepts of which are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research are discussed.