Grid-based Approaches for Distributed Data Mining Applications [article]

Lamine M. Aouad, Nhien-An Le-Khac, Tahar Kechadi
2017 arXiv pre-print
The data mining field is an important source of large-scale applications and datasets which are getting more and more common. In this paper, we present grid-based approaches for two basic data mining applications, and a performance evaluation on an experimental grid environment that provides interesting monitoring capabilities and configuration tools. We propose a new distributed clustering approach and a distributed frequent itemsets generation well-adapted for grid environments. Performance evaluation is done using the Condor system and its workflow manager DAGMan. We also compare this performance analysis to a simple analytical model to evaluate the overheads related to the workflow engine and the underlying grid system. This will specifically show that realistic performance expectations are currently difficult to achieve on the grid.
arXiv:1703.09807v1 fatcat:qb3todc5mzf65d64ougsnorytm