Performance characterization and analysis for Hadoop K-means iteration

Joseph Issa
2016 Journal of Cloud Computing: Advances, Systems and Applications  
The rapid growth in the demand for cloud computing data presents a performance challenge for both software and hardware architects. It is important to analyze and characterize the data processing performance of a given cloud cluster and to evaluate the performance bottlenecks in the cluster that contribute to higher or lower processing time. In this paper, we carry out a detailed performance analysis and characterization for Hadoop K-means iterations by scaling different processor micro-architecture parameters and comparing performance using Intel and AMD processors. This leads to an analysis of the underlying hardware in cloud cluster servers that enables optimization of software and hardware to achieve the maximum possible performance. We also propose a performance estimation model that estimates performance for Hadoop K-means iterations by modeling different processor micro-architecture parameters. The model is verified to predict performance with less than 5% error margin relative to a measured baseline.
doi:10.1186/s13677-016-0053-0 fatcat:h7y4zeq7qrdnppl7xtirppfkea