A Novel Big Data Approach to Classify Bank Customers - Solution by Combining PIG, R and Hadoop

Lija Mohan, Sudheep Elayidom M.
2016 International Journal of Information Technology and Computer Science  
A large amount of data characterized by its volume, velocity, veracity, value and variety is termed Big Data. Extracting hidden patterns, customer preferences, market trends, unknown correlations, or any other useful business information from a large collection of structured or unstructured data is called Big Data analysis. This article explores the scope of analyzing bank transaction data to categorize customers, which could help the bank achieve efficient marketing, improved customer service, better operational efficiency, increased profit and many other hidden benefits. Instead of relying on a single technology to process large-scale data, we use a combination of technologies such as Hadoop, PIG and R for efficient analysis. RHadoop is an emerging research trend for Big Data analysis, as R is an efficient, easy-to-code data analysis and visualization tool compared to a traditional MapReduce program. K-Means is chosen as the clustering algorithm for classification.
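
To illustrate the clustering step named in the abstract, the following is a minimal single-machine sketch of K-Means (Lloyd's algorithm) in plain Python. It is not the authors' RHadoop implementation: the `kmeans` function, the two features (average balance and monthly transaction count) and the sample customer values are all hypothetical, chosen only to show how customers could be grouped by transaction behavior.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Pick k distinct input points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical customers: (average monthly balance, transactions per month).
customers = [
    (500.0, 3.0), (650.0, 4.0), (700.0, 5.0),
    (9000.0, 40.0), (9500.0, 42.0), (8800.0, 38.0),
]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

In a sketch like this the balance column dominates the distance because the features are left unscaled; a real analysis would likely normalize the features first and distribute the assignment and update steps across the cluster rather than run them in one process.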
doi:10.5815/ijitcs.2016.09.10 fatcat:tucdibuabvf5bbhspqna6jlizq