Spark (Lightning-fast Cluster Computing) Application in Telecommunication Sector to Prevent Customer Churn out

Rajkumar Khatri, Anil Kumar
2015, International Journal of Computer & Mathematical Sciences (IJCMS), unpublished
Apache Spark™ is a fast and general engine for large-scale data processing. It has gained enormous attention because of its lightning-fast in-memory cluster computation on big data. Spark programs can run up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Many organizations use Spark to process large datasets, and it is being adopted for big data analytics workloads that require fast performance, such as interactive querying,
... processing, large-scale batch computations, as well as streaming and graph computations. In this paper, we identify the Call Detail Records (CDRs) of customers facing frequent call drops while roaming. This helps telecom companies prevent customer churn and improve connectivity in specific areas.

1. INTRODUCTION

Apache Spark is an open source big data processing framework that is well suited to this task because of its in-memory speed, ease of use, and sophisticated analytics. Spark was originally developed in 2009 in UC Berkeley's AMPLab, was open sourced in 2010, and later became an Apache project. Spark has many advantages compared with MapReduce, Storm, and other Hadoop technologies. It provides a comprehensive and unified framework for managing big data processing requirements across diverse datasets, whether they are batch or real-time streaming data. Spark applications in Hadoop clusters run up to 100 times faster in memory and 10 times faster on disk.

Spark applications can be written quickly in Java, Scala, or Python. Spark has a built-in set of over 80 high-level operators, and it can be used interactively to query data from within the shell. In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning, and graph data processing.

In this paper, we take a CDR (Call Detail Record) file and find the top 20 customers facing frequent call drops while roaming. This is a very important report, which telecom companies use to prevent customer churn by calling these customers back and, at the same time, contacting their roaming partners to improve connectivity in specific areas.
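
The report described above can be sketched as follows. This is a minimal illustration in plain Python, not the paper's Spark program. The record layout (subscriber_id, call_type, call_status), the sample rows, and the function name top_roaming_call_drops are all assumptions made for the example, since the CDR schema is not given here. The three steps (filter, count by key, take the top n) mirror what Spark's RDD operations filter, map with reduceByKey, and takeOrdered would do on a cluster.

```python
import csv
import io
from collections import Counter

# Hypothetical CDR layout (an assumption; the actual schema is not given):
# subscriber_id, call_type ("ROAMING" or "HOME"), call_status ("DROPPED" or "OK")
SAMPLE_CDR = """\
subscriber_id,call_type,call_status
A,ROAMING,DROPPED
A,ROAMING,DROPPED
B,ROAMING,DROPPED
A,HOME,DROPPED
C,ROAMING,OK
B,ROAMING,DROPPED
A,ROAMING,DROPPED
"""


def top_roaming_call_drops(cdr_text, n=20):
    """Return up to n (subscriber_id, drop_count) pairs, highest count first."""
    reader = csv.DictReader(io.StringIO(cdr_text))

    # Filter step: keep only calls that were dropped while roaming.
    dropped_roaming = (
        row["subscriber_id"]
        for row in reader
        if row["call_type"] == "ROAMING" and row["call_status"] == "DROPPED"
    )

    # Map and reduce-by-key step: count dropped calls per subscriber.
    counts = Counter(dropped_roaming)

    # Top-n step: order by count (descending), then by subscriber id so that
    # ties are resolved deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    print(top_roaming_call_drops(SAMPLE_CDR))  # [('A', 3), ('B', 2)]
```

The default n=20 matches the top-20 report; on a real CDR file the same filter, count, and rank pipeline would be distributed across the cluster rather than run in a single process.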