
Interactive and automated debugging for big data analytics

Muhammad Ali Gulzar
2018 Proceedings of the 40th International Conference on Software Engineering Companion Proceedings - ICSE '18  
We seek to address these challenges with the development of BIGDEBUG, a framework providing interactive debugging primitives and tool-assisted fault localization services for big data analytics.  ...  We showcase the data provenance and optimized incremental computation features to effectively and efficiently support interactive debugging, and investigate new research directions on how to automatically  ...  To address this scalability challenge, our work Titian [14] implements data provenance in Apache Spark by directly extending the RDD abstraction with fine-grained data provenance capabilities.  ... 
doi:10.1145/3183440.3190334 dblp:conf/icse/Gulzar18 fatcat:o36lxubmjzfqxmv6p2kfkufkia

Logging Reservoir Evaluation Based on Spark

Meng-xin SONG, Hong-ping MIAO, Yao SUN
2018 DEStech Transactions on Computer Science and Engineering  
However, as data size grow rapidly, the efficiency of manual analysis is low. With the big data technology becoming more and more mature, we can use big data platform to evaluate logging reservoir.  ...  By constructing a 3 nodes IBM BigInsights big data platform, using the decision tree algorithm of Spark, we accomplished the evaluation of logging reservoir.  ...  of value-added services that can be installed on top of the IBM Open Platform with Apache Spark and Apache Hadoop, it provides a complete solution, including Spark, to scale analytics quickly and easily  ... 
doi:10.12783/dtcse/wcne2017/19888 fatcat:bv4izgwbv5halbvmxuuo7pzysq

Big data analytics on Apache Spark

Salman Salloum, Ruslan Dautov, Xiaojun Chen, Patrick Xiaogang Peng, Joshua Zhexue Huang
2016 International Journal of Data Science and Analytics  
In addition, we highlight some research and development directions on Apache Spark for big data analytics.  ...  Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming  ...  A unified engine for big data analytics As the next-generation engine for big data analytics, Apache Spark can alleviate key challenges of data preprocessing, iterative algorithms, interactive analytics  ... 
doi:10.1007/s41060-016-0027-9 dblp:journals/ijdsa/SalloumD0PH16 fatcat:gtzw3aqupnhxvcjbefovrnfhne

IncApprox

Dhanya R. Krishnan, Do Le Quoc, Pramod Bhatotia, Christof Fetzer, Rodrigo Rodrigues
2016 Proceedings of the 25th International Conference on World Wide Web - WWW '16  
We implemented our algorithm in a data analytics system called INCAPPROX based on Apache Spark Streaming.  ...  Incremental and approximate computations are increasingly being adopted for data analytics to achieve low-latency execution and efficient utilization of computing resources.  ...  Rodrigo Rodrigues is partially supported by FCT with reference UID/CEC/50021/2013 and an ERC Starting Grant (ERC-2012-StG-307732).  ... 
doi:10.1145/2872427.2883026 dblp:conf/www/KrishnanQBFR16 fatcat:ukywbrylyjfkxepyksllsfryka

Parallel processing on Big Data in the context of Machine Learning and Hadoop Ecosystem: A Survey

Anilkumar Vishwanath Brahmane, R Murugan
2018 International Journal of Engineering & Technology  
Emergent Big Data applications have become gradually more essential.  ...  In this paper an evaluation is done, this studies recent technologies developed for Big Data.  ...  Support SQL, HiveQL and Scala through Spark-SQL. Efficient query execution by Catalyst framework. High level tools to interact with data.  ... 
doi:10.14419/ijet.v7i2.7.10885 fatcat:goyvvzlwsbeifi62nrldkgp3yy

BigDL: A Distributed Deep Learning Framework for Big Data [article]

Jason Dai, Yiheng Wang, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Cherry Zhang, Yan Wan, Zhichao Li, Jiao Wang, Shengsheng Huang, Zhongyuan Wu, Yang Wang (+6 others)
2018 arXiv   pre-print
It is implemented on top of Apache Spark, and allows users to write their deep learning applications as standard Spark programs (running directly on large-scale big data clusters in a distributed fashion  ...  an AllReduce like operation using existing primitives in Spark (e.g., shuffle, broadcast, and in-memory data persistence), it also provides a highly efficient "parameter server" style architecture, so  ...  Consequently, users can efficiently load very large dataset and process the loaded data in a distributed fashion using Spark, and then feed the processed data into the analytics and AI pipeline.  ... 
arXiv:1804.05839v3 fatcat:u5afdn37l5c7lalqxqmlj5se6e

The Marriage of Incremental and Approximate Computing [article]

Dhanya R Krishnan
2016 arXiv   pre-print
We implemented our algorithm in a data analytics system called IncApprox based on Apache Spark Streaming.  ...  Most data analytics systems that require low-latency execution and efficient utilization of computing resources, increasingly adopt two computational paradigms, namely, incremental and approximate computing  ...  We implemented our algorithm in a data analytics system called INCAPPROX based on Apache Spark Streaming.  ... 
arXiv:1611.08573v1 fatcat:nj72wfi3kfer5czembdas24y3q

Cloud-based Fault Detection and Classification for Oil & Gas Industry [article]

Athar Khodabakhsh, Ismail Ari, Mustafa Bakir
2017 arXiv   pre-print
In this paper, we propose a new Lambda architecture for oil & gas industry for unified data and analytical processing on data received from DCS, discuss cloud integration issues and share our experiences  ...  with the implementation of sensor fault-detection and classification modules inside the proposed architecture.  ...  We would like to thank process and software experts Burak Aydogan and Mehmet Aydin from TUPRAS for providing us the DCS data used in this research and also Dr.  ... 
arXiv:1705.04583v1 fatcat:shrn6kbn6fa4rd3aloeuxydzdq

A Brief Review of Big Data Analytics Based on Machine Learning

Ahmed Hussein Ali, Mahmood Zaki Abdullah, Shams N. Abdul-wahab, Mohammad Alsajri
2020 Iraqi Journal for Computer Science and Mathematics  
This review paper presents the researches with a brief display for recently existing works in big data analytics and the effective algorithms of machine learning, furthermore, the issues of resources allocation  ...  Owing to the exponential expansion in the data size, fast and efficient systems of analysis are extremely needed.  ...  Furthermore, several recently existence methods in Big data streaming based on the algorithms of machine learning are also reviewed.  ... 
doi:10.52866/ijcsm.2020.01.02.002 fatcat:z7rb7dcegjakhhn7evth7rylpe

Distributed Streaming Analytics on Large-scale Oceanographic Data using Apache Spark [article]

Janak Dahal, Elias Ioup, Shaikh Arifuzzaman, Mahdi Abdelguerfi
2019 arXiv   pre-print
We also use Google Maps API to visualize results by color coding the world map with values from various analytics.  ...  Apache Spark's streaming library is increasingly becoming a popular choice as it can stream and analyze a significant amount of data.  ...  CONCLUSIONS We use SciSpark successfully with Apache Spark to stream GRIB1 data in a streaming application.  ... 
arXiv:1907.13264v2 fatcat:ubwr334mwre2pesjs6bdxn23ly

Review in Data Stream Mining in Big Data

Padma Priya. R
2020 International Journal for Research in Applied Science and Engineering Technology  
Big Data grows continually with fresh data and are being generated at all times; hence it requires an incremental computation approach which is able to monitor large scale of data dynamically.  ...  Data stream mining in big data is a process in which large streams of real-time data are processed with the sole aim of extracting insights and useful trends out of it.  ...  efficient analytics of heterogeneous data.  ... 
doi:10.22214/ijraset.2020.1075 fatcat:bre434jmpbdlza64npysuzza4a

Multi-Agent Big-Data Lambda Architecture Model for E-Commerce Analytics

Gautam Pal, Gangmin Li, Katie Atkinson
2018 Data  
We propose a Multi-Agent Lambda Architecture (MALA) for e-commerce data analytics.  ...  Challenges of high-velocity data ingestion is resolved with distributed message queues.  ...  Acknowledgments: The applications were freely deployed on the cloud infrastructure provided by Research Institute of Big Data Analytics (RIBDA), Xi'an Jiaotong-Liverpool University, Suzhou, China.  ... 
doi:10.3390/data3040058 fatcat:6hcmxxlxvvdwfpuail7y3nyzju

Incremental Techniques for Large-Scale Dynamic Query Processing

Iman Elghandour, Ahmet Kara, Dan Olteanu, Stijn Vansummeren
2018 Proceedings of the 27th ACM International Conference on Information and Knowledge Management - CIKM '18  
Many applications from various disciplines are now required to analyze fast evolving big data in real time. Various approaches for incremental processing of queries have been proposed over the years.  ...  In this tutorial, we briefly discuss legacy approaches for incremental query processing, and then give an overview of the new challenges introduced due to processing big data streams.  ...  More recent versions of distributed compute frameworks such as Apache Spark [25] , Apache Flink [1] , and Twitter Storm [22] / Heron [15] allow stream-based computations instead of batch-based computations  ... 
doi:10.1145/3269206.3274271 dblp:conf/cikm/Elghandour0OV18 fatcat:rwdtjicsibghzgvdajh5puobjm

Incremental Techniques for Large-Scale Dynamic Query Processing [article]

Iman Elghandour and Ahmet Kara and Dan Olteanu and Stijn Vansummeren
2019 arXiv   pre-print
Many applications from various disciplines are now required to analyze fast evolving big data in real time. Various approaches for incremental processing of queries have been proposed over the years.  ...  In this tutorial, we briefly discuss legacy approaches for incremental query processing, and then give an overview of the new challenges introduced due to processing big data streams.  ...  More recent versions of distributed compute frameworks such as Apache Spark [25] , Apache Flink [1] , and Twitter Storm [22] / Heron [15] allow stream-based computations instead of batch-based computations  ... 
arXiv:1902.00585v1 fatcat:4dtwnxhiqjfqdbb63tartjo4fq

Approximate Stream Analytics in Apache Flink and Apache Spark Streaming [article]

Do Le Quoc, Ruichuan Chen, Pramod Bhatotia, Christof Fetzer, Volker Hilt, Thorsten Strufe
2017 arXiv   pre-print
Unfortunately, the state-of-the-art systems for approximate computing primarily target batch analytics, where the input data remains unchanged during the course of sampling.  ...  Our results show that Spark- and Flink-based StreamApprox systems achieve a speedup of 1.15×-3× compared to the respective native Spark Streaming and Flink executions, with varying sampling fraction of  ...  IncApprox [31] is a data analytics system that combines two computing paradigms together, namely, approximate and incremental computations [14] [15] [16] [17] for stream analytics.  ... 
arXiv:1709.02946v1 fatcat:ejcj5eugkjbxpo5bet22dhv7jq