2,494 Hits in 5.7 sec

Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics

Bogdan Nicolae, Carlos Costa, Claudia Misale, Kostas Katrinis, Yoonho Park
2016 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)  
Big data analytics is an indispensable tool in transforming science, engineering, medicine, healthcare, finance and ultimately business itself.  ...  scalability so its efficiency is paramount, while on the other hand it needs to operate with scarce memory in order to leave as much memory as possible available for data caching.  ...  In big data analytics, data shuffling is a key component of large-scale data aggregations. One widely known example is MapReduce, in which mapper tasks shuffle the data to reducer tasks [2].  ... 
doi:10.1109/ccgrid.2016.85 dblp:conf/ccgrid/NicolaeCMKP16 fatcat:deohde667raxpixgafx4jtpxy4

An experimental evaluation of garbage collectors on big data applications

Lijie Xu, Tian Guo, Wensheng Dou, Wei Wang, Jun Wei
2019 Proceedings of the VLDB Endowment  
Key Findings. (1) Big data applications' unique memory usage patterns (e.g., long-lived shuffled data and humongous data objects), and computation features (e.g., iterative computation and CPU-intensive  ...  By thoroughly investigating the correlation between these big data applications' memory usage patterns and the collectors' GC patterns, we obtain many findings about GC inefficiencies.  ...  Garbage collection optimization for big data applications.  ... 
doi:10.14778/3303753.3303762 fatcat:q442yge3ybdgjouwu4ngzjs5ma

A Survey on Automatic Parameter Tuning for Big Data Processing Systems

Herodotos Herodotou, Yuxing Chen, Jiaheng Lu
2020 ACM Computing Surveys  
for executing jobs in big data processing systems.  ...  Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression.  ...  of nodes, while taking economic factors (e.g., References [46, 113] ) into account. • Real-time analytics: The latest trend in big data analytics is to develop real-time big data pipelines [19] to  ... 
doi:10.1145/3381027 fatcat:7aglimtuwze25boptuano4ufdy

Comparison of Machine Learning Algorithm on Map Reduction for Performance Improvement in Big Data

Ananthi Sheshasayee, J. V. N. Lakshmi
2015 Indian Journal of Science and Technology  
Improvements: A method for optimizing job assignment on machine learning is implemented in order to minimize the total execution time.  ...  The attributes of the system are evaluated for improving time efficiency. The objective is to provide ad hoc performance for MapReduce programs which run on large data sets.  ...  Machine learning with big data will duplicate this behavior, at massive scales. Big Data needs big compute for which Hadoop is a solution.  ... 
doi:10.17485/ijst/2015/v8i1/84650 fatcat:rxqloc37jnd3rgjvkri47uqtcy

Pythia: Faster Big Data in Motion through Predictive Software-Defined Network Optimization at Runtime

Marcelo Veiga Neves, Cesar A.F. De Rose, Kostas Katrinis, Hubertus Franke
2014 2014 IEEE 28th International Parallel and Distributed Processing Symposium  
The MapReduce framework, as implemented in Hadoop, is one of the most popular frameworks for Big Data analysis.  ...  The rise of Internet of Things sensors, social networking and mobile devices has led to an explosion of available data. Gaining insights into this data has led to the area of Big Data analytics.  ...  manipulating flow rates) to big data movement patterns (shuffle, broadcast).  ... 
doi:10.1109/ipdps.2014.20 dblp:conf/ipps/NevesRKF14 fatcat:7rzrwcj5izbk3mieyha3ochamy

A Comprehensive Study on The Usage of Big Data Analytics for Wireless and Wired Networks

Pushpa Mannava
2018 International Journal of Scientific Research in Science and Technology  
This paper provides a comprehensive study on the usage of big data analytics for wireless and wired networks  ...  This happens due to the gigantic amount of data being refined either in set or live applications.  ...  use of still time: Big data analytics can be made use of by operators to help them run their very own data and uncover patterns that would certainly help with solution and network optimization.  ... 
doi:10.32628/ijsrst207256 fatcat:aymaqc22y5f5vjfxd56qv6ev4u

Spark Versus Flink: Understanding Performance in Big Data Analytics Frameworks

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu, Maria S. Perez-Hernandez
2016 2016 IEEE International Conference on Cluster Computing (CLUSTER)  
Big Data analytics has recently gained increasing popularity as a tool to process large amounts of data on-demand.  ...  Our key finding is that neither of the two frameworks outperforms the other for all data types, sizes and job patterns.  ...  To address these limitations, a second generation of analytics platforms emerged in an attempt to unify the landscape of Big Data processing.  ... 
doi:10.1109/cluster.2016.22 dblp:conf/cluster/MarcuCAP16 fatcat:e6e6aftulved5c3qlj6634b6me

Shark: SQL and Rich Analytics at Scale [article]

Reynold Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica
2012 arXiv   pre-print
Shark is a new data analysis system that marries query processing with complex analytics on large clusters.  ...  The result is a system that matches the speedups reported for MPP analytic databases over MapReduce, while offering fault tolerance properties and complex analytics capabilities that they lack.  ...  Join Optimization Partial DAG execution can be used to perform several run-time optimizations for join queries. Figure 4 illustrates two communication patterns for MapReduce-style joins.  ... 
arXiv:1211.6176v1 fatcat:cdpyu3sp3bd7rcdzaaci4juayi

Shark: SQL and Rich Analytics at Scale

Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica
2013 Proceedings of the 2013 international conference on Management of data - SIGMOD '13  
Shark is a new data analysis system that marries query processing with complex analytics on large clusters.  ...  The result is a system that matches the speedups reported for MPP analytic databases over MapReduce, while offering fault tolerance properties and complex analytics capabilities that they lack.  ...  Join Optimization Partial DAG execution can be used to perform several run-time optimizations for join queries. Figure 4 illustrates two communication patterns for MapReduce-style joins.  ... 
doi:10.1145/2463676.2465288 dblp:conf/sigmod/XinRZFSS13 fatcat:qs4bvu7habd77g42mtm3m5sgoy

A Brief Review on scheduling algorithms of MapReduce Optimization Techniques

R Lavanya, Jeevanshu Malhotra, Rajeshwari Swaminathan
2019 Journal of Physics, Conference Series  
Scheduling algorithms of the MapReduce model in Hadoop vary in design and behaviour, and are used for handling many issues such as data locality, resource awareness, energy and time.  ...  The main objective is to study the MapReduce framework, the MapReduce model, scheduling in Hadoop, various scheduling algorithms and various optimization techniques in job scheduling.  ...  Toward Scalable Systems for Big Data Analytics: A Technology Tutorial Due to large scale technological advancements and exponential growth in terms of data generated a need for data acquisition and management  ... 
doi:10.1088/1742-6596/1362/1/012001 fatcat:ikw7wxziybbm7e2eph2xkgtif4

Big data analytics for wireless and wired network design: A survey

Mohammed S. Hadi, Ahmed Q. Lawey, Taisir E.H. El-Gorashi, Jaafar M.H. Elmirghani
2018 Computer Networks  
Third, there is a detailed review of the current academic and industrial efforts toward network design using big data analytics.  ...  To the best of our knowledge, this is the first survey that addresses the use of big data analytics techniques for the design of a broad range of networks.  ...  Industrial efforts toward optimizing networks based on big data analytics reflect the increasing trend toward employing AI-like approaches, such as pattern recognition and machine learning for network  ... 
doi:10.1016/j.comnet.2018.01.016 fatcat:xqjwzzeww5c3bhyv3yrpuhsgye

A Survey on Spark Ecosystem for Big Data Processing [article]

Shanjiang Tang, Bingsheng He, Ce Yu, Yusen Li, Kun Li
2018 arXiv   pre-print
Finally, we discuss the open issues and challenges for large-scale in-memory data processing with Spark.  ...  With the explosive increase of big data in industry and academic fields, it is necessary to apply large-scale data processing systems to analyze Big Data.  ...  Nowadays, more and more Big Data analytics frameworks are moving towards larger degrees of parallelism and shorter task durations in order to provide low latency.  ... 
arXiv:1811.08834v1 fatcat:6fxvg6me7rayzm4suoabyg7fii

A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures [article]

Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha, Geoffrey C. Fox
2014 arXiv   pre-print
We discuss the concept of "Big Data Ogres" and their facets as means of understanding and characterizing the most common application workloads found across the two paradigms.  ...  We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm.  ...  Also, we capture the richness of Big Data by including not just different parallel structures but also important overall patterns.  ... 
arXiv:1403.1528v2 fatcat:dnyrpncqfneofaxyuvq3tzffz4

Construing the big data based on taxonomy, analytics and approaches

Ajeet Ram Pathak, Manjusha Pandey, Siddharth Rautaray
2018 Iran Journal of Computer Science  
Big data has become an important asset due to its immense power hidden in analytics.  ...  Every organization is inundated with a colossal amount of data generated with high speed, requiring high-performance resources for storage and processing, special skills and technologies to get value out  ...  With the release of new data platforms for Data Science, especially open-source frameworks, the current trends in big data analytics are moving towards hybrid data management, data visualization and hybrid  ... 
doi:10.1007/s42044-018-0024-3 fatcat:teiovluolngepjyebzz2wnwjxu

Memory-Efficient and Skew-Tolerant MapReduce over MPI for Supercomputing Systems

Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji, Michela Taufer
2020 IEEE Transactions on Parallel and Distributed Systems  
Our enhancements to Mimir include combiner and dynamic repartition optimizations to minimize and balance memory usage and to achieve close to optimal balance of the memory usage across processes and to  ...  the data cannot be held in the memory.  ...  XSEDE resources, supported by US National Science Foundation grant ACI-1053575, were used to obtain some other performance data.  ... 
doi:10.1109/tpds.2019.2932066 fatcat:4pca2hxbgvfabgkcinhmls5l3m
Showing results 1–15 out of 2,494 results