Performance Improvement of DAG-Aware Task Scheduling Algorithms with Efficient Cache Management in Spark

Yao Zhao, Jian Dong, Hongwei Liu, Jin Wu, Yanxin Liu
2021 Electronics  
Cache management policies that are designed for Spark exhibit poor performance in DAG-aware task-scheduling algorithms, which leads to cache misses and performance degradation.  ...  Moreover, we present a cache-aware task scheduling algorithm based on LSF to reduce the resource fragmentation in computing.  ...  Algorithm Scheduling Order (Represented by Stage ID) Execution Time Cache-Oblivious Task Scheduling DAG-aware scheduling has been studied extensively for both homogeneous and heterogeneous systems.  ... 
doi:10.3390/electronics10161874 fatcat:6flphen445b4pb4uvfpoqqqjoi

Survey of Apache spark optimized job scheduling in big data

Walaa Khalil, Hanaa Torkey, Gamal Attiya
2020 International Journal of Industry and Sustainable Development  
The main goal of this research is to present a comprehensive survey of the job scheduling modes used in Spark, the different types of scheduler, and existing algorithms with their advantages and issues.  ...  Spark is an analytic machine for big data processing with various modules for SQL, streaming, graph processing and machine learning.  ...  Delay scheduler In this scheduler, when the data is not ready, a task tracker stays for a specific time [7].  ...
doi:10.21608/ijisd.2020.73486 fatcat:q6wpxyvujrcqvongqykiunroo4

A survey on bandwidth-aware geo-distributed frameworks for big-data analytics

Mohammed Bergui, Said Najah, Nikola S. Nikolov
2021 Journal of Big Data  
In this article, we discuss challenges and survey the latest geo-distributed big-data analytics frameworks and schedulers (based on MapReduce and Spark) with WAN-bandwidth awareness.  ...  While cluster computing applications, such as MapReduce and Spark, have been widely deployed in data centres to support commercial applications and scientific research, they are not designed for running  ...  Acknowledgements The authors thank the anonymous reviewers for their helpful suggestions and comments.  ... 
doi:10.1186/s40537-021-00427-9 fatcat:u2jx7x6hkfc47kn2iqpkcquhi4

Spark on entropy: A reliable & efficient scheduler for low-latency parallel jobs in heterogeneous cloud

Huankai Chen, Frank Z Wang
2015 2015 IEEE 40th Local Computer Networks Conference Workshops (LCN Workshops)  
for on-line Spark analysis jobs.  ...  In this paper we propose an entropy-based scheduling strategy for running the on-line parallel analysis as a service more reliable and efficient, and implement the proposed idea in Spark.  ...  SCHEDULING CHALLENGE IN SPARK ANALYSIS AS A SERVICE This section provides a brief overview of Spark and discusses the current scheduling challenge for deploying Spark analysis as a service. A.  ... 
doi:10.1109/lcnw.2015.7365918 dblp:conf/lcn/ChenW15 fatcat:skunklgd7ngstn7qyhxlibxy4y

Hybrid Cloud Workflow Scheduling Method With Privacy Data

Wei Hu, Xiaoping Li, Xue Li
2020 IEEE Access  
Privacy protection is an important problem in workflow scheduling, for which scheduling tasks with privacy constraints to reliable resources is essential.  ...  In this article, we consider the scheduling problem of Spark applications in a hybrid cloud with deadline and privacy constraints.  ...  with heterogeneous resources in a hybrid cloud. • A scheduling algorithm framework is proposed for the considered Spark application. • A Spark scheduling algorithm is presented to determine the number  ... 
doi:10.1109/access.2020.3037921 fatcat:xowcpzgehvbwhbvryuwt3mhwci

Mitigating Bottlenecks in Wide Area Data Analytics via Machine Learning

Hao Wang, Baochun Li
2018 IEEE Transactions on Network Science and Engineering  
Without data-driven insights into performance bottlenecks at runtime, schedulers might blindly assign tasks to workers that are suffering from unidentified bottlenecks.  ...  Lube monitors geo-distributed data analytic queries in real-time, detects potential bottlenecks, and mitigates them with a bottleneck-aware scheduling policy.  ...  We would like to thank the HotCloud anonymous reviewers for their valuable comments.  ...
doi:10.1109/tnse.2018.2816951 fatcat:atngujfnt5cyxhrabf2l5jba2e

A Survey on Job Scheduling in Big Data

M. Senthilkumar, P. Ilango
2016 Cybernetics and Information Technologies  
Various scheduling algorithms of the MapReduce model using Hadoop vary in design and behavior, and are used for handling many issues like data locality and awareness of resources, energy and time.  ...  Scheduling for Big Data applications has become an active research area in the last three years. The Hadoop framework has become one of the most popular and widely used frameworks in distributed data processing.  ...  But the Spark default task scheduling plan does not take the different capacities of nodes into account in a heterogeneous Spark cluster, thus lowering system performance.  ...
doi:10.1515/cait-2016-0033 fatcat:psrtc3l3dzgkfklqjl6hr3qczi

Online Scheduling of Spark Workloads with Mesos using Different Fair Allocation Algorithms [article]

Yuquan Shan, Aman Jain, George Kesidis, Bhuvan Urgaonkar, Jalal Khamse-Ashari, Ioannis Lambadaris
2018 arXiv   pre-print
We first give typical results of an illustrative numerical study and then give typical results of a study involving Spark workloads on Mesos which we have modified and open-sourced to prototype different  ...  schedulers.  ...  may not be aware. multiple tasks (threads).  ...
arXiv:1803.00922v3 fatcat:vurmyciglbf3ffu7d7gvewjudq

A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures [article]

Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha, Geoffrey C. Fox
2014 arXiv   pre-print
Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute  ...  Our comparison progresses from a fully qualitative examination of the two paradigms, to a semi-quantitative methodology.  ...  In contrast to HPC schedulers, a first-order design objective of YARN is the support for heterogeneous workloads using multi-level, data-aware scheduling.  ... 
arXiv:1403.1528v2 fatcat:dnyrpncqfneofaxyuvq3tzffz4

Pilot-Abstraction: A Valid Abstraction for Data-Intensive Applications on HPC, Hadoop and Cloud Infrastructures? [article]

Andre Luckow, Pradeep Mantha, Shantenu Jha
2015 arXiv   pre-print
As memory naturally fits in with the Pilot concept of retaining resources for a set of tasks, we propose the extension of the Pilot-Abstraction to in-memory resources.  ...  Further, there is a lack of abstractions that unify access to increasingly heterogeneous infrastructure (HPC, Hadoop, clouds) and allow reasoning about performance trade-offs in this complex environment  ...  Falcon [23] provides a data-aware scheduler on top of a pool of dynamically acquired compute and data resources [24].  ...
arXiv:1501.05041v1 fatcat:eiu3inxk7bblrcoh7orjkrimjq

Heterogeneous MacroTasking (HeMT) for Parallel Processing in the Public Cloud [article]

Yuquan Shan, George Kesidis, Bhuvan Urgaonkar, Jorg Schad, Jalal Khamse-Ashari, Ioannis Lambadaris
2018 arXiv   pre-print
As representative results, Spark with HeMT offers about 10% better average completion times for realistic data processing workloads over the default system.  ...  As a result, HomT is deemed especially desirable in settings with heterogeneous (and possibly possessing dynamically changing) processing capacities.  ...  Acknowledgements: This research was supported in part by NSF CNS 1717571 grant and a Cisco Systems URP gift.  ... 
arXiv:1810.00988v1 fatcat:e6cctyx6r5bj5h7o4uu3fhvvwm

Complexity Reduction: Local Activity Ranking by Resource Entropy for QoS-Aware Cloud Scheduling

Huankai Chen, Frank Wang, Matteo Migliavacca, Leon O. Chua, Na Helian
2016 2016 IEEE International Conference on Services Computing (SCC)  
Finally, we propose a new approach to controlling the chaos based on resource's Local Activity Ranking for QoS-aware cloud scheduling and implement this idea in Spark.  ...  Cloud resource is an example of a locally-active device, which is the origin of complexity in cloud scheduling system.  ...  Experiments show that our proposed Entropy Scheduler outperforms the native Spark Fair Scheduler for better QoS satisfaction. Research on Complexity has just emerged in the area of cloud scheduling.  ...
doi:10.1109/scc.2016.82 dblp:conf/IEEEscc/ChenWMCH16 fatcat:b7tkejthmbafbog5ib6yncsygy

Real-Time Machine Learning: The Missing Pieces [article]

Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael I. Jordan, Ion Stoica
2017 arXiv   pre-print
with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources.  ...  over a state-of-the-art execution framework for a representative application.  ...  Acknowledgments We would like to thank Richard Shin for substantial contributions to the development of our prototype.  ... 
arXiv:1703.03924v2 fatcat:4vojlygby5gnpbmziwl63gpb2u

Task Scheduling in Big Data Platforms: A Systematic Literature Review

Mbarka Soualhia, Foutse Khomh, Sofiène Tahar
2017 Journal of Systems and Software  
In this paper, we conduct a SLR of task scheduling algorithms that have been proposed for big data platforms.  ...  An analysis of the scheduling models proposed for Hadoop, Spark, Storm, and Mesos • A research taxonomy for succinct classification of the proposed scheduling models • A discussion of some future challenges  ...  research that can be included in a roadmap for research on task and jobs scheduling in Hadoop, Spark, Storm and Mesos frameworks.  ...
doi:10.1016/j.jss.2017.09.001 fatcat:hr3w4v3uhzekhe56id4j3ic3fq

The HdpH DSLs for scalable reliable computation

Patrick Maier, Robert Stewart, Phil Trinder
2014 Proceedings of the 2014 ACM SIGPLAN symposium on Haskell - Haskell '14  
We report on HdpH and HdpH-RS, a pair of Haskell DSLs designed to address these challenges for irregular task-parallel computations on large distributed-memory architectures.  ...  We present operational semantics for both DSLs and investigate conditions for semantic equivalence of HdpH and HdpH-RS programs, that is, conditions under which topology awareness can be transparently  ...  The authors thank Lilia Georgieva, Sam Lindley, Daria Livesey, Greg Michaelson, Jeremy Singer and the anonymous referees for helpful feedback.  ... 
doi:10.1145/2633357.2633363 dblp:conf/haskell/MaierST14 fatcat:nzlribvpbre6jle6vililgxa4i