How Do API Selections Affect the Runtime Performance of Data Analytics Tasks?

Yida Tao, Shan Tang, Yepang Liu, Zhiwu Xu, Shengchao Qin
2019 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE)  
However, little is known on the characteristics and performance attributes of alternative data analytics APIs.  ...  We observed that developers sometimes use alternative data analytics APIs to improve program runtime performance while preserving functional equivalence.  ...  runtime performance attributes of alternative data analytics APIs.  ... 
doi:10.1109/ase.2019.00067 dblp:conf/kbse/TaoTLXQ19 fatcat:wxwzujzirfhajgsbadhiwkjqme

DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime [article]

Alberto Parravicini, Arnaud Delamare, Marco Arnaboldi, Marco D. Santambrogio
2021 arXiv   pre-print
code written using the C++ CUDA Graphs API.  ...  We leverage the GrCUDA polyglot API to integrate our scheduler with multiple high-level languages and provide a platform for fast prototyping and easy GPU acceleration.  ...  We also thank Rene Mueller and Lukas Stadler, the original authors of GrCUDA, for their valuable feedback and opinions. Oracle and Java are registered trademarks of Oracle and/or its affiliates.  ... 
arXiv:2012.09646v2 fatcat:tzaiunieyfepxiar4na5aqc6gm

Improving spark application throughput via memory aware task co-location

Vicent Sanz Marco, Ben Taylor, Barry Porter, Zheng Wang
2017 Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference on - Middleware '17  
Data analytic applications built upon big data processing frameworks such as Apache Spark are an important class of applications.  ...  However, effective task co-location is a non-trivial task, as it requires an understanding of the computing resource requirement of the co-running applications, in order to determine what tasks, and how  ...  Because many data analytic tasks do not use 100% of the CPU during execution [2, 24] there is a significant portion of unused processing capacity.  ... 
doi:10.1145/3135974.3135984 dblp:conf/middleware/MarcoTPW17 fatcat:tub4pau42fh5rczb27if3x42bi

MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL

Ashwin M. Aji, Antonio J. Peña, Pavan Balaji, Wu-chun Feng
2016 Parallel Computing  
For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort that requires a thorough understanding of the underlying device architectures and kernels  ...  As an example, we design and implement an OpenCL runtime for task-parallel workloads, called MultiCL, which efficiently schedules command queues across devices.  ...  Acknowledgment This work was supported in part by the DOE contract DE-AC02-06CH11357, DOE GTO via grant EE0002758 from Fugro Consultants, VT College of Engineering SCHEV grant, NSF grants CNS-0960081,  ... 
doi:10.1016/j.parco.2016.05.006 fatcat:2q4ri3l36vgzfocevg6psnlqcq

Self-Adaptive OmpSs Tasks in Heterogeneous Environments

Judit Planas, Rosa M. Badia, Eduard Ayguade, Jesus Labarta
2013 2013 IEEE 27th International Symposium on Parallel and Distributed Processing  
for a particular architecture) and how the system can choose between these versions at runtime to obtain the best performance achievable for the given application.  ...  OmpSs is a task-based programming model and framework focused on the runtime exploitation of parallelism from annotated sequential applications.  ...  Excellence (FP7-ICT 287759), the Intel-BSC Exascale Lab collaboration project, the support of the Spanish Ministry of Education (CSD2007-00050 and FPU program), the projects of Computación de Altas Prestaciones  ... 
doi:10.1109/ipdps.2013.53 dblp:conf/ipps/PlanasBAL13 fatcat:xbply3nbize2ze3tfgii5jcrcq

Cpp-Taskflow v2: A General-purpose Parallel and Heterogeneous Task Programming System at Scale [article]

Tsung-Wei Huang, Dian-Lun Lin, Yibo Lin, Chun-Xun Lin
2020 arXiv   pre-print
The Cpp-Taskflow project addresses the long-standing question: How can we make it easier for developers to write parallel and heterogeneous programs with high performance and simultaneous high productivity  ...  We have demonstrated promising performance of Cpp-Taskflow on both micro-benchmark and real-world applications.  ...  We do not report the data of HPX and OpenMP because they do not have explicit task constructs at the functional level.  ... 
arXiv:2004.10908v2 fatcat:snwlszx6bnhnflbpmddx5ileyi

Improving Spark Application Throughput Via Memory Aware Task Co-location: A Mixture of Experts Approach [article]

Vicent Sanz Marco, Ben Taylor, Barry Porter, Zheng Wang
2017 arXiv   pre-print
However, effective task co-location is a non-trivial task, as it requires an understanding of the computing resource requirement of the co-running applications, in order to determine what tasks, and how  ...  Data analytic applications built upon big data processing frameworks such as Apache Spark are an important class of applications.  ...  The corresponding author of this paper is Zheng Wang (Email:  ... 
arXiv:1710.00610v1 fatcat:c732yhqm5zfdbgylpkll32ycja

Particle-In-Cell Simulation using Asynchronous Tasking [article]

Nicolas Guidotti, Pedro Ceyrat, João Barreto, José Monteiro, Rodrigo Rodrigues, Ricardo Fonseca, Xavier Martorell, Antonio J. Peña
2021 arXiv   pre-print
Inherently asynchronous, these models provide native support for dynamic load balancing and incorporate data flow concepts to selectively synchronize the tasks.  ...  In this paper, we study the parallelization of a production electromagnetic particle-in-cell (EM-PIC) code for kinetic plasma simulations exploring different strategies using asynchronous task-based models  ...  By doing that, we assess whether the virtues of the task-based paradigm, especially when complemented with data dependencies, effectively translate to relevant performance gains.  ... 
arXiv:2106.12485v2 fatcat:uloovw3qqnc7dbuhks2mc4lafm

Using Pilot Systems to Execute Many Task Workloads on Supercomputers [article]

Andre Merzky, Matteo Turilli, Manuel Maldonado, Mark Santcroos, Shantenu Jha
2018 arXiv   pre-print
RP is capable of spawning more than 100 tasks/second and supports the steady-state execution of up to 16K concurrent tasks.  ...  Pilot systems help to satisfy the resource requirements of workloads comprised of multiple tasks. RADICAL-Pilot (RP) is a modular and extensible Python-based pilot system.  ...  TTX is a measure of how fast a set of tasks can be executed by the RP Agent.  ... 
arXiv:1512.08194v4 fatcat:wylszrloqfh35isa6weu2vxmmm

A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one

Cristian Ramon-Cortes, Francesc Lordan, Jorge Ejarque, Rosa M. Badia
2020 Future generations computer systems  
This paper tries to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics  ...  We propose a way to extend task-based management systems to support continuous input and output data to enable the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using  ...  Acknowledgements This work has been supported by the Spanish Government (contracts SEV2015-  ... 
doi:10.1016/j.future.2020.07.007 fatcat:24a4z2fl6jgujkxp5vdeu4xo6m

TP-PARSEC: A Task Parallel PARSEC Benchmark Suite

An Huynh, Christian Helm, Shintaro Iwasaki, Wataru Endo, Byambajav Namsraijav, Kenjiro Taura
2019 Journal of Information Processing  
TP-PARSEC is not only useful for task parallel system developers to analyze their runtime systems with a wide range of workloads from diverse areas, but also enables them to compare performance differences  ...  TP-PARSEC is integrated with a task-centric performance analysis and visualization tool which effectively helps users understand the performance, pinpoint performance bottlenecks, and especially analyze  ...  financial analytics, physics simulation, and data mining.  ... 
doi:10.2197/ipsjjip.27.211 fatcat:32qsxkpufvafdkupbjjyzhuoji

Efficient, Dynamic Multi-task Execution on FPGA-based Computing Systems

Umar Minhas, Roger Woods, Dimitrios S. Nikolopoulos, Georgios Karakonstantis
2021 IEEE Transactions on Parallel and Distributed Systems  
This results in suboptimal resource utilisation and relatively poor performance, particularly as the number of tasks increase.  ...  Using models with varying resource/throughput profiles, we select the most appropriate distribution based on the runtime, workload needs to enhance temporal compute density.  ...  ACKNOWLEDGMENTS The work was supported by the European Commission under European Horizon 2020 Programme, under Grant 6876281 (VINEYARD).  ... 
doi:10.1109/tpds.2021.3101153 fatcat:cpixjpmwx5dwpdgyhss5mxy6oq

Analysis and Optimization of Task Granularity on the Java Virtual Machine

Andrea Rosà, Eduardo Rosales, Walter Binder
2019 ACM Transactions on Programming Languages and Systems  
Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications.  ...  Despite their performance may considerably depend on the granularity of their tasks, this topic has received little attention in the literature.  ...  At runtime, a framework selects the version to execute according to the size of the task queues. Cong et al.  ... 
doi:10.1145/3338497 fatcat:5t6yjwohjfflfa4nmuvek2di4a

Runtime-driven shared last-level cache management for task-parallel programs

Abhisek Pan, Vijay S. Pai
2015 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15  
Based on the input annotations for future tasks, the runtime instructs the hardware to prioritize data blocks with future reuse and evict blocks with no future reuse.  ...  We develop a task-based cache partitioning technique that leverages the dependence tracking and look-ahead capabilities of the runtime.  ...  However it is possible to let the runtime select such tasks at runtime based on the relative size of the memory footprints of tasks.  ... 
doi:10.1145/2807591.2807625 dblp:conf/sc/PanP15 fatcat:leh24puhiffeddkcmhnjgjam4i

Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism

Marcos Maronas, Kevin Sala, Sergi Mateo, Eduard Ayguade, Vicenc Beltran
2019 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC)  
Hence, on many applications structured parallelism is also exploited using tasks to leverage the full benefits of a pure data-flow execution model.  ...  The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among tasks and a flexible data-flow execution model  ...  ACKNOWLEDGMENT This work is supported by the Spanish Ministerio de Ciencia, Innovación y Universidades (TIN2015-65316-P), by the Generalitat de Catalunya (2014-SGR-1051) and by the European Union's Seventh  ... 
doi:10.1109/hipc.2019.00053 dblp:conf/hipc/MaronasSMAB19 fatcat:7f3yscuoczerjilk522wc5nw2u