How Do API Selections Affect the Runtime Performance of Data Analytics Tasks?
2019
2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE)
However, little is known on the characteristics and performance attributes of alternative data analytics APIs. ...
We observed that developers sometimes use alternative data analytics APIs to improve program runtime performance while preserving functional equivalence. ...
runtime performance attributes of alternative data analytics APIs. ...
doi:10.1109/ase.2019.00067
dblp:conf/kbse/TaoTLXQ19
fatcat:wxwzujzirfhajgsbadhiwkjqme
DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime
[article]
2021
arXiv
pre-print
code written using the C++ CUDA Graphs API. ...
We leverage the GrCUDA polyglot API to integrate our scheduler with multiple high-level languages and provide a platform for fast prototyping and easy GPU acceleration. ...
We also thank Rene Mueller and Lukas Stadler, the original authors of GrCUDA, for their valuable feedback and opinions. Oracle and Java are registered trademarks of Oracle and/or its affiliates. ...
arXiv:2012.09646v2
fatcat:tzaiunieyfepxiar4na5aqc6gm
Improving spark application throughput via memory aware task co-location
2017
Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference (Middleware '17)
Data analytic applications built upon big data processing frameworks such as Apache Spark are an important class of applications. ...
However, effective task co-location is a non-trivial task, as it requires an understanding of the computing resource requirement of the co-running applications, in order to determine what tasks, and how ...
Because many data analytic tasks do not use 100% of the CPU during execution [2, 24] there is a significant portion of unused processing capacity. ...
doi:10.1145/3135974.3135984
dblp:conf/middleware/MarcoTPW17
fatcat:tub4pau42fh5rczb27if3x42bi
MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL
2016
Parallel Computing
For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort that requires a thorough understanding of the underlying device architectures and kernels ...
As an example, we design and implement an OpenCL runtime for task-parallel workloads, called MultiCL, which efficiently schedules command queues across devices. ...
Acknowledgment This work was supported in part by the DOE contract DE-AC02-06CH11357, DOE GTO via grant EE0002758 from Fugro Consultants, VT College of Engineering SCHEV grant, NSF grants CNS-0960081, ...
doi:10.1016/j.parco.2016.05.006
fatcat:2q4ri3l36vgzfocevg6psnlqcq
Self-Adaptive OmpSs Tasks in Heterogeneous Environments
2013
2013 IEEE 27th International Symposium on Parallel and Distributed Processing
for a particular architecture) and how the system can choose between these versions at runtime to obtain the best performance achievable for the given application. ...
OmpSs is a task-based programming model and framework focused on the runtime exploitation of parallelism from annotated sequential applications. ...
Excellence (FP7-ICT 287759), the Intel-BSC Exascale Lab collaboration project, the support of the Spanish Ministry of Education (CSD2007-00050 and FPU program), the projects of Computación de Altas Prestaciones ...
doi:10.1109/ipdps.2013.53
dblp:conf/ipps/PlanasBAL13
fatcat:xbply3nbize2ze3tfgii5jcrcq
Cpp-Taskflow v2: A General-purpose Parallel and Heterogeneous Task Programming System at Scale
[article]
2020
arXiv
pre-print
The Cpp-Taskflow project addresses the long-standing question: How can we make it easier for developers to write parallel and heterogeneous programs with high performance and simultaneous high productivity ...
We have demonstrated promising performance of Cpp-Taskflow on both micro-benchmark and real-world applications. ...
We do not report the data of HPX and OpenMP because they do not have explicit task constructs at the functional level. ...
arXiv:2004.10908v2
fatcat:snwlszx6bnhnflbpmddx5ileyi
Improving Spark Application Throughput Via Memory Aware Task Co-location: A Mixture of Experts Approach
[article]
2017
arXiv
pre-print
However, effective task co-location is a non-trivial task, as it requires an understanding of the computing resource requirement of the co-running applications, in order to determine what tasks, and how ...
Data analytic applications built upon big data processing frameworks such as Apache Spark are an important class of applications. ...
The corresponding author of this paper is Zheng Wang (Email: z.wang@lancaster.ac.uk). ...
arXiv:1710.00610v1
fatcat:c732yhqm5zfdbgylpkll32ycja
Particle-In-Cell Simulation using Asynchronous Tasking
[article]
2021
arXiv
pre-print
Inherently asynchronous, these models provide native support for dynamic load balancing and incorporate data flow concepts to selectively synchronize the tasks. ...
In this paper, we study the parallelization of a production electromagnetic particle-in-cell (EM-PIC) code for kinetic plasma simulations exploring different strategies using asynchronous task-based models ...
By doing that, we assess whether the virtues of the task-based paradigm, especially when complemented with data dependencies, effectively translate to relevant performance gains. ...
arXiv:2106.12485v2
fatcat:uloovw3qqnc7dbuhks2mc4lafm
Using Pilot Systems to Execute Many Task Workloads on Supercomputers
[article]
2018
arXiv
pre-print
RP is capable of spawning more than 100 tasks/second and supports the steady-state execution of up to 16K concurrent tasks. ...
Pilot systems help to satisfy the resource requirements of workloads comprised of multiple tasks. RADICAL-Pilot (RP) is a modular and extensible Python-based pilot system. ...
TTX is a measure of how fast a set of tasks can be executed by the RP Agent. ...
arXiv:1512.08194v4
fatcat:wylszrloqfh35isa6weu2vxmmm
A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one
2020
Future Generation Computer Systems
This paper tries to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics ...
We propose a way to extend task-based management systems to support continuous input and output data to enable the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using ...
Acknowledgements This work has been supported by the Spanish Government (contracts SEV2015- ...
doi:10.1016/j.future.2020.07.007
fatcat:24a4z2fl6jgujkxp5vdeu4xo6m
TP-PARSEC: A Task Parallel PARSEC Benchmark Suite
2019
Journal of Information Processing
TP-PARSEC is not only useful for task parallel system developers to analyze their runtime systems with a wide range of workloads from diverse areas, but also enables them to compare performance differences ...
TP-PARSEC is integrated with a task-centric performance analysis and visualization tool which effectively helps users understand the performance, pinpoint performance bottlenecks, and especially analyze ...
financial analytics, physics simulation, and data mining. ...
doi:10.2197/ipsjjip.27.211
fatcat:32qsxkpufvafdkupbjjyzhuoji
Efficient, Dynamic Multi-task Execution on FPGA-based Computing Systems
2021
IEEE Transactions on Parallel and Distributed Systems
This results in suboptimal resource utilisation and relatively poor performance, particularly as the number of tasks increases. ...
Using models with varying resource/throughput profiles, we select the most appropriate distribution based on the runtime, workload needs to enhance temporal compute density. ...
ACKNOWLEDGMENTS The work was supported by the European Commission under European Horizon 2020 Programme, under Grant 6876281 (VINEYARD). ...
doi:10.1109/tpds.2021.3101153
fatcat:cpixjpmwx5dwpdgyhss5mxy6oq
Analysis and Optimization of Task Granularity on the Java Virtual Machine
2019
ACM Transactions on Programming Languages and Systems
Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications. ...
Although their performance may depend considerably on the granularity of their tasks, this topic has received little attention in the literature. ...
At runtime, a framework selects the version to execute according to the size of the task queues. Cong et al. ...
doi:10.1145/3338497
fatcat:5t6yjwohjfflfa4nmuvek2di4a
Runtime-driven shared last-level cache management for task-parallel programs
2015
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15)
Based on the input annotations for future tasks, the runtime instructs the hardware to prioritize data blocks with future reuse and evict blocks with no future reuse. ...
We develop a task-based cache partitioning technique that leverages the dependence tracking and look-ahead capabilities of the runtime. ...
However it is possible to let the runtime select such tasks at runtime based on the relative size of the memory footprints of tasks. ...
doi:10.1145/2807591.2807625
dblp:conf/sc/PanP15
fatcat:leh24puhiffeddkcmhnjgjam4i
Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism
2019
2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC)
Hence, on many applications structured parallelism is also exploited using tasks to leverage the full benefits of a pure data-flow execution model. ...
The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among tasks and a flexible data-flow execution model ...
ACKNOWLEDGMENT This work is supported by the Spanish Ministerio de Ciencia, Innovación y Universidades (TIN2015-65316-P), by the Generalitat de Catalunya (2014-SGR-1051) and by the European Union's Seventh ...
doi:10.1109/hipc.2019.00053
dblp:conf/hipc/MaronasSMAB19
fatcat:7f3yscuoczerjilk522wc5nw2u