Python programmers have GPUs too: automatic Python loop parallelization with staged dependence analysis
2019
Proceedings of the 15th ACM SIGPLAN International Symposium on Dynamic Languages - DLS 2019
We show that staging the dependence analysis is an effective way to maximize performance. ...
The parallel loop nest code is then converted to CUDA kernels for GPU execution. ...
Section 3 demonstrates how the dynamic nature of Python enables staging the dependence analysis, with some happening ahead-of-time and the remainder happening just-in-time. ...
doi:10.1145/3359619.3359743
dblp:conf/dls/JacobTS19
fatcat:567e4c2txzfafeeygqlwnztvkq
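The staging idea splits the dependence test between compile time and call time. A minimal illustration in plain Python (not ALPyNA's actual analysis or API; the comments mark which facts each stage can establish):

```python
def stencil(a, b, n):
    # Ahead-of-time stage: the loop structure and subscripts are static, so
    # analysis can prove each (i, j) iteration writes only b[i][j] and reads
    # only a, i.e. there is no loop-carried dependence unless a and b alias.
    # Just-in-time stage: aliasing and bounds depend on run-time values, so a
    # cheap check (a is not b, shapes match n) completes the proof at call
    # time, after which both loops are safe to map onto GPU threads.
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j]
                              + a[i][j - 1] + a[i][j + 1])
```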
Pricing Python parallelism: a dynamic language cost model for heterogeneous platforms
2020
Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages
The ALPyNA framework analyses moderately complex Python loop nests and automatically JIT compiles code for heterogeneous CPU and GPU architectures. ...
Execution times may be reduced by offloading parallel loop nests to a GPU. ...
Rather than require that developers have parallel programming expertise, our approach is to automatically parallelize loop nests in vanilla Python on GPUs. ...
doi:10.1145/3426422.3426979
fatcat:ex2h76pov5dgtiysm7ap4rn5de
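The offloading decision the abstract describes amounts to comparing predicted CPU and GPU execution times, where the GPU side must also pay transfer and launch costs. A hypothetical cost model in that spirit (the function and its constants are illustrative assumptions, not ALPyNA's actual model):

```python
def should_offload(iterations, bytes_moved,
                   cpu_ns_per_iter=50.0,      # estimated CPU rate
                   gpu_ns_per_iter=2.0,       # estimated GPU rate
                   pcie_bytes_per_ns=12.0,    # host<->device bandwidth
                   launch_overhead_ns=1e4):   # kernel launch latency
    cpu_time = iterations * cpu_ns_per_iter
    gpu_time = (launch_overhead_ns
                + bytes_moved / pcie_bytes_per_ns
                + iterations * gpu_ns_per_iter)
    return gpu_time < cpu_time

# Small nests stay on the CPU; large ones amortize the transfer cost.
print(should_offload(iterations=100, bytes_moved=800))            # False
print(should_offload(iterations=10_000_000, bytes_moved=80e6))    # True
```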
Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing
[article]
2022
arXiv
pre-print
This paper introduces a novel approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms. ...
It includes extensions to the polyhedral framework that unify user-written loops and implicit loops present in matrix/tensor operators, as well as automated selection of CPU vs. GPU code variants. ...
... hybrid Python/C++ code generation, fine-grained NumPy-to-CuPy conversion, and profile-based CPU/GPU runtime selection. ...
arXiv:2203.06233v1
fatcat:4e7sa6j3szgfri5pajrgccuvuu
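The NumPy-to-CuPy conversion mentioned in the abstract is automated by the framework; a hand-written sketch of the mapping it targets (requires the cupy package and a CUDA-capable GPU):

```python
import numpy as np
import cupy as cp   # GPU-backed implementation of the NumPy API

x = np.random.rand(1024, 1024)
x_gpu = cp.asarray(x)        # host -> device copy
y_gpu = x_gpu @ x_gpu.T      # the matmul now executes on the GPU
y = cp.asnumpy(y_gpu)        # device -> host copy back to NumPy
```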
Accelerating Dynamically-Typed Languages on Heterogeneous Platforms Using Guards Optimization
2018
European Conference on Object-Oriented Programming
We have implemented MegaGuards along with an automatic loop parallelization backend in ZipPy, a Python Virtual Machine. ...
and available accelerator hardware without having to rely on programmer annotations. ...
Some techniques automatically parallelize sequential loops and run them on GPUs [35, 3]. New languages such as Lime [18, 2] implicitly perform parallel computations on GPUs. ...
doi:10.4230/lipics.ecoop.2018.16
dblp:conf/ecoop/QunaibitBNVF18
fatcat:spnmzayejnafzhtmn7uphnh2yq
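A guard is a run-time check that validates the assumptions under which optimized code may execute. A minimal sketch of the idea in plain Python (illustrative only; MegaGuards implements this inside the virtual machine, not at the source level):

```python
def sum_floats(xs):
    # One region-wide guard replaces per-iteration type dispatch: if it
    # holds, the loop body can run as an unboxed, parallelizable fast path.
    if all(type(x) is float for x in xs):
        total = 0.0
        for x in xs:          # fast path: element types already proven
            total += x
        return total
    return sum(xs)            # generic fallback when the guard fails

print(sum_floats([1.0, 2.5, 3.5]))   # 7.0 via the guarded fast path
```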
TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning
2019
arXiv
pre-print
TensorFlow Eager is a multi-stage, Python-embedded domain-specific language for hardware-accelerated machine learning, suitable for both interactive research and production. ...
TensorFlow Eager thus offers a multi-stage programming model that makes it easy to interpolate between imperative and staged execution in a single package. ...
François Chollet was very helpful in integrating TF Eager with Keras. ...
arXiv:1903.01855v1
fatcat:hd5ha3pbi5e2vi3gk6c2arjjxq
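The interpolation between imperative and staged execution is exposed through tf.function, which stages an ordinary Python function into a graph. A minimal sketch (TensorFlow 2.x):

```python
import tensorflow as tf

def square_sum(x):
    return tf.reduce_sum(x * x)      # runs eagerly when called directly

staged = tf.function(square_sum)     # the same function, staged as a graph

x = tf.constant([1.0, 2.0, 3.0])
print(square_sum(x))   # imperative (eager) execution
print(staged(x))       # staged (graph) execution of the same code
```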
GPUMap
2017
Proceedings of the 7th Workshop on Python for High-Performance and Scientific Computing - PyHPC'17
Just as individual systems with GPU-computing capability have become more available, so too have high-performance distributed systems. ...
Spark-Ucores does not provide an abstraction for its GPU components, so programmers must have at least some GPU experience. ...
doi:10.1145/3149869.3149875
dblp:conf/sc/PachevL17
fatcat:ufouqszxvzacdmattsj5kl5bdu
PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation
2012
Parallel Computing
High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). ...
This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. ...
Acknowledgments We would like to thank Ian Cullinan, Tomasz Rybak, Chris Heuser, Romain Brette, and Dan Goodman who have graciously agreed to let us showcase their research in Section 6 of this article ...
doi:10.1016/j.parco.2011.09.001
fatcat:o7iwvib6mvawdjbb4kn6xarwce
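Run-time code generation here means the CUDA C source is an ordinary Python string, compiled while the program runs. A minimal example using PyCUDA's documented API:

```python
import numpy as np
import pycuda.autoinit               # creates a CUDA context on import
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# The kernel source can be assembled and specialized at run time before
# compilation, which is the essence of RTCG.
mod = SourceModule("""
__global__ void scale(float *x, float a)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    x[i] *= a;
}
""")
scale = mod.get_function("scale")

x = np.linspace(0, 1, 256).astype(np.float32)
scale(drv.InOut(x), np.float32(2.0), block=(256, 1, 1), grid=(1, 1))
print(x[:4])   # the array was doubled on the GPU
```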
GPU Computing with Python: Performance, Energy Efficiency and Usability
2020
Computation
In this work, we examine the performance, energy efficiency, and usability when using Python for developing high-performance computing codes running on the graphics processing unit (GPU). ...
mid-range, and high-end GPUs. ...
However, the GPU programs in Numba are written as Python functions, and the programmer has to rely on Numba for efficient parallelization of the code. ...
doi:10.3390/computation8010004
fatcat:3mb46xwegfeclh4qiuuukware4
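"GPU programs in Numba are written as Python functions" refers to kernels decorated with @cuda.jit. A minimal sketch (requires numba and a CUDA GPU):

```python
import numpy as np
from numba import cuda

@cuda.jit                     # the kernel is an ordinary Python function
def add(out, a, b):
    i = cuda.grid(1)          # global thread index
    if i < out.size:          # bounds guard for partial blocks
        out[i] = a[i] + b[i]

a = np.arange(1024, dtype=np.float32)
b = np.ones_like(a)
out = np.zeros_like(a)
add[4, 256](out, a, b)        # launch: 4 blocks of 256 threads
print(out[:4])                # [1. 2. 3. 4.]
```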
A domain specific language for performance portable molecular dynamics algorithms
2018
Computer Physics Communications
Inspired by this approach, we develop a Python code generation system for molecular dynamics simulations on different parallel architectures, including massively parallel distributed memory systems and ...
This implies that optimised and parallel code is automatically generated for this important stage of the simulation workflow. ...
doi:10.1016/j.cpc.2017.11.006
fatcat:65s7gmeloze7dkqrvygj2r633y
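A toy sketch of the code-generation approach (hypothetical template and names, not the paper's system): a per-pair kernel expression is written once and spliced into a loop-nest template at run time. The real system would emit optimized parallel code rather than plain Python.

```python
def make_pairwise(expr):
    # Splice a per-pair expression into a loop-nest template and compile it
    # with exec(); a production generator would emit C, OpenMP, or CUDA here.
    src = f"""
def pairwise(pos, acc):
    n = len(pos)
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pos[j] - pos[i]
                acc[i] += {expr}
"""
    ns = {}
    exec(src, ns)
    return ns["pairwise"]

# e.g. a 1-D inverse-square interaction:
pairwise = make_pairwise("r / (abs(r) ** 3 + 1e-9)")
pos, acc = [0.0, 1.0, 2.5], [0.0, 0.0, 0.0]
pairwise(pos, acc)
print(acc)
```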
Vispark: GPU-accelerated distributed visual computing using spark
2015
2015 IEEE 5th Symposium on Large Data Analysis and Visualization (LDAV)
Without knowledge of GPU-specific APIs such as NVIDIA CUDA and OpenCL, the user can write Python-like mapper code in the Vispark language, and the Vispark translator and runtime system will automatically ...
In this programming model, the task is decomposed into two user-programmable stages: the map stage processes the input data and generates key-value pairs, and the reduce stage processes a group of values ...
Note how Vispark's Python-like syntax is translated into C-like CUDA syntax; for example, the Python-style for loop over the orthogonal iterator in Code 2 is converted to a C-style for loop in Code 3. ...
doi:10.1109/ldav.2015.7348080
dblp:conf/ldav/ChoiJ15
fatcat:6ediplzdrzgjto7knic3aswdvq
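The two-stage model the snippet describes, written in plain Python rather than Vispark syntax (a conceptual sketch, not the Vispark API):

```python
from collections import defaultdict

def map_stage(records, mapper):
    for rec in records:
        yield mapper(rec)              # each call emits a (key, value) pair

def reduce_stage(pairs, reducer):
    groups = defaultdict(list)
    for k, v in pairs:                 # group values by key
        groups[k].append(v)
    return {k: reducer(vs) for k, vs in groups.items()}

# Word count as the canonical example:
pairs = map_stage("a b a".split(), lambda w: (w, 1))
print(reduce_stage(pairs, sum))        # {'a': 2, 'b': 1}
```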
Vispark: GPU-Accelerated Distributed Visual Computing Using Spark
2016
SIAM Journal on Scientific Computing
Without knowledge of GPU-specific APIs such as NVIDIA CUDA and OpenCL, the user can write Python-like mapper code in the Vispark language, and the Vispark translator and runtime system will automatically ...
In this programming model, the task is decomposed into two user-programmable stages: the map stage processes the input data and generates key-value pairs, and the reduce stage processes a group of values ...
Note how Vispark's Python-like syntax is translated into C-like CUDA syntax; for example, the Python-style for loop over the orthogonal iterator in Code 2 is converted to a C-style for loop in Code 3. ...
doi:10.1137/15m1026407
fatcat:fslos4kufna25eyddqdnwuebem
Parsl: Pervasive Parallel Programming in Python
[article]
2019
arXiv
pre-print
Here, we present Parsl, a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism. ...
This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore's law), necessitates rethinking how parallelism ...
The first and third stages have the widest parallelism, with 20 tasks, while the second and fourth stage are reduce-like stages with a single task each. ...
arXiv:1905.02158v1
fatcat:okcga7i4vza6zmx5lyj63seone
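Parsl's core construct is the app decorator, whose invocations return futures that execute in parallel. A minimal example following Parsl's documented local-threads configuration:

```python
import parsl
from parsl import python_app
from parsl.configs.local_threads import config

parsl.load(config)               # start the runtime with a local executor

@python_app                      # calls return futures, not results
def double(x):
    return 2 * x

futures = [double(i) for i in range(20)]      # 20 tasks run in parallel
print(sum(f.result() for f in futures))       # block on results: 380
```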
GPU Computing with Python: Performance, Energy Efficiency and Usability
[article]
2019
arXiv
pre-print
In this work, we examine the performance, energy efficiency and usability when using Python for developing HPC codes running on the GPU. ...
We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between GPU generations; and between low-end, mid-range and high-end GPUs. ...
Acknowledgement This work is supported by the Research Council of Norway through grant number 250935 (GPU Ocean). ...
arXiv:1912.02607v1
fatcat:2hxo3zsybvhrvckouaemjekjam
Impact of Artificial Intelligence and Natural Language Processing on Programming and Software Engineering
2020
International Research Journal of Computer Science
The human brain consists of approximately 10 billion neurons, connected with each other through some 100 trillion synapses that transmit information to other neurons, to carry out all the complex tasks we call natural intelligence ...
Thus incumbent programmers must stay informed about machine learning and AI/NLP advances, which are progressing at a very rapid pace, to meet future software engineering needs and demands. ...
The GPT-3 model is very large, requiring several days of parallel processing on the most powerful GPUs and TPUs to train. ...
doi:10.26562/irjcs.2020.v0709.003
fatcat:vs4dg4ptcnc25i6roanrhh353e
Executing linear algebra kernels in heterogeneous distributed infrastructures with PyCOMPSs
2018
Oil & Gas Science and Technology
Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python. ...
PyCOMPSs follows such approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. ...
There are multiple modules that have been developed to provide parallelism in Python, such as the multiprocessing, Parallel Python (PP) and MPI modules. ...
doi:10.2516/ogst/2018047
fatcat:n6bovm4gibct3ojnkn33bhzai4
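PyCOMPSs expresses task-based parallelism with a decorator plus explicit synchronization. A minimal sketch following the documented API (the blocked linear-algebra kernels in the paper are considerably more elaborate):

```python
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=1)                 # each invocation becomes an async task
def block_multiply(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = block_multiply(a, b)         # returns immediately with a future
c = compss_wait_on(c)            # synchronize to obtain the real value
print(c)                         # [[19.0, 22.0], [43.0, 50.0]]
```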