941 Hits in 7.6 sec

Python programmers have GPUs too: automatic Python loop parallelization with staged dependence analysis

Dejice Jacob, Phil Trinder, Jeremy Singer
2019 Proceedings of the 15th ACM SIGPLAN International Symposium on Dynamic Languages - DLS 2019  
We show that staging the dependence analysis is an effective way to maximize performance.  ...  The parallel loop nest code is then converted to CUDA kernels for GPU execution.  ...  Section 3 demonstrates how the dynamic nature of Python enables staging dependence analysis, with some happening ahead-of-time and the remainder happening just-in-time.  ... 
doi:10.1145/3359619.3359743 dblp:conf/dls/JacobTS19 fatcat:567e4c2txzfafeeygqlwnztvkq
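The staging idea in the snippet above — splitting dependence analysis between an ahead-of-time phase and a just-in-time phase — can be sketched in plain Python. All names below are illustrative, not ALPyNA's actual API; the toy analysis handles only a single affine subscript offset that becomes known at run time.

```python
def analyze_aot(write_subscript, read_subscript):
    # AOT stage: both subscripts are affine in the loop index i; only the
    # constant offset between them may be unknown until run time, so the
    # decision is kept symbolic and deferred to a JIT-stage closure.
    def decide_jit(offset):
        # JIT stage: with the offset now concrete, a loop-carried
        # dependence exists exactly when the read and write offsets differ.
        return offset == 0  # True => iterations independent => parallel-safe
    return decide_jit

# Loop body: a[i] = g(a[i + d]), where 'd' is only known at run time.
is_parallel = analyze_aot("a[i]", "a[i+d]")
print(is_parallel(0))   # d = 0: each iteration touches only its own slot
print(is_parallel(3))   # d = 3: iteration i reads what iteration i+3 writes
```

The point of the split is that the symbolic part runs once, while the cheap concrete check runs each time the loop is entered with fresh values.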

Pricing Python parallelism: a dynamic language cost model for heterogeneous platforms

Dejice Jacob, Phil Trinder, Jeremy Singer
2020 Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages  
The ALPyNA framework analyses moderately complex Python loop nests and automatically JIT compiles code for heterogeneous CPU and GPU architectures.  ...  Execution times may be reduced by offloading parallel loop nests to a GPU.  ...  Rather than require that developers have parallel programming expertise, our approach is to automatically parallelize loop nests in vanilla Python on GPUs.  ... 
doi:10.1145/3426422.3426979 fatcat:ex2h76pov5dgtiysm7ap4rn5de

Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing [article]

Jun Shirako, Akihiro Hayashi, Sri Raj Paul, Alexey Tumanov, Vivek Sarkar
2022 arXiv   pre-print
This paper introduces a novel approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms.  ...  It includes extensions to the polyhedral framework that unify user-written loops and implicit loops present in matrix/tensor operators, as well as automated selection of CPU vs. GPU code variants.  ...  Shirako, Hayashi, Paul, Tumanov, Sarkar hybrid Python/C++ code generation, fine-grained NumPy-to-CuPy conversion, and profile-based CPU/GPU runtime selection.  ... 
arXiv:2203.06233v1 fatcat:4e7sa6j3szgfri5pajrgccuvuu

Accelerating Dynamically-Typed Languages on Heterogeneous Platforms Using Guards Optimization

Mohaned Qunaibit, Stefan Brunthaler, Yeoul Na, Stijn Volckaert, Michael Franz, Michael Wagner
2018 European Conference on Object-Oriented Programming  
We have implemented MegaGuards along with an automatic loop parallelization backend in ZipPy, a Python Virtual Machine.  ...  and available accelerator hardware without having to rely on programmer annotations.  ...  Some techniques automatically parallelize sequential loops and run them on GPUs [35, 3] . New languages such as Lime [18, 2] implicitly perform parallel computations on GPUs.  ... 
doi:10.4230/lipics.ecoop.2018.16 dblp:conf/ecoop/QunaibitBNVF18 fatcat:spnmzayejnafzhtmn7uphnh2yq
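The guards optimization named in the title above can be conveyed with a toy sketch (purely illustrative; MegaGuards operates inside the ZipPy VM, not at the Python source level). A guard checks run-time assumptions once; if it holds, a specialized fast path runs, otherwise execution falls back to a generic path.

```python
def specialized_sum(xs):
    return sum(xs)                      # fast path: assumes a list of ints

def generic_sum(xs):
    total = 0
    for x in xs:
        total = total + x               # generic path: any addable values
    return total

def guarded_sum(xs):
    # The guard validates the type assumptions up front, so the fast path
    # itself needs no per-element checks.
    guard_holds = isinstance(xs, list) and all(type(x) is int for x in xs)
    return specialized_sum(xs) if guard_holds else generic_sum(xs)

print(guarded_sum([1, 2, 3]))           # guard holds -> fast path
print(guarded_sum([1.5, 2.5]))          # guard fails -> generic path
```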

TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning

Akshay Agrawal, Akshay Naresh Modi, Alexandre Passos, Allen Lavoie, Ashish Agarwal, Asim Shankar, Igor Ganichev, Josh Levenberg, Mingsheng Hong, Rajat Monga, Shanqing Cai
2019 arXiv   pre-print
TensorFlow Eager is a multi-stage, Python-embedded domain-specific language for hardware-accelerated machine learning, suitable for both interactive research and production.  ...  TensorFlow Eager thus offers a multi-stage programming model that makes it easy to interpolate between imperative and staged execution in a single package.  ...  François Chollet was very helpful in integrating TF Eager with Keras.  ... 
arXiv:1903.01855v1 fatcat:hd5ha3pbi5e2vi3gk6c2arjjxq
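The multi-stage model described above — run a Python function eagerly once to record its operations, then execute the recorded trace repeatedly — can be mimicked with a small pure-Python tracer. This is a toy analogy, not TensorFlow's implementation or API; `Sym` and `stage` are invented names.

```python
class Sym:
    # A symbolic value: arithmetic on it appends ops to a shared tape
    # instead of computing anything.
    def __init__(self, name, tape):
        self.name, self.tape = name, tape
    def _emit(self, op, other):
        out = Sym(f"t{len(self.tape)}", self.tape)
        self.tape.append((op, self.name, getattr(other, "name", other), out.name))
        return out
    def __add__(self, other): return self._emit("add", other)
    def __mul__(self, other): return self._emit("mul", other)

def stage(fn):
    tape = []
    out = fn(Sym("x", tape))              # trace once, eagerly
    def run(x):                           # staged executable: replay the tape
        env = {"x": x}
        for op, a, b, dst in tape:
            av, bv = env.get(a, a), env.get(b, b)
            env[dst] = av + bv if op == "add" else av * bv
        return env[out.name]
    return run

f = stage(lambda x: x * x + 3)            # recorded as one mul and one add
print(f(4))
```

Interpolating between the two styles, as the snippet puts it, amounts to choosing per call whether to run the Python body directly or replay its staged trace.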

GPUMap

Ivan Pachev, Chris Lupo
2017 Proceedings of the 7th Workshop on Python for High-Performance and Scientific Computing - PyHPC'17  
Just as individual systems with GPU-computing capability have become more available, so too have high-performance distributed systems.  ...  Spark-Ucores does not provide an abstraction for its GPU components, so programmers must have at least some GPU experience.  ... 
doi:10.1145/3149869.3149875 dblp:conf/sc/PachevL17 fatcat:ufouqszxvzacdmattsj5kl5bdu

PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation

Andreas Klöckner, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, Ahmed Fasih
2012 Parallel Computing  
High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs).  ...  This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique.  ...  Acknowledgments We would like to thank Ian Cullinan, Tomasz Rybak, Chris Heuser, Romain Brette, and Dan Goodman who have graciously agreed to let us showcase their research in Section 6 of this article  ... 
doi:10.1016/j.parco.2011.09.001 fatcat:o7iwvib6mvawdjbb4kn6xarwce
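The run-time code generation (RTCG) workflow the abstract describes — produce source text after run-time parameters are known, then compile it — can be mimicked in plain Python with `compile`/`exec`, standing in for where PyCUDA would hand generated CUDA C to the NVIDIA compiler. No GPU is involved here; only the generate-then-compile pattern is shown, and `make_saxpy` is an invented name.

```python
def make_saxpy(alpha):
    # Specialize the constant 'alpha' directly into the generated source,
    # the way RTCG bakes run-time parameters into a kernel before compiling.
    src = (
        f"def saxpy(x, y):\n"
        f"    return [{alpha} * xi + yi for xi, yi in zip(x, y)]\n"
    )
    namespace = {}
    exec(compile(src, "<rtcg>", "exec"), namespace)
    return namespace["saxpy"]

saxpy2 = make_saxpy(2.0)
print(saxpy2([1.0, 2.0], [10.0, 20.0]))
```

Baking the constant into the source lets the compiler see it, which is the same payoff PyCUDA's generated CUDA kernels get from specialization.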

GPU Computing with Python: Performance, Energy Efficiency and Usability

Håvard H. Holm, André R. Brodtkorb, Martin L. Sætra
2020 Computation  
In this work, we examine the performance, energy efficiency, and usability when using Python for developing high-performance computing codes running on the graphics processing unit (GPU).  ...  mid-range, and high-end GPUs.  ...  However, the GPU programs in Numba are written as Python functions, and the programmer has to rely on Numba for efficient parallelization of the code.  ... 
doi:10.3390/computation8010004 fatcat:3mb46xwegfeclh4qiuuukware4

A domain specific language for performance portable molecular dynamics algorithms

William Robert Saunders, James Grant, Eike Hermann Müller
2018 Computer Physics Communications  
GPUs.  ...  Inspired by this approach, we develop a Python code generation system for molecular dynamics simulations on different parallel architectures, including massively parallel distributed memory systems and  ...  This implies that optimised and parallel code is automatically generated for this important stage of the simulation workflow.  ... 
doi:10.1016/j.cpc.2017.11.006 fatcat:65s7gmeloze7dkqrvygj2r633y

Vispark: GPU-accelerated distributed visual computing using spark

Woohyuk Choi, Won-Ki Jeong
2015 2015 IEEE 5th Symposium on Large Data Analysis and Visualization (LDAV)  
Without the knowledge of GPU-specific APIs such as NVIDIA CUDA and OpenCL, the user can write a Python-like mapper code using Vispark language, and the Vispark translator and runtime system will automatically  ...  In this programming model, the task is decomposed into two user-programmable stages-the map stage processes the input data and generates key-value pairs, and the reduce stage processes a group of values  ...  Note how Vispark's Python-like syntax is translated into C-like CUDA syntax; for example, Python-style for loop for orthogonal iterator in Code 2 is converted to C-style for loop in Code 3.  ... 
doi:10.1109/ldav.2015.7348080 dblp:conf/ldav/ChoiJ15 fatcat:6ediplzdrzgjto7knic3aswdvq

Vispark: GPU-Accelerated Distributed Visual Computing Using Spark

Woohyuk Choi, Sumin Hong, Won-Ki Jeong
2016 SIAM Journal on Scientific Computing  
Without the knowledge of GPU-specific APIs such as NVIDIA CUDA and OpenCL, the user can write a Python-like mapper code using Vispark language, and the Vispark translator and runtime system will automatically  ...  In this programming model, the task is decomposed into two user-programmable stages-the map stage processes the input data and generates key-value pairs, and the reduce stage processes a group of values  ...  Note how Vispark's Python-like syntax is translated into C-like CUDA syntax; for example, Python-style for loop for orthogonal iterator in Code 2 is converted to C-style for loop in Code 3.  ... 
doi:10.1137/15m1026407 fatcat:fslos4kufna25eyddqdnwuebem
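The two-stage model the Vispark snippets describe — a map stage emitting key-value pairs and a reduce stage folding the values grouped under each key — looks like this in plain Python (illustrative only; Vispark compiles such mappers to CUDA and runs them on GPUs under Spark):

```python
from collections import defaultdict

def map_stage(records, mapper):
    # Map stage: apply the user mapper to every record, collecting the
    # key-value pairs it emits.
    pairs = []
    for rec in records:
        pairs.extend(mapper(rec))
    return pairs

def reduce_stage(pairs, reducer):
    # Reduce stage: group values by key, then fold each group.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(vals) for key, vals in groups.items()}

words = ["gpu", "spark", "gpu"]
pairs = map_stage(words, lambda w: [(w, 1)])   # map: word -> (word, 1)
counts = reduce_stage(pairs, sum)              # reduce: sum the ones
print(counts)
```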

Parsl: Pervasive Parallel Programming in Python [article]

Yadu Babuji, Anna Woodard, Zhuozhao Li, Daniel S. Katz, Ben Clifford, Rohan Kumar, Lukasz Lacinski, Ryan Chard, Justin M. Wozniak, Ian Foster, Michael Wilde, Kyle Chard
2019 arXiv   pre-print
Here, we present Parsl, a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism.  ...  This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore's law), necessitates rethinking how parallelism  ...  The first and third stages have the widest parallelism, with 20 tasks, while the second and fourth stage are reduce-like stages with a single task each.  ... 
arXiv:1905.02158v1 fatcat:okcga7i4vza6zmx5lyj63seone
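The "simple, scalable constructs for encoding parallelism" that the Parsl abstract mentions are futures-based: decorated functions return futures that are resolved later. The flavor can be approximated with the standard library's `concurrent.futures` (this is not Parsl's API; Parsl's `@python_app` decorator and pluggable executors are considerably richer):

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    # Wide stage: submit five independent tasks; each returns a future.
    futures = [pool.submit(task, n) for n in range(5)]
    # Reduce-like stage: .result() blocks until each task completes.
    results = [f.result() for f in futures]

print(results)
```

The wide-then-narrow pattern here mirrors the four-stage workflow in the snippet, where stages of 20 parallel tasks alternate with single-task reduce stages.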

GPU Computing with Python: Performance, Energy Efficiency and Usability [article]

Håvard H. Holm, André R. Brodtkorb, Martin L. Sætra
2019 arXiv   pre-print
In this work, we examine the performance, energy efficiency and usability when using Python for developing HPC codes running on the GPU.  ...  We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between GPU generations; and between low-end, mid-range and high-end GPUs.  ...  Acknowledgement This work is supported by the Research Council of Norway through grant number 250935 (GPU Ocean).  ... 
arXiv:1912.02607v1 fatcat:2hxo3zsybvhrvckouaemjekjam

IMPACT OF ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE PROCESSING ON PROGRAMMING AND SOFTWARE ENGINEERING

Radha Guha
2020 International Research Journal of Computer Science  
The human brain consists of billions of neurons that communicate with each other through synapses to carry out all the complex tasks we call natural intelligence.  ...  Thus incumbent programmers must be informed about machine learning and AI/NLP advancements, which are progressing at a very rapid pace, to meet future software engineering needs and demands.  ...  The GPT-3 model is very big and requires several days of parallel processing on the most powerful GPUs and TPUs to train.  ... 
doi:10.26562/irjcs.2020.v0709.003 fatcat:vs4dg4ptcnc25i6roanrhh353e

Executing linear algebra kernels in heterogeneous distributed infrastructures with PyCOMPSs

Ramon Amela, Cristian Ramon-Cortes, Jorge Ejarque, Javier Conejero, Rosa M. Badia, A. Anciaux-Sedrakian, Q. H. Tran
2018 Oil & Gas Science and Technology  
Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.  ...  PyCOMPSs follows such approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism.  ...  There are multiple modules that have been developed to provide parallelism in Python, such as the multiprocessing, Parallel Python (PP) and MPI modules.  ... 
doi:10.2516/ogst/2018047 fatcat:n6bovm4gibct3ojnkn33bhzai4
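The task-based pattern PyCOMPSs automates for linear algebra — split a kernel into block tasks and let a runtime execute them in parallel — can be sketched with a standard-library pool. This is an illustration only: PyCOMPSs uses `@task` decorators and a distributed runtime, none of which appears here, and `block_dot`/`parallel_dot` are invented names.

```python
from multiprocessing.pool import ThreadPool

def block_dot(args):
    # One task: the dot product of a pair of blocks.
    xs, ys = args
    return sum(x * y for x, y in zip(xs, ys))

def parallel_dot(x, y, block=2):
    # Partition the vectors into blocks, run one task per block in the
    # pool, then reduce the partial results.
    blocks = [(x[i:i + block], y[i:i + block]) for i in range(0, len(x), block)]
    with ThreadPool() as pool:
        partials = pool.map(block_dot, blocks)
    return sum(partials)

print(parallel_dot([1, 2, 3, 4], [5, 6, 7, 8]))
```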
Showing results 1 — 15 out of 941 results