
Automatically exploiting the memory hierarchy of GPUs through just-in-time compilation

Michail Papadimitriou, Juan Fumero, Athanasios Stratikopoulos, Christos Kotselidis
2021 Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments  
In this paper, we propose an alternative approach based on Just-In-Time (JIT) compilation to automatically and transparently exploit local memory allocation and data locality on GPUs.  ...  Although Graphics Processing Units (GPUs) have become pervasive for data-parallel workloads, the efficient exploitation of their tiered memory hierarchy requires explicit programming.  ...  Figure 2. Overview of the proposed JIT compilation flow for automatically exploiting the GPU memory hierarchy.  ... 
doi:10.1145/3453933.3454014 fatcat:yoxhdtisurhtpnqu2mtg5kt3vy

Using compiler snippets to exploit parallelism on heterogeneous hardware: a Java reduction case study

Juan Fumero, Christos Kotselidis
2018 Proceedings of the 10th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages - VMIL 2018  
However, for the Java programming language, little work has been done for automatically compiling and exploiting reductions in Java applications on GPUs.  ...  The snippets are expressed in pure Java with OpenCL semantics, simplifying the JIT compiler optimizations and code generation.  ...  Authors would also like to thank David Leopoldseder and Foivos Zakkak for fruitful discussions and feedback.  ... 
doi:10.1145/3281287.3281292 dblp:conf/oopsla/FumeroK18 fatcat:d2gnnf3jojhylcljratwtc3qw4

Speculatively vectorized bytecode

Erven Rohou, Sergei Dyshel, Dorit Nuzman, Ira Rosen, Kevin Williams, Albert Cohen, Ayal Zaks
2011 Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers - HiPEAC '11  
Advanced JIT compilers can then quickly tailor this bytecode to exploit SIMD capabilities of appropriate platforms, yielding up to 14.7× and 11.8× speedups on x86 and PowerPC platforms (including JIT-compilation  ...  Efficient exploitation of SIMD instructions has become crucial for the performance of many applications.  ...  A special method name can be generated for aligned accesses, for the JIT to generate optimized code (e.g., when the offline compiler peels a loop to align a memory access).  ... 
doi:10.1145/1944862.1944871 dblp:conf/hipeac/RohouDNRWCZ11 fatcat:uugmwqaowjem3cjwliqfrb7rqu

HPA: An Opportunistic Approach to Embedded Energy Efficiency [article]

Baptiste Delporte and Roberto Rigamonti and Alberto Dassatti
2015 arXiv   pre-print
In this paper we present a transparent, on-the-fly optimization scheme that allows a generic application to automatically exploit the available computing units to partition its computational load.  ...  The idea is to use profiling to automatically select a computing-intensive candidate for acceleration, and then distribute the computations to the different units by off-loading blocks of code to them.  ...  We have chosen them as they present a different degree of sparsity, and we wanted to investigate the capabilities of a JIT-based framework to exploit this information when optimizing the computations.  ... 
arXiv:1511.08635v1 fatcat:mtcdrerrhnfwvjp2oy2ios4x34

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

Michail Papadimitriou, Juan Fumero, Athanasios Stratikopoulos, Foivos S. Zakkak, Christos Kotselidis
2020 The Art, Science, and Engineering of Programming  
We also provide a break-down analysis of the proposed compiler optimizations for FPGA execution, as a means to project their impact on the applications' characteristics.  ...  The proposed solution is prototyped in the context of the Java programming language and TornadoVM; an open-source programming framework for Java execution on heterogeneous hardware.  ...  The increased latency in HLS compilation times was the motivation for providing a set of execution modes in TornadoVM that can either perform a whole compilation for FPGAs at runtime (Full JIT), or load  ... 
doi:10.22152/programming-journal.org/2021/5/8 fatcat:iele6rhz5bavphj733vpsyckci

Just-In-Time Code Reuse: On the Effectiveness of Fine-Grained Address Space Layout Randomization

K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, A. Sadeghi
2013 2013 IEEE Symposium on Security and Privacy  
API functions and gadgets, and JIT-compile a target program using those gadgets, all within a script environment at the time an exploit is launched.  ...  We demonstrate the power of our framework by using it in conjunction with a real-world exploit against Internet Explorer, and also provide extensive evaluations that demonstrate the practicality of just-in-time  ...  The authors would like to thank Stefan Nürnberger, Teryl Taylor and Andrew White for fruitful discussions about this work. We also thank the anonymous reviewers for their insightful comments.  ... 
doi:10.1109/sp.2013.45 dblp:conf/sp/SnowMDDLS13 fatcat:ctdl4xgdzvde3eatvtbmny3txu

Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation

Juan Fumero, Michel Steuwer, Lukas Stadler, Christophe Dubach
2017 Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments - VEE '17  
Using just-in-time compilation, we automatically generate OpenCL code at runtime which is specialized to the actual observed data types using profiling information.  ...  However, exploiting heterogeneous hardware requires the use of low-level programming language approaches such as OpenCL, which is incredibly challenging, even for advanced programmers.  ...  Acknowledgments The authors would also like to thank the anonymous reviewers as well as Roland Schatz, Stefan Marr and Gilles Duboscq for fruitful discussions.  ... 
doi:10.1145/3050748.3050761 dblp:conf/vee/FumeroSSD17 fatcat:xh7trtbg6rhubaby3j2xoyqtqy

Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation

Juan Fumero, Michel Steuwer, Lukas Stadler, Christophe Dubach
2017 SIGPLAN notices  
Using just-in-time compilation, we automatically generate OpenCL code at runtime which is specialized to the actual observed data types using profiling information.  ...  However, exploiting heterogeneous hardware requires the use of low-level programming language approaches such as OpenCL, which is incredibly challenging, even for advanced programmers.  ...  Acknowledgments The authors would also like to thank the anonymous reviewers as well as Roland Schatz, Stefan Marr and Gilles Duboscq for fruitful discussions.  ... 
doi:10.1145/3140607.3050761 fatcat:svzfaag4evg3hlfnosicgcabvy

Truffle

Christian Wimmer, Thomas Würthinger
2012 Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity - SPLASH '12  
We present Truffle, a novel framework for implementing managed languages in Java™.  ...  When the tree reaches a stable state, partial evaluation compiles the tree into optimized machine code.  ...  We would also like to thank all members of the Virtual Machine Research Group at Oracle Labs for their support and contributions.  ... 
doi:10.1145/2384716.2384723 dblp:conf/oopsla/WimmerW12 fatcat:dhjykwoudbgxljku7yylrts2r4

Vapor SIMD: Auto-vectorize once, run everywhere

Dorit Nuzman, Sergei Dyshel, Erven Rohou, Ira Rosen, Kevin Williams, David Yuste, Albert Cohen, Ayal Zaks
2011 International Symposium on Code Generation and Optimization (CGO 2011)  
We present our design for a synergistic auto-vectorizing compilation scheme. The scheme is composed of an aggressive, generic offline stage coupled with a lightweight, target-specific online stage.  ...  Single-Instruction-Multiple-Data (SIMD) hardware is ubiquitous and markedly diverse, but can be difficult for JIT compilers to efficiently target due to resource and budget constraints.  ...  We are also thankful to Andrea Ornstein for supporting the gcc4cli backend.  ... 
doi:10.1109/cgo.2011.5764683 dblp:conf/cgo/NuzmanDRRWYCZ11 fatcat:pawpjpuurzgyrpdwapv2qpl3uq

Pricing Python parallelism: a dynamic language cost model for heterogeneous platforms

Dejice Jacob, Phil Trinder, Jeremy Singer
2020 Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages  
The ALPyNA framework analyses moderately complex Python loop nests and automatically JIT compiles code for heterogeneous CPU and GPU architectures.  ...  Auto-parallelizing compilers are common for static languages, often using a cost model to determine when the GPU execution speed will outweigh the offload overheads.  ...  Acknowledgments The authors would like to thank Alexandre Bergel for his friendly and constructive shepherding of this paper. We also thank the anonymous reviewers for their helpful suggestions.  ... 
doi:10.1145/3426422.3426979 fatcat:ex2h76pov5dgtiysm7ap4rn5de

Surgical precision JIT compilers

Tiark Rompf, Arvind K. Sujeeth, Kevin J. Brown, HyoukJoong Lee, Hassan Chafi, Kunle Olukotun
2013 Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI '14  
We present Lancet, a JIT compiler framework for Java bytecode that enables such a tight, two-way integration with the running program.  ...  In this paper, we propose to turn JIT compilation into a precision tool by adding two essential and generic metaprogramming facilities: First, allow programs to invoke JIT compilation explicitly.  ...  Acknowledgments The authors would like to thank the Graal/Truffle team for many insightful discussions.  ... 
doi:10.1145/2594291.2594316 dblp:conf/pldi/RompfSBLCO14 fatcat:f5sjt2lx5vbhncoz5pv45aiuta

Compiling and Optimizing Java 8 Programs for GPU Execution

Kazuaki Ishizaki, Akihiro Hayashi, Gita Koblents, Vivek Sarkar
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
These features and optimizations are supported and automatically performed by a JIT compiler that is built on top of a production version of the IBM Java 8 runtime environment.  ...  This paper presents a just-in-time (JIT) compiler that can generate and optimize GPU code from a pure Java program written using lambda expressions with the new parallel streams APIs in Java 8.  ...  We thank Marcel Mitran for his encouragement and support in pursuing the parallel streams API and lambda approach, and thank Jimmy Kwa for his extensive contribution to the implementation.  ... 
doi:10.1109/pact.2015.46 dblp:conf/IEEEpact/IshizakiHKS15 fatcat:c6bwzxy7vbbg5mbo6ohajgmepa

FAuST: A Framework for Formal Verification, Automated Debugging, and Software Test Generation [chapter]

Heinz Riener, Görschwin Fey
2012 Lecture Notes in Computer Science  
We present FAuST, an extensible framework for Formal verification, AUtomated debugging, and Software Test generation.  ...  Our framework uses a highly customizable Bounded Model Checking (BMC) algorithm for formal reasoning about software programs and provides different applications, e.g., property checking, functional equivalence  ...  Optionally, FAuST allows for validation of counterexamples on the real program using LLVM's JIT compiler and execution engine, i.e., a test driver with the values of the counterexample is automatically  ... 
doi:10.1007/978-3-642-31759-0_17 fatcat:bl2xge6t7bd3pmwwxvzxool47e

Profiling-Assisted Decoupled Access-Execute [article]

Jonatan Waern, Per Ekemark, Konstantinos Koukos, Stefanos Kaxiras, Alexandra Jimborean
2016 arXiv   pre-print
For applications whose behavior varies significantly with respect to the input data, the profiling can be performed online, accompanied by just-in-time compilation.  ...  We evaluated the benefits in energy efficiency and performance for both static and dynamic code generation and showed that precise prefetching of critical loads can result in 20% energy improvements, on  ...  DAE exploits the fact that reducing frequency during memory-bound phases saves energy without harming performance, while automatically generated coarse phases reduces the number of time frequency is scaled  ... 
arXiv:1601.01722v1 fatcat:u6u42rxglzd3thvicbjelo5x7a