Automatically exploiting the memory hierarchy of GPUs through just-in-time compilation
2021
Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
In this paper, we propose an alternative approach based on Just-In-Time (JIT) compilation to automatically and transparently exploit local memory allocation and data locality on GPUs. ...
Although Graphics Processing Units (GPUs) have become pervasive for data-parallel workloads, the efficient exploitation of their tiered memory hierarchy requires explicit programming. ...
Figure 2. Overview of the proposed JIT compilation flow for automatically exploiting the GPU memory hierarchy. ...
doi:10.1145/3453933.3454014
fatcat:yoxhdtisurhtpnqu2mtg5kt3vy
Using compiler snippets to exploit parallelism on heterogeneous hardware: a Java reduction case study
2018
Proceedings of the 10th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages - VMIL 2018
However, for the Java programming language, little work has been done for automatically compiling and exploiting reductions in Java applications on GPUs. ...
The snippets are expressed in pure Java with OpenCL semantics, simplifying the JIT compiler optimizations and code generation. ...
Authors would also like to thank David Leopoldseder and Foivos Zakkak for fruitful discussions and feedback. ...
doi:10.1145/3281287.3281292
dblp:conf/oopsla/FumeroK18
fatcat:d2gnnf3jojhylcljratwtc3qw4
Speculatively vectorized bytecode
2011
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers - HiPEAC '11
Advanced JIT compilers can then quickly tailor this bytecode to exploit SIMD capabilities of appropriate platforms, yielding up to 14.7× and 11.8× speedups on x86 and PowerPC platforms (including JIT-compilation ...
Efficient exploitation of SIMD instructions has become crucial for the performance of many applications. ...
A special method name can be generated for aligned accesses, for the JIT to generate optimized code (e.g., when the offline compiler peels a loop to align a memory access). ...
doi:10.1145/1944862.1944871
dblp:conf/hipeac/RohouDNRWCZ11
fatcat:uugmwqaowjem3cjwliqfrb7rqu
HPA: An Opportunistic Approach to Embedded Energy Efficiency
[article]
2015
arXiv
pre-print
In this paper we present a transparent, on-the-fly optimization scheme that allows a generic application to automatically exploit the available computing units to partition its computational load. ...
The idea is to use profiling to automatically select a computing-intensive candidate for acceleration, and then distribute the computations to the different units by off-loading blocks of code to them. ...
We have chosen them as they present a different degree of sparsity, and we wanted to investigate the capabilities of a JIT-based framework to exploit this information when optimizing the computations. ...
arXiv:1511.08635v1
fatcat:mtcdrerrhnfwvjp2oy2ios4x34
Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs
2020
The Art, Science, and Engineering of Programming
We also provide a breakdown analysis of the proposed compiler optimizations for FPGA execution, as a means to project their impact on the applications' characteristics. ...
The proposed solution is prototyped in the context of the Java programming language and TornadoVM; an open-source programming framework for Java execution on heterogeneous hardware. ...
The increased latency in HLS compilation times was the motivation for providing a set of execution modes in TornadoVM that can either perform a whole compilation for FPGAs at runtime (Full JIT), or load ...
doi:10.22152/programming-journal.org/2021/5/8
fatcat:iele6rhz5bavphj733vpsyckci
Just-In-Time Code Reuse: On the Effectiveness of Fine-Grained Address Space Layout Randomization
2013
2013 IEEE Symposium on Security and Privacy
API functions and gadgets, and JIT-compile a target program using those gadgets, all within a script environment at the time an exploit is launched. ...
We demonstrate the power of our framework by using it in conjunction with a real-world exploit against Internet Explorer, and also provide extensive evaluations that demonstrate the practicality of just-in-time ...
The authors would like to thank Stefan Nürnberger, Teryl Taylor and Andrew White for fruitful discussions about this work. We also thank the anonymous reviewers for their insightful comments. ...
doi:10.1109/sp.2013.45
dblp:conf/sp/SnowMDDLS13
fatcat:ctdl4xgdzvde3eatvtbmny3txu
Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation
2017
Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments - VEE '17
Using just-in-time compilation, we automatically generate OpenCL code at runtime which is specialized to the actual observed data types using profiling information. ...
However, exploiting heterogeneous hardware requires the use of low-level programming language approaches such as OpenCL, which is incredibly challenging, even for advanced programmers. ...
Acknowledgments: The authors would also like to thank the anonymous reviewers as well as Roland Schatz, Stefan Marr and Gilles Duboscq for fruitful discussions. ...
doi:10.1145/3050748.3050761
dblp:conf/vee/FumeroSSD17
fatcat:xh7trtbg6rhubaby3j2xoyqtqy
Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation
2017
SIGPLAN Notices
Using just-in-time compilation, we automatically generate OpenCL code at runtime which is specialized to the actual observed data types using profiling information. ...
However, exploiting heterogeneous hardware requires the use of low-level programming language approaches such as OpenCL, which is incredibly challenging, even for advanced programmers. ...
Acknowledgments: The authors would also like to thank the anonymous reviewers as well as Roland Schatz, Stefan Marr and Gilles Duboscq for fruitful discussions. ...
doi:10.1145/3140607.3050761
fatcat:svzfaag4evg3hlfnosicgcabvy
We present Truffle, a novel framework for implementing managed languages in Java™. ...
When the tree reaches a stable state, partial evaluation compiles the tree into optimized machine code. ...
We would also like to thank all members of the Virtual Machine Research Group at Oracle Labs for their support and contributions. ...
doi:10.1145/2384716.2384723
dblp:conf/oopsla/WimmerW12
fatcat:dhjykwoudbgxljku7yylrts2r4
Vapor SIMD: Auto-vectorize once, run everywhere
2011
International Symposium on Code Generation and Optimization (CGO 2011)
We present our design for a synergistic auto-vectorizing compilation scheme. The scheme is composed of an aggressive, generic offline stage coupled with a lightweight, target-specific online stage. ...
Single-Instruction-Multiple-Data (SIMD) hardware is ubiquitous and markedly diverse, but can be difficult for JIT compilers to efficiently target due to resource and budget constraints. ...
We are also thankful to Andrea Ornstein for supporting the gcc4cli backend. ...
doi:10.1109/cgo.2011.5764683
dblp:conf/cgo/NuzmanDRRWYCZ11
fatcat:pawpjpuurzgyrpdwapv2qpl3uq
Pricing Python parallelism: a dynamic language cost model for heterogeneous platforms
2020
Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages
The ALPyNA framework analyses moderately complex Python loop nests and automatically JIT compiles code for heterogeneous CPU and GPU architectures. ...
Auto-parallelizing compilers are common for static languages, often using a cost model to determine when the GPU execution speed will outweigh the offload overheads. ...
Acknowledgments: The authors would like to thank Alexandre Bergel for his friendly and constructive shepherding of this paper. We also thank the anonymous reviewers for their helpful suggestions. ...
doi:10.1145/3426422.3426979
fatcat:ex2h76pov5dgtiysm7ap4rn5de
Surgical precision JIT compilers
2013
Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI '14
We present Lancet, a JIT compiler framework for Java bytecode that enables such a tight, two-way integration with the running program. ...
In this paper, we propose to turn JIT compilation into a precision tool by adding two essential and generic metaprogramming facilities: First, allow programs to invoke JIT compilation explicitly. ...
Acknowledgments: The authors would like to thank the Graal/Truffle team for many insightful discussions. ...
doi:10.1145/2594291.2594316
dblp:conf/pldi/RompfSBLCO14
fatcat:f5sjt2lx5vbhncoz5pv45aiuta
Compiling and Optimizing Java 8 Programs for GPU Execution
2015
2015 International Conference on Parallel Architecture and Compilation (PACT)
These features and optimizations are supported and automatically performed by a JIT compiler that is built on top of a production version of the IBM Java 8 runtime environment. ...
This paper presents a just-in-time (JIT) compiler that can generate and optimize GPU code from a pure Java program written using lambda expressions with the new parallel streams APIs in Java 8. ...
We thank Marcel Mitran for his encouragement and support in pursuing the parallel streams API and lambda approach, and thank Jimmy Kwa for his extensive contribution to the implementation. ...
doi:10.1109/pact.2015.46
dblp:conf/IEEEpact/IshizakiHKS15
fatcat:c6bwzxy7vbbg5mbo6ohajgmepa
FAuST: A Framework for Formal Verification, Automated Debugging, and Software Test Generation
[chapter]
2012
Lecture Notes in Computer Science
We present FAuST, an extensible framework for Formal verification, AUtomated debugging, and Software Test generation. ...
Our framework uses a highly customizable Bounded Model Checking (BMC) algorithm for formal reasoning about software programs and provides different applications, e.g., property checking, functional equivalence ...
Optionally, FAuST allows for validation of counterexamples on the real program using LLVM's JIT compiler and execution engine, i.e., a test driver with the values of the counterexample is automatically ...
doi:10.1007/978-3-642-31759-0_17
fatcat:bl2xge6t7bd3pmwwxvzxool47e
Profiling-Assisted Decoupled Access-Execute
[article]
2016
arXiv
pre-print
For applications whose behavior varies significantly with respect to the input data, the profiling can be performed online, accompanied by just-in-time compilation. ...
We evaluated the benefits in energy efficiency and performance for both static and dynamic code generation and showed that precise prefetching of critical loads can result in 20% energy improvements, on ...
DAE exploits the fact that reducing frequency during memory-bound phases saves energy without harming performance, while automatically generated coarse phases reduce the number of times the frequency is scaled ...
arXiv:1601.01722v1
fatcat:u6u42rxglzd3thvicbjelo5x7a