ISA-independent workload characterization and its implications for specialized architectures
2013
2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
We compare this analysis with an x86 trace and find that several of the analyses are highly sensitive to the ISA. ...
In this work, we perform ISA-independent workload characterization for a variety of important intrinsic program characteristics relating to computation, memory, and control flow. ...
We also perform workload characterization using the ISA-independent characteristics and show that this characterization can be helpful to guide accelerator designers towards opportunity for hardware specialization ...
doi:10.1109/ispass.2013.6557175
dblp:conf/ispass/ShaoB13
fatcat:g3yfyfzc7neqvf2hw35xkmjfq4
HELIX
2012
Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12
The framework uses an analytical model of loop speedups, combined with profile data, to choose loops to parallelize. ...
by using helper threads to prefetch synchronization signals. ...
Acknowledgements Authors thank the anonymous reviewers for their hard work that allowed us to improve the paper significantly. ...
doi:10.1145/2259016.2259028
dblp:conf/cgo/CampanoniJHRWB12
fatcat:saxndpn5rvhodc7nsnfl7tjmxq
HELIX-UP: Relaxing program semantics to unleash parallelization
2015
2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
To convert latent parallelism into performance gains, users may be willing to compromise on the quality of a program's results. ...
In addition to boosting performance, our approach limits the sensitivity of parallelized code to the parameters of target CPUs (such as core-to-core communication latency) and the accuracy of data dependence ...
HCCv2 is based on the ILDJIT compilation framework [3]. We extended ILDJIT to use the latest available version of LLVM: 3.4.1. ...
doi:10.1109/cgo.2015.7054203
dblp:conf/cgo/CampanoniHWB15
fatcat:ralljncn25a43fd7ewmbj3v47m
Power-awareness and smart-resource management in embedded computing systems
2015
2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Even though the technologies have improved, we continue to apply outdated approaches to our use of these resources. Key computer science abstractions have not changed since the 1960's. ...
Therefore this is the time for a fresh approach to the way systems are designed and used. ...
We used the second version of the HELIX compiler, HCCv2 [5], which is based on the ILDJIT compilation framework [2]. ...
doi:10.1109/codesisss.2015.7331372
dblp:conf/codes/SantambrogioACC15
fatcat:k2pooqp6vbedngyces72jq6wga
The HELIX project
2012
Proceedings of the 49th Annual Design Automation Conference - DAC '12
But because creating parallel programs by hand is difficult and prone to error, there is an urgent need for automatic ways of transforming conventional programs to exploit modern multicore systems. ...
Parallelism has become the primary way to maximize processor performance and power efficiency. ...
Acknowledgements This work was possible thanks to the sponsorship of Microsoft Research, HiPEAC, the Royal Academy of Engineering, EPSRC and the National Science Foundation (award number IIS-0926148). ...
doi:10.1145/2228360.2228412
dblp:conf/dac/CampanoniJHWB12
fatcat:dxj5iz43czay3jkjt7vd2hsyvi
Eliminating voltage emergencies via software-guided code transformations
2010
ACM Transactions on Architecture and Code Optimization (TACO)
efficiency severely, especially looking ahead to future technology generations. ...
In this paper, we present a hardware-software collaborative approach to mitigate voltage fluctuations. ...
We extended the ILDJIT compiler to include the code injection and scheduling algorithms described in Section 2.3. ...
doi:10.1145/1839667.1839674
fatcat:uxznhy2dzfd5pbpqjvbxd5j6em
Automatically accelerating non-numerical programs by architecture-compiler co-design
2017
Communications of the ACM
Because of the high cost of communication between processors, compilers that parallelize loops automatically have been forced to skip a large class of loops that are both critical to performance and rich ...
Simulations of HELIX-RC, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks. ...
The design is guided by the following objectives: Low-latency communication. ...
doi:10.1145/3139461
fatcat:lwedxnvsxzatplcnwu4wbhkzsi
HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs
2014
2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks. ...
To address these challenges, we propose a lightweight architectural enhancement co-designed with a parallelizing compiler, which together can decouple communication from thread execution. ...
Moreover, we would like to thank Glenn Holloway for his invaluable contributions to the HELIX project. ...
doi:10.1109/isca.2014.6853215
dblp:conf/isca/CampanoniBKJWB14
fatcat:uv7k7p2v4bf2pklh4fgnmkuvma
HELIX-RC
2014
SIGARCH Computer Architecture News
Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks. ...
To address these challenges, we propose a lightweight architectural enhancement co-designed with a parallelizing compiler, which together can decouple communication from thread execution. ...
Moreover, we would like to thank Glenn Holloway for his invaluable contributions to the HELIX project. ...
doi:10.1145/2678373.2665705
fatcat:g5va7ht7wndb7lec5ar3udp5g4
ACOTES Project: Advanced Compiler Technologies for Embedded Streaming
2010
International journal of parallel programming
However, programming efficiently for streaming architectures is a challenging task, having to carefully partition the computation and map it to processes in a way that best matches the underlying streaming ...
The ACOTES approach for streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in ...
The cost model we developed is capable of guiding the compiler as to which of these two alternatives is expected to be more profitable (as explained in the following Section). ...
doi:10.1007/s10766-010-0132-7
fatcat:dlvxlop65ngzfaezs3yjs2mfm4