Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability
2012
2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
through the use of heterogeneous hardware across the SIMD lanes. ...
The Libra accelerator increases SIMD utility by blurring the divide between vector and instruction parallelism to support efficient execution of a wider range of loops, and it increases hardware utilization ...
This research is supported by Samsung Advanced Institute of Technology and the National Science Foundation under grants CCF-0916689 and CNS-0964478. ...
doi:10.1109/micro.2012.17
dblp:conf/micro/ParkPPM12
fatcat:3skvsbe2vbeujmh2ctwoqwmthe
Software transparent dynamic binary translation for coarse-grain reconfigurable architectures
2016
2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
custom hardware and the flexibility of software. ...
In this work we propose DORA, a Dynamic Optimizer for Reconfigurable Architectures, which achieves substantial (2X) power and performance improvements while having low hardware and insertion overhead and ...
We also thank David Albonesi and the anonymous reviewers for their feedback on the paper. ...
doi:10.1109/hpca.2016.7446060
dblp:conf/hpca/WatkinsNC16
fatcat:ssmt2kzalba2xoozatcp6imlxq
Construction and exploitation of VLIW ASIPs with heterogeneous vector-widths
2014
Microprocessors and microsystems
This paper proposes the use of heterogeneous vector widths and a method to explore the heterogeneous vector widths for VLIW ASIPs. ...
A large part of the DLP is usually exploited through application vectorization and implementation of vector operations in processors executing the applications. ...
Dynamic configurability enables lane resources to execute as a traditional SIMD processor, be re-purposed to behave as a clustered VLIW processor, or combinations of both. ...
doi:10.1016/j.micpro.2014.05.004
fatcat:hua42e74vbgllnl4aejatueoe4
Stream-Dataflow Acceleration
2017
Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17
SIMD, GPGPUs) are insufficient, as evidenced by the order-of-magnitude improvements and industry adoption of application and domain-specific accelerators in important areas like machine learning, computer ...
This paper explores the hardware and software implications, describes its detailed microarchitecture, and evaluates an implementation. ...
ACKNOWLEDGMENTS We would first like to thank the anonymous reviewers for their detailed questions and suggestions which helped us to clarify the presentation. ...
doi:10.1145/3079856.3080255
dblp:conf/isca/NowatzkiGAS17
fatcat:xm36xv6cbfevveabvmpafgjtli
Stream-Dataflow Acceleration
2017
SIGARCH Computer Architecture News
SIMD, GPGPUs) are insufficient, as evidenced by the order-of-magnitude improvements and industry adoption of application and domain-specific accelerators in important areas like machine learning, computer ...
This paper explores the hardware and software implications, describes its detailed microarchitecture, and evaluates an implementation. ...
ACKNOWLEDGMENTS We would first like to thank the anonymous reviewers for their detailed questions and suggestions which helped us to clarify the presentation. ...
doi:10.1145/3140659.3080255
fatcat:g5spj35pyvh7jlr6i3qr5ertlq
Exploring the potential of heterogeneous von neumann/dataflow execution models
2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15
Mahlke, “Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability,” ...
[3] M. Budiu, P. V. Artigas, and S. C. Goldstein, “Dataflow: A complement to superscalar,” in ISPASS, 2005. ...
doi:10.1145/2749469.2750380
dblp:conf/isca/NowatzkiGS15
fatcat:hql7xymzgjch3jv4dk5mvbesji
Applications and Techniques for Fast Machine Learning in Science
[article]
2021
arXiv
pre-print
training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. ...
This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. ...
Much of the advancements within ML over the past few years have originated from the use of heterogeneous computing hardware. ...
arXiv:2110.13041v1
fatcat:cvbo2hmfgfcuxi7abezypw2qrm
Applications and Techniques for Fast Machine Learning in Science
2022
Frontiers in Big Data
training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. ...
This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. ...
“Dynamic application reconfiguration on heterogeneous hardware,” in Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (Providence, RI: VEE). ...
doi:10.3389/fdata.2022.787421
pmid:35496379
pmcid:PMC9041419
fatcat:5w2exf7vvrfvnhln7nj5uppjga
Applications and Techniques for Fast Machine Learning in Science
2022
training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. ...
This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. ...
Much of the advancements within ML over the past few years have originated from the use of heterogeneous computing hardware. ...
doi:10.26083/tuprints-00021245
fatcat:q5g26rdbfbfozmfcywdpew56be