MachSuite: Benchmarks for accelerator design and customized architectures
2014
2014 IEEE International Symposium on Workload Characterization (IISWC)
To improve standardization within the accelerator research community, we present MachSuite, a collection of 19 benchmarks for evaluating high-level synthesis tools and accelerator-centric architectures ...
MachSuite spans a broad application space, captures a variety of different program behaviors, and provides implementations tailored towards the needs of accelerator designers and researchers, including ...
MachSuite is targeted for fixed-function accelerator design, a compute paradigm in which no ISA exists. ...
doi:10.1109/iiswc.2014.6983050
dblp:conf/iiswc/ReagenASWB14
fatcat:oijgahsvavczzchn7mp4jaj23y
Best-Effort FPGA Programming: A Few Steps Can Go a Long Way
[article]
2018
arXiv
pre-print
FPGA-based heterogeneous architectures provide programmers with the ability to customize their hardware accelerators for flexible acceleration of many workloads. ...
We show that for a broad class of accelerator benchmarks from MachSuite, the proposed best-effort guideline improves the FPGA accelerator performance by 42-29,030x. ...
On top of
Benchmarks: This paper presents the proposed best-effort guideline through a complete accelerator design demonstration on a collection of benchmarks in MachSuite [12]. ...
arXiv:1807.01340v1
fatcat:6ocpzvp2cvgkninbtyvvyk7yiu
Designing Application-Specific Heterogeneous Architectures from Performance Models
2019
2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
In this paper, we propose an approach for designing application-specific heterogeneous systems based on performance models through combining accelerator and processor core models. ...
This approach aims to ease the design of multi-core multi-accelerator architecture, consequently contributes to explore the design space by automating the design steps. ...
ACKNOWLEDGMENTS The authors would like to thank Bluespec for providing us the Bluespec tools and also Intel Labs for giving us access to a cluster of the integrated BDW/FPGAs, within IL's vLab academic ...
doi:10.1109/mcsoc.2019.00045
dblp:conf/mcsoc/CongC19
fatcat:nxpx56t2yna7rhpev4qhutu6ei
An Open-Source Tool Flow for the Composition of Reconfigurable Hardware Thread Pool Architectures
2015
2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines
ThreadPoolComposer closes the gap in the design flow between high-level synthesis and general-purpose IP integration by automatically composing hardware thread pools and their external interfaces from ...
high-level descriptions and opening them to software using a common API. ...
ACKNOWLEDGMENT This work was performed in the context of "REPARA - Re-engineering and Enabling Performance and poweR of Applications" [10], a Seventh Framework Programme project of the European Union. ...
doi:10.1109/fccm.2015.22
dblp:conf/fccm/KorinthCK15
fatcat:h6mk47yrkjbkbowy6sway5ptli
Stream-Dataflow Acceleration
2017
Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17
Compared to a state-of-the-art domain specific accelerator (DianNao), and fixed-function accelerators for MachSuite, Softbrain can match their performance with only 2× power overhead on average. ...
The dataflow component of this architecture enables high concurrency, and the stream component enables communication and coordination at very-low power and area overhead. ...
We would also like to thank Sophia Shao and Michael Pellauer for their thoughtful suggestions and advice during the revision process. ...
doi:10.1145/3079856.3080255
dblp:conf/isca/NowatzkiGAS17
fatcat:xm36xv6cbfevveabvmpafgjtli
Stream-Dataflow Acceleration
2017
SIGARCH Computer Architecture News
Compared to a state-of-the-art domain specific accelerator (DianNao), and fixed-function accelerators for MachSuite, Softbrain can match their performance with only 2× power overhead on average. ...
The dataflow component of this architecture enables high concurrency, and the stream component enables communication and coordination at very-low power and area overhead. ...
We would also like to thank Sophia Shao and Michael Pellauer for their thoughtful suggestions and advice during the revision process. ...
doi:10.1145/3140659.3080255
fatcat:g5spj35pyvh7jlr6i3qr5ertlq
ARAPrototyper: Enabling Rapid Prototyping and Evaluation for Accelerator-Rich Architectures
[article]
2016
arXiv
pre-print
However, many design issues related to the complex interaction between general-purpose cores, accelerators, customized on-chip interconnects, and memory systems remain unclear and difficult to evaluate ...
We believe that ARAPrototyper can be an attractive alternative for ARA design and evaluation. ...
DESIGN AUTOMATION FLOW AND ARA CUSTOMIZATION INTERFACE The main challenge to do architectural design space exploration through FPGA prototyping is the long development cycle for each generation of an ARA ...
arXiv:1610.09761v1
fatcat:vc36crhlprfyroau2vgasubbtq
Automated accelerator generation and optimization with composable, parallel and pipeline architecture
2018
Proceedings of the 55th Annual Design Automation Conference on - DAC '18
These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. ...
Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. ...
Conclusion While the FPGA-based heterogeneous architectures are becoming a promising paradigm to provide continued performance and energy improvement in modern datacenters, accelerator programming arises ...
doi:10.1145/3195970.3195999
dblp:conf/dac/CongWYZ18
fatcat:ezisbhayq5hlxfoko437bljdne
Efficient data supply for hardware accelerators with prefetching and access/execute decoupling
2016
2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Computing systems are becoming accelerator-rich • General-purpose cores + a large number of accelerators • Challenge: Design and verification complexity • Non-recurring engineering (NRE) cost per accelerator ...
gate-level (commercial ASIC flow) • Benchmark accelerators from MachSuite Energy Comparison • 15% energy reduction on average because of reduced stalls • MemUnits/queues only consume a small amount of ...
data between SPM and main memory • Pros: Good performance • Cons: High design effort, accelerator-specific, not reusable Cache-based accelerators • Pros: Low design effort, cache can be reused ...
doi:10.1109/micro.2016.7783749
dblp:conf/micro/ChenS16
fatcat:7t2ghr5f5vhflkblmpjfly7uua
AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators
[article]
2021
arXiv
pre-print
The experimental results show that AutoDSE is able to identify the design point that achieves, on the geometric mean, 19.9x speedup over one CPU core for Machsuite and Rodinia benchmarks. ...
Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. ...
Peichen Pan for his invaluable support with the Merlin Compiler and Dr. Lorenzo Ferretti and Qi Sun for helping with the comparison to their work. ...
arXiv:2009.14381v2
fatcat:d4ynz74pubd4pnpko2nvkowqke
Predictable Accelerator Design with Time-Sensitive Affine Types
[article]
2020
arXiv
pre-print
High-level synthesis (HLS) tools promise to raise the level of abstraction by compiling C or C++ to accelerator designs. ...
Field-programmable gate arrays (FPGAs) provide an opportunity to co-design applications with hardware accelerators, yet they remain difficult to program. ...
Figure 8. The design spaces for three MachSuite benchmarks. ...
arXiv:2004.04852v2
fatcat:tqhcu2pqorayfhtp65kc7a4oli
Design Space Exploration of Heterogeneous-Accelerator SoCs with Hyperparameter Optimization
2021
Proceedings of the 26th Asia and South Pacific Design Automation Conference
We also applied the methodology to find the optimal architecture including its coherency interface for a complex SoC made up of six accelerated-workloads. ...
In this paper, we describe a methodology allowing to explore the design space of power-performance heterogeneous SoCs by combining an architecture simulator (gem5-Aladdin) and a hyperparameter optimization ...
The others correspond to four benchmarks from MachSuite [20]: AES-256, GEMM-nCubed, FFT-Transpose, and Stencil-3D. ...
doi:10.1145/3394885.3431415
fatcat:fsi5tqkygbd73e7zcgznf7lox4
Accelerating Face Detection on Programmable SoC Using C-Based Synthesis
2017
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '17
While HLS continues to evolve with a growing set of algorithms, methodologies, and tools to efficiently map software designs onto optimized hardware architectures, there continues to lack realistic benchmark ...
Our design is able to achieve a frame rate of 30 frames per second which is suitable for realtime applications. ...
Acknowledgement This research was supported in part by NSF Awards #1065307, #1337240, #1453378, and a research gift from Xilinx, Inc. ...
doi:10.1145/3020078.3021753
fatcat:3vnvkuv3nzho5nlhsjmbwivssy
Decision tree based hardware power monitoring for run time dynamic power management in FPGA
2017
2017 27th International Conference on Field Programmable Logic and Applications (FPL)
A flexible architecture of the hardware power monitoring is proposed, which can be instrumented in any RTL design for runtime power estimation, dispensing with the need for extra power measurement devices ...
Experimental results of applying the proposed model to benchmarks with different resource types reveal an average error up to 4% for dynamic power estimation. ...
We applied our methodology to develop the decision-tree-based power models and respectively build hardware wrappers for several benchmarks in Chstone [19], Polybench [20] and Machsuite [21]. ...
doi:10.23919/fpl.2017.8056832
dblp:conf/fpl/LinZS17
fatcat:6g5hiseccfckxf5nfdslu5om44
Enabling Automated FPGA Accelerator Optimization Using Graph Neural Networks
[article]
2021
arXiv
pre-print
Despite this, it still can take weeks to develop a high-performance architecture mainly because there are many design choices at a higher level that requires more time to explore. ...
High-level synthesis (HLS) has freed the computer architects from developing their designs in a very low-level language and needing to exactly specify how the data should be transferred in register-level ...
Acknowledgments We would like to thank Marci Baun for editing the paper. ...
arXiv:2111.08848v2
fatcat:kpngdne3lrbybfe5mbmjia65ha