
MachSuite: Benchmarks for accelerator design and customized architectures

Brandon Reagen, Robert Adolf, Yakun Sophia Shao, Gu-Yeon Wei, David Brooks
2014 2014 IEEE International Symposium on Workload Characterization (IISWC)  
To improve standardization within the accelerator research community, we present MachSuite, a collection of 19 benchmarks for evaluating high-level synthesis tools and accelerator-centric architectures  ...  MachSuite spans a broad application space, captures a variety of different program behaviors, and provides implementations tailored towards the needs of accelerator designers and researchers, including  ...  MachSuite is targeted for fixed-function accelerator design: a compute paradigm in which no ISA exists.  ... 
doi:10.1109/iiswc.2014.6983050 dblp:conf/iiswc/ReagenASWB14 fatcat:oijgahsvavczzchn7mp4jaj23y

Best-Effort FPGA Programming: A Few Steps Can Go a Long Way [article]

Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, Peipei Zhou
2018 arXiv   pre-print
FPGA-based heterogeneous architectures provide programmers with the ability to customize their hardware accelerators for flexible acceleration of many workloads.  ...  We show that for a broad class of accelerator benchmarks from MachSuite, the proposed best-effort guideline improves the FPGA accelerator performance by 42-29,030x.  ...  On top of Benchmarks This paper presents the proposed best-effort guideline through a complete accelerator design demonstration on a collection of benchmarks in MachSuite [12] .  ... 
arXiv:1807.01340v1 fatcat:6ocpzvp2cvgkninbtyvvyk7yiu

Designing Application-Specific Heterogeneous Architectures from Performance Models

Thanh Cong, François Charot
2019 2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)  
In this paper, we propose an approach for designing application-specific heterogeneous systems based on performance models through combining accelerator and processor core models.  ...  This approach aims to ease the design of multi-core multi-accelerator architecture, consequently contributes to explore the design space by automating the design steps.  ...  ACKNOWLEDGMENTS The authors would like to thank Bluespec for providing us the Bluespec tools and also Intel Labs for giving us access to a cluster of the integrated BDW/FPGAs, within IL's vLab academic  ... 
doi:10.1109/mcsoc.2019.00045 dblp:conf/mcsoc/CongC19 fatcat:nxpx56t2yna7rhpev4qhutu6ei

An Open-Source Tool Flow for the Composition of Reconfigurable Hardware Thread Pool Architectures

Jens Korinth, David de la Chevallerie, Andreas Koch
2015 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines  
ThreadPoolComposer closes the gap in the design flow between high-level synthesis and general-purpose IP integration by automatically composing hardware thread pools and their external interfaces from  ...  high-level descriptions and opening them to software using a common API.  ...  ACKNOWLEDGMENT This work was performed in the context of "REPARA: Re-engineering and Enabling Performance and poweR of Applications" [10], a Seventh Framework Programme project of the European Union.  ... 
doi:10.1109/fccm.2015.22 dblp:conf/fccm/KorinthCK15 fatcat:h6mk47yrkjbkbowy6sway5ptli

Stream-Dataflow Acceleration

Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, Karthikeyan Sankaralingam
2017 Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17  
Compared to a state-of-the-art domain specific accelerator (DianNao), and fixed-function accelerators for MachSuite, Softbrain can match their performance with only 2× power overhead on average.  ...  The dataflow component of this architecture enables high concurrency, and the stream component enables communication and coordination at very-low power and area overhead.  ...  We would also like to thank Sophia Shao and Michael Pellauer for their thoughtful suggestions and advice during the revision process.  ... 
doi:10.1145/3079856.3080255 dblp:conf/isca/NowatzkiGAS17 fatcat:xm36xv6cbfevveabvmpafgjtli

Stream-Dataflow Acceleration

Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, Karthikeyan Sankaralingam
2017 SIGARCH Computer Architecture News  
Compared to a state-of-the-art domain specific accelerator (DianNao), and fixed-function accelerators for MachSuite, Softbrain can match their performance with only 2× power overhead on average.  ...  The dataflow component of this architecture enables high concurrency, and the stream component enables communication and coordination at very-low power and area overhead.  ...  We would also like to thank Sophia Shao and Michael Pellauer for their thoughtful suggestions and advice during the revision process.  ... 
doi:10.1145/3140659.3080255 fatcat:g5spj35pyvh7jlr6i3qr5ertlq

ARAPrototyper: Enabling Rapid Prototyping and Evaluation for Accelerator-Rich Architectures [article]

Yu-Ting Chen, Jason Cong, Zhenman Fang, Bingjun Xiao, Peipei Zhou
2016 arXiv   pre-print
However, many design issues related to the complex interaction between general-purpose cores, accelerators, customized on-chip interconnects, and memory systems remain unclear and difficult to evaluate  ...  We believe that ARAPrototyper can be an attractive alternative for ARA design and evaluation.  ...  DESIGN AUTOMATION FLOW AND ARA CUSTOMIZATION INTERFACE The main challenge to do architectural design space exploration through FPGA prototyping is the long development cycle for each generation of an ARA  ... 
arXiv:1610.09761v1 fatcat:vc36crhlprfyroau2vgasubbtq

Automated accelerator generation and optimization with composable, parallel and pipeline architecture

Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang
2018 Proceedings of the 55th Annual Design Automation Conference on - DAC '18  
These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads.  ...  Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space.  ...  Conclusion While the FPGA-based heterogeneous architectures are becoming a promising paradigm to provide continued performance and energy improvement in modern datacenters, accelerator programming arises  ... 
doi:10.1145/3195970.3195999 dblp:conf/dac/CongWYZ18 fatcat:ezisbhayq5hlxfoko437bljdne

Efficient data supply for hardware accelerators with prefetching and access/execute decoupling

Tao Chen, G. Edward Suh
2016 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)  
Computing systems are becoming accelerator-rich • General-purpose cores + a large number of accelerators • Challenge: Design and verification complexity • Non-recurring engineering (NRE) cost per accelerator  ...  gate-level (commercial ASIC flow) • Benchmark accelerators from MachSuite Energy Comparison • 15% energy reduction on average because of reduced stalls • MemUnits/queues only consume a small amount of  ...  data between SPM and main memory • Pros: Good performance • Cons: High design effort, accelerator-specific, not reusable Cache-based accelerators • Pros: Low design effort, cache can be reused  ... 
doi:10.1109/micro.2016.7783749 dblp:conf/micro/ChenS16 fatcat:7t2ghr5f5vhflkblmpjfly7uua

AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators [article]

Atefeh Sohrabizadeh, Cody Hao Yu, Min Gao, Jason Cong
2021 arXiv   pre-print
The experimental results show that AutoDSE is able to identify the design point that achieves, on the geometric mean, 19.9x speedup over one CPU core for Machsuite and Rodinia benchmarks.  ...  Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers.  ...  Peichen Pan for his invaluable support with the Merlin Compiler and Dr. Lorenzo Ferretti and Qi Sun for helping with the comparison to their work.  ... 
arXiv:2009.14381v2 fatcat:d4ynz74pubd4pnpko2nvkowqke

Predictable Accelerator Design with Time-Sensitive Affine Types [article]

Rachit Nigam, Sachille Atapattu, Samuel Thomas, Zhijing Li, Theodore Bauer, Yuwei Ye, Apurva Koti, Adrian Sampson, Zhiru Zhang
2020 arXiv   pre-print
High-level synthesis (HLS) tools promise to raise the level of abstraction by compiling C or C++ to accelerator designs.  ...  Field-programmable gate arrays (FPGAs) provide an opportunity to co-design applications with hardware accelerators, yet they remain difficult to program.  ...  Figure 8. The design spaces for three MachSuite benchmarks.  ... 
arXiv:2004.04852v2 fatcat:tqhcu2pqorayfhtp65kc7a4oli

Design Space Exploration of Heterogeneous-Accelerator SoCs with Hyperparameter Optimization

Thanh Cong, François Charot
2021 Proceedings of the 26th Asia and South Pacific Design Automation Conference  
We also applied the methodology to find the optimal architecture including its coherency interface for a complex SoC made up of six accelerated-workloads.  ...  In this paper, we describe a methodology allowing to explore the design space of power-performance heterogeneous SoCs by combining an architecture simulator (gem5-Aladdin) and a hyperparameter optimization  ...  The others correspond to four benchmarks from MachSuite [20] : AES-256, GEMM-nCubed, FFT-Transpose, and Stencil-3D.  ... 
doi:10.1145/3394885.3431415 fatcat:fsi5tqkygbd73e7zcgznf7lox4

Accelerating Face Detection on Programmable SoC Using C-Based Synthesis

Nitish Kumar Srivastava, Steve Dai, Rajit Manohar, Zhiru Zhang
2017 Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '17  
While HLS continues to evolve with a growing set of algorithms, methodologies, and tools to efficiently map software designs onto optimized hardware architectures, there continues to lack realistic benchmark  ...  Our design is able to achieve a frame rate of 30 frames per second which is suitable for realtime applications.  ...  Acknowledgement This research was supported in part by NSF Awards #1065307, #1337240, #1453378, and a research gift from Xilinx, Inc.  ... 
doi:10.1145/3020078.3021753 fatcat:3vnvkuv3nzho5nlhsjmbwivssy

Decision tree based hardware power monitoring for run time dynamic power management in FPGA

Zhe Lin, Wei Zhang, Sharad Sinha
2017 2017 27th International Conference on Field Programmable Logic and Applications (FPL)  
A flexible architecture of the hardware power monitoring is proposed, which can be instrumented in any RTL design for runtime power estimation, dispensing with the need for extra power measurement devices  ...  Experimental results of applying the proposed model to benchmarks with different resource types reveal an average error up to 4% for dynamic power estimation.  ...  We applied our methodology to develop the decision-tree-based power models and respectively build hardware wrappers for several benchmarks in CHStone [19], Polybench [20] and MachSuite [21].  ... 
doi:10.23919/fpl.2017.8056832 dblp:conf/fpl/LinZS17 fatcat:6g5hiseccfckxf5nfdslu5om44

Enabling Automated FPGA Accelerator Optimization Using Graph Neural Networks [article]

Atefeh Sohrabizadeh, Yunsheng Bai, Yizhou Sun, Jason Cong
2021 arXiv   pre-print
Despite this, it still can take weeks to develop a high-performance architecture mainly because there are many design choices at a higher level that requires more time to explore.  ...  High-level synthesis (HLS) has freed the computer architects from developing their designs in a very low-level language and needing to exactly specify how the data should be transferred in register-level  ...  Acknowledgments We would like to thank Marci Baun for editing the paper.  ... 
arXiv:2111.08848v2 fatcat:kpngdne3lrbybfe5mbmjia65ha