
Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator [article]

Tian Zhao, Yaqi Zhang, Kunle Olukotun
2019 arXiv pre-print
Most execution models for RNN acceleration break computation graphs into BLAS kernels, which lead to significant inter-kernel data movement and resource underutilization.  ...  We evaluate our optimization strategy on such abstraction with DeepBench using a configurable spatial accelerator.  ...  We also thank Google for the cloud credits.  ... 
arXiv:1909.13654v1 fatcat:6w2ccglyanfmrohqler55k2pzu

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

Artur Podobas, Kentaro Sano, Satoshi Matsuoka
2020 IEEE Access  
These limitations have been recognized for decades (e.g., [15]-[17]), and have driven forth a different branch of reconfigurable architecture: Coarse-Grained Reconfigurable Architectures (CGRAs).  ...  Recently, a particular branch of reconfigurable architecture, the Field-Programmable Gate Arrays (FPGAs) [9], has experienced a surge of renewed interest for use in High-Performance Computing (HPC), and  ...  This article is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).  ... 
doi:10.1109/access.2020.3012084 fatcat:xx6k4lxbjbc4tjebbymp42w634

Capstan: A Vector RDA for Sparsity [article]

Alexander Rucker, Matthew Vilim, Tian Zhao, Yaqi Zhang, Raghu Prabhakar, Kunle Olukotun
2021 arXiv pre-print
This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable-dataflow accelerator (RDA) for sparse and dense tensor applications.  ...  For sparse applications that can be mapped to Plasticine, a recent dense RDA, Capstan is 7.6x to 365x faster and only 13% larger.  ...  Capstan uses a parallel-patterns abstraction for declarative sparsity: users express what they want to compute, which permits optimized hardware for vectorized sparse iteration and dynamic memory reordering  ... 
arXiv:2104.12760v1 fatcat:k7s6dsgikvgixcip2xyrd7eriu

Morphling: A Reconfigurable Architecture for Tensor Computation

Liqiang Lu, Yun Liang
2021 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
At the architecture level, we propose a reconfigurable design to support the execution model.  ...  Another critical aspect of tensor algebra is that the tensors involved can have varying mixes of dense and sparse representations. Such diversified applications are notoriously difficult to accelerate.  ...  Recently, the demand for massively parallel computation has grown continuously in the field of CGRAs [16, 66, 70, 77, 78, 87]. Plasticine [66] is a CGRA written in Chisel.  ... 
doi:10.1109/tcad.2021.3135322 fatcat:5omvjoxy3zd7jgaear5swwjlou

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective [article]

Artur Podobas, Kentaro Sano, Satoshi Matsuoka
2020 arXiv pre-print
Among the more salient and practical of the post-Moore alternatives are reconfigurable systems, with Coarse-Grained Reconfigurable Architectures (CGRAs) seemingly capable of striking a balance between  ...  We find that there are ample opportunities for future research on CGRAs, in particular with respect to size, functionality, support for parallel programming models, and to evaluate more complex applications  ...  ACKNOWLEDGEMENTS This article is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).  ... 
arXiv:2004.04509v1 fatcat:sxnq32chxjf6hfc5ygjsxqjwl4

Software defined architectures for data analytics

Vito Giovanni Castellana, Marco Minutoli, Antonino Tumeo, Marco Lattuada, Pietro Fezzardi, Fabrizio Ferrandi
2019 Proceedings of the 24th Asia and South Pacific Design Automation Conference on - ASPDAC '19  
In this position paper, we describe a possible toolchain for reconfigurable architectures targeted at data analytics.  ...  Nevertheless, we argue that the challenges for reconfigurable computing remain in the software.  ...  The Plasticine [39] spatially reconfigurable design combines pattern compute units (PCUs), hierarchically composed of a reconfigurable pipeline with multiple stages of SIMD functional units, and pattern  ... 
doi:10.1145/3287624.3288754 dblp:conf/aspdac/CastellanaMTLFF19 fatcat:ip4n6z5ghzdubmzs7g6vsq3jmu

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing [article]

Cheng Tan, Chenhao Xie, Tong Geng, Andres Marquez, Antonino Tumeo, Kevin Barker, Ang Li
2021 arXiv pre-print
In this paper, we propose ARENA, an asynchronous reconfigurable accelerator ring architecture, as a potential scenario for what future HPC systems and data centers will look like.  ...  The asynchronous tasking for bringing computation to data is achieved by circulating the task token, which describes the data-flow graphs to be executed for a task, among the CGRA cluster connected by  ...  either to a CPU or to a reconfigurable accelerator (e.g., CGRA).  ... 
arXiv:2011.04931v2 fatcat:by6cwfzn3zbfre3goyvg35aru4

A Survey of Coarse-Grained Reconfigurable Architecture and Design

Leibo Liu, Jianfeng Zhu, Zhaoshi Li, Yanan Lu, Yangdong Deng, Jie Han, Shouyi Yin, Shaojun Wei
2019 ACM Computing Surveys  
This article reviews the architecture and design of CGRAs thoroughly for the purpose of exploiting their full potential. First, a novel multidimensional taxonomy is proposed.  ...  As general-purpose processors have hit the power wall and chip fabrication cost escalates alarmingly, coarse-grained reconfigurable architectures (CGRAs) are attracting increasing interest from both academia  ...  [19] proposed a CGRA architecture specialized for a parallel patterns programming model in which pattern compute units were specialized for computing nested patterns as a multistage pipeline of SIMD  ... 
doi:10.1145/3357375 fatcat:pqi4d33i6bg45a6llswhwd44qi

Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator

Hung K. Nguyen, Xuan-Tu Tran
2022 ACM Transactions on Parallel Computing  
This paper proposes and implements a Coarse-grained dynamically Reconfigurable Architecture (CGRA), named REMAC (Reconfigurable Multimedia Accelerator).  ...  In addition, a novel dedicated hierarchical data memory system is proposed to increase data reuse between iterations and keep data always available for the parallel operation of the RPU.  ...  Therefore, the configuration generation is transparent and does not require a compiler. Plasticine [15] is a CGRA that is proposed for the efficient execution of parallel patterns.  ... 
doi:10.1145/3543544 fatcat:nkogwa2wxjd57klv2ugcwgfb54


Subhankar Pal, Siying Feng, Dong-hyeon Park, Sung Kim, Aporva Amarnath, Chi-Sheng Yang, Xin He, Jonathan Beaumont, Kyle May, Yan Xiong, Kuba Kaszyk, John Magnus Morton (+8 others)
2020 Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques  
Transmuter addresses a rapidly growing set of algorithms exhibiting dynamic data movement patterns, irregularity, and sparsity, while delivering GPU-like efficiencies for traditional dense applications  ...  This is facilitated by a fabric of light-weight cores connected to a network of reconfigurable caches and crossbars.  ...  ACKNOWLEDGMENTS We thank the anonymous reviewers for their helpful feedback.  ... 
doi:10.1145/3410463.3414627 dblp:conf/IEEEpact/PalFPKAYHBMXKMS20 fatcat:kwsaun2g65b6jl6mdqrhgiv7yq

Taurus: An Intelligent Data Plane [article]

Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Kunle Olukotun
2020 arXiv pre-print
Taurus adds custom hardware based on a map-reduce abstraction to programmable network devices, such as switches and NICs; this new hardware uses pipelined and SIMD parallelism for fast inference.  ...  Our evaluation of a Taurus-enabled switch ASIC -- supporting several real-world benchmarks -- shows that Taurus operates three orders of magnitude faster than a server-based control plane, while increasing  ...  We base Taurus's map-reduce block on Plasticine, a Coarse-Grained Reconfigurable Array (CGRA) composed of a sea of compute and memory units, which are reconfigurable to match applications' dataflow graphs  ... 
arXiv:2002.08987v1 fatcat:6hxsnoqxxnglvewm56zl7uzine

Top Picks from the 2017 Computer Architecture Conferences

Thomas F. Wenisch
2018 IEEE Micro  
I thank Benjamin Lee and Daniel Jiménez for handling articles with which I had a conflict of interest.  ...  Finally, I thank all the authors who submitted their work for consideration and the authors of the selected articles for producing the final versions of their articles for this issue.  ...  In "Plasticine: A Reconfigurable Accelerator for Parallel Patterns," Raghu Prabhakar and colleagues report on an accelerator architecture that can be reconfigured to exploit several patterns of parallelism  ... 
doi:10.1109/mm.2018.032271056 fatcat:jcs52oysenetto4iangvkkutty

Exploiting Fine-Grain Ordered Parallelism in Dense Matrix Algorithms [article]

Jian Weng, Vidushi Dadu, Tony Nowatzki
2019 arXiv pre-print
A programmable accelerator with similar performance/power/area would be highly desirable.  ...  We find that fine-grain ordered parallelism can be exploited by supporting: 1. fine-grain stream-based communication/synchronization; 2. inductive data-reuse and memory access patterns; 3. implicit vector-masking  ...  Table 2 explains how to add FGOP capabilities to out-of-order (OOO) cores and Plasticine [4], a reconfigurable dataflow fabric, programmed using parallel patterns.  ... 
arXiv:1905.06238v1 fatcat:6iuqxbtwwrevxju27gyh5mlikq

Augmented Reality Guidance with Multimodality Imaging Data and Depth-Perceived Interaction for Robot-Assisted Surgery

Rong Wen, Chin-Boon Chng, Chee-Kong Chui
2017 Robotics  
In this paper, we proposed and developed a robot-assisted surgical system with interactive surgical guidance using tablet-based AR with a Kinect sensor for three-dimensional (3D) localization of patient  ...  Depth data acquired from the Kinect sensor was visualized in cone-shaped layers for 3D AR-assisted navigation.  ...  NVIDIA has also developed GPUs (Tegra series) for mobile devices for sophisticated graphical rendering and parallel computation.  ... 
doi:10.3390/robotics6020013 fatcat:rx765palonegpec777sbcmsxgi

AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers

Kalhan Koul, Jackson Melchert, Kavya Sreedhar, Leonard Truong, Gedeon Nyengele, Keyi Zhang, Qiaoyi Liu, Jeff Setter, Po-Han Chen, Yuchen Mei, Maxwell Strange, Ross Daly (+21 others)
2022 ACM Transactions on Embedded Computing Systems  
The lack of a structured approach for updating both the compiler and the accelerator in tandem has impeded many attempts to systematize this procedure.  ...  We propose a new approach to enable flexible and evolvable domain-specific hardware specialization based on coarse-grained reconfigurable arrays (CGRAs).  ...  Our system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) accelerator.  ... 
doi:10.1145/3534933 fatcat:atsai6sto5e67cyx7tyg2bl5w4
Showing results 1–15 out of 45 results