7 Hits in 4.6 sec

An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators

Seyed Morteza Nabavinejad, Mohammad Baharloo, Kun-Chih Chen, Maurizio Palesi, Tim Kogel, Masoumeh Ebrahimi
2020 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
As a result, efficient interconnection and data movement mechanisms for future on-chip artificial intelligence (AI) accelerators are worthy of study.  ...  On the other hand, with a flexible interconnection, the DNN accelerator can support different computing flows, which increases computing flexibility.  ...  To address this challenge, a GEMM accelerator for DNN training called SIGMA [71] is proposed that can handle various irregular GEMM dimensions and different levels of sparsity, while maximizing the utilization  ...
doi:10.1109/jetcas.2020.3022920 fatcat:idqitgwnrnegbd4dhrly3xsxbi
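
The SIGMA snippet above concerns a GEMM accelerator that tolerates irregular matrix dimensions and sparsity. Below is a minimal functional sketch, in plain NumPy rather than SIGMA's hardware dataflow or Benes distribution network, of the computation such an engine targets: a sparse-times-dense GEMM whose work scales with the non-zeros of A and whose shapes need not be padded to a fixed tile size. The function name and the CSR-like representation are illustrative assumptions.

```python
import numpy as np

def sparse_dense_gemm(a_dense, b):
    """Functional sketch of a sparse x dense GEMM C = A @ B.

    A is stored in a CSR-like form so that only non-zero elements
    generate multiply-accumulates; the shapes (M, K) x (K, N) may be
    arbitrary ("irregular") rather than padded to a fixed tile size.
    """
    m, k = a_dense.shape
    k2, n = b.shape
    assert k == k2
    # CSR-like view of A: for each row, a list of (column, value) pairs.
    rows = [[(j, a_dense[i, j]) for j in range(k) if a_dense[i, j] != 0.0]
            for i in range(m)]
    c = np.zeros((m, n))
    for i, row in enumerate(rows):
        for j, val in row:           # only non-zeros contribute work
            c[i, :] += val * b[j, :]
    return c

# Example: an irregular 3x5 by 5x2 GEMM with roughly 60% sparsity in A.
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 5)) * (rng.random((3, 5)) > 0.6)
b = rng.standard_normal((5, 2))
assert np.allclose(sparse_dense_gemm(a, b), a @ b)
```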

SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication [article]

Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte
2021 arXiv   pre-print
This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map  ...  coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.  ... 
arXiv:2107.13386v2 fatcat:k7oampka5rdztojmmwrr2yvnfm
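
SPOTS couples an Im2Col unit with a systolic GEMM unit. As a point of reference, here is a minimal NumPy sketch of the Im2Col transformation itself (single channel, stride 1, no padding), showing how convolution is lowered to a matrix-matrix product; the pipelined hardware unit in the paper is not modeled, and the helper name is an assumption.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (H, W) feature map into a matrix whose rows are kh*kw
    patches, so convolution becomes a GEMM (stride 1, no padding)."""
    h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

# Convolution as GEMM: patch matrix (P x kh*kw) times flattened kernel (kh*kw x 1).
x = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
out = (im2col(x, 3, 3) @ kernel.ravel()).reshape(3, 3)
```

Each output pixel corresponds to one row of the patch matrix, so a bank of filters simply becomes the second operand of the GEMM.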

Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration [article]

Ananda Samajdar, Michael Pellauer, Tushar Krishna
2022 arXiv   pre-print
With increasing diversity in Deep Neural Network (DNN) models in terms of layer shapes and sizes, the research community has been investigating flexible/reconfigurable accelerator substrates.  ...  The second is being able to determine the right configuration of the array for the current DNN model and/or layer and reconfigure the accelerator at runtime.  ...  RSA is a flexible, scalable GEMM accelerator constructed using systolic cells and pipelined bypass paths.  ...
arXiv:2101.04799v2 fatcat:dxw75e3zjvgiladqmr2m6dv3lu
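
SARA's second challenge above is picking the right array configuration per model or layer at runtime (the paper uses an ML-based recommender). The sketch below is only a hypothetical stand-in heuristic: it scores a few candidate array shapes of the same PE budget with a simplified utilization model and picks the best one for a given GEMM, illustrating why a tall array can win on tall, narrow workloads. Function names and the utilization model are assumptions, not the paper's method.

```python
def utilization(m, n, rows, cols):
    """Fraction of PE slots doing useful work when an M x N output is tiled
    onto a rows x cols array, with edge tiles only partially filled.
    A deliberately simplified model, not SARA's."""
    tiles_r = -(-m // rows)          # ceiling division
    tiles_c = -(-n // cols)
    return (m * n) / (tiles_r * rows * tiles_c * cols)

def pick_config(m, n, candidates):
    """Pick the candidate (rows, cols) shape with the best modeled utilization."""
    return max(candidates, key=lambda rc: utilization(m, n, *rc))

# Example: 4096 PEs arranged as one square array, a wide array, or a tall array.
layer_m, layer_n = 200, 24
print(pick_config(layer_m, layer_n, [(64, 64), (32, 128), (256, 16)]))  # -> (256, 16)
```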

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights [article]

Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li
2021 arXiv   pre-print
This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators.  ...  Unstructured sparsity and tensors with varying dimensions yield irregular computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does  ...  (a) Flexible dot product engine in SIGMA accelerator [105] features a data distribution NoC with configurable switches interconnected via Benes topology.  ... 
arXiv:2007.00864v2 fatcat:k4o2xboh4vbudadfiriiwjp7uu
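
The survey snippet stresses that unstructured sparsity yields irregular, data-dependent access patterns. A tiny sketch of the core irregular operation, intersecting the non-zero coordinates of two compressed sparse vectors, makes this concrete; the formats and intersection hardware in the surveyed accelerators vary, and this is only an illustrative software analogue.

```python
def sparse_dot(idx_a, val_a, idx_b, val_b):
    """Dot product of two sparse vectors given as sorted (index, value) lists.
    The index intersection makes the access pattern data-dependent, which is
    the irregularity that sparse-tensor accelerators must handle."""
    i = j = 0
    acc = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            acc += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return acc

# Only the positions present in both operands contribute: 2*6 + 4*7 = 40.0
print(sparse_dot([0, 3, 5, 7], [1.0, 2.0, 3.0, 4.0],
                 [1, 3, 7],    [5.0, 6.0, 7.0]))
```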

An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication

Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte
2022 ACM Transactions on Architecture and Code Optimization (TACO)  
This article proposes a novel hardware accelerator for the inference task with sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (Im2Col) transformation  ...  The systolic-array-based GEMM unit in the accelerator can be dynamically configured as multiple GEMM units with square-shaped systolic arrays or as a single GEMM unit with a tall systolic array.  ...  This article proposes SPOTS, a hardware accelerator for sparse CNNs with a matrix multiplication formulation of convolution using the Im2Col transformation.  ...
doi:10.1145/3532863 fatcat:7iny5yokebaepi5fedmvh4hyku

FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support [article]

Seock-Hwan Noh, Jahyun Koo, Seunghyun Lee, Jongse Park, Jaeha Kung
2022 arXiv   pre-print
Training deep neural networks (DNNs) is a computationally expensive job, which can take weeks or months even with high-performance GPUs.  ...  Backed by this algorithmic opportunity, we develop a flexible DNN training accelerator, dubbed FlexBlock, which supports three different BFP precision modes, possibly different among activation, weight  ...  SIGMA [48] proposes a training accelerator that handles both sparsity and irregular structure in GEMM operations by using a Benes network for efficient workload distribution.  ...
arXiv:2203.06673v1 fatcat:xzsduig2mndbxitohkvq67b374
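
FlexBlock's modes are built on block floating point (BFP), in which a block of values shares one exponent while each value keeps a short signed integer mantissa. A minimal quantization sketch follows; the mantissa widths, rounding, and mode selection here are assumptions for illustration, not FlexBlock's actual configuration.

```python
import numpy as np

def bfp_quantize(block, mantissa_bits):
    """Quantize a 1-D block to block floating point: one shared exponent
    for the whole block, per-element signed integer mantissas."""
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return np.zeros_like(block)
    shared_exp = np.floor(np.log2(max_abs))           # one exponent per block
    lsb = 2.0 ** (shared_exp - (mantissa_bits - 2))   # weight of the mantissa LSB
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / lsb), -limit - 1, limit)
    return mantissas * lsb                            # dequantized view

x = np.array([0.91, -0.02, 0.30, 0.005])
for bits in (4, 8, 16):   # three mantissa widths, in the spirit of multi-mode BFP
    print(bits, bfp_quantize(x, bits))
```

Narrower mantissas shrink every value in the block toward the coarse grid set by the shared exponent, which is why mixing widths across activations, weights, and gradients is a meaningful design knob.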

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique
2020 IEEE Access  
In a scenario where several sophisticated algorithms need to be executed with limited energy and low latency, the need for cost-effective hardware platforms capable of implementing energy-efficient DL  ...  However, to achieve impressive performance, these algorithms employ very deep networks, requiring significant computational power during both training and inference.  ...  Stripes [179] is an accelerator for DNNs with flexible bitwidth for the activations that uses bit-serial operations.  ...
doi:10.1109/access.2020.3039858 fatcat:nticzqgrznftrcji4krhyjxudu
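
Stripes, cited in the last entry, processes activations bit-serially so that execution time scales with the activation bitwidth. A small sketch of a bit-serial dot product shows where that proportionality comes from: p bits of activation precision take p passes over the same weights. The function is an illustrative software analogue, not the accelerator's datapath.

```python
def bit_serial_dot(activations, weights, act_bits):
    """Dot product with unsigned integer activations processed one bit at a
    time (LSB first): each 'cycle' adds the weights selected by the current
    activation bit, and the partial sum is weighted by 2**bit."""
    acc = 0
    for bit in range(act_bits):
        # Partial sum for this bit position across all lanes.
        partial = sum(w for a, w in zip(activations, weights) if (a >> bit) & 1)
        acc += partial << bit
    return acc

acts = [5, 3, 9]        # activations fit in 4 bits, so 4 serial passes suffice
wts = [2, 7, 1]
assert bit_serial_dot(acts, wts, 4) == sum(a * w for a, w in zip(acts, wts))
```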