1,056 results (showing 1–15)

Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks [article]

Hokchhay Tann, Soheil Hashemi, Iris Bahar, Sherief Reda
2017 arXiv   pre-print
In addition, we propose a hardware accelerator design to achieve low-power, low-latency inference with insignificant degradation in accuracy.  ...  While Deep Neural Networks (DNNs) push the state-of-the-art in many machine learning applications, they often require millions of expensive floating-point operations for each input classification.  ...  We would like to thank NVIDIA Corporation for their generous GPU donation.  ... 
arXiv:1705.04288v1 fatcat:wd4z3lzxubht7b3nyluigymwne
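
The snippet above does not spell out how the multipliers are removed; a common approach in this line of work is to constrain weights to powers of two so that each multiplication reduces to a sign and a bit shift. A minimal sketch of that idea follows (the power-of-two scheme, exponent range, and names are illustrative assumptions, not the paper's exact method):

import numpy as np

def quantize_pow2(w, min_exp=-8, max_exp=0):
    """Snap each weight to the nearest signed power of two (illustrative scheme)."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    return sign, exp.astype(int)

def shift_dot(x, sign, exp):
    """Multiplier-free dot product: every product is a sign flip plus a shift."""
    return np.sum(sign * x * (2.0 ** exp))   # 2.0**exp is a bit shift in fixed-point hardware

w = np.array([0.31, -0.07, 0.5])
x = np.array([3.0, 1.0, 2.0])
sign, exp = quantize_pow2(w)
print(shift_dot(x, sign, exp), "approximates", x @ w)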

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions [article]

Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis
2018 arXiv   pre-print
Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.  ...  To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs.  ...  The proposed methodology comprises a benchmark suite and guidelines for evaluation metrics.  ... 
arXiv:1803.05900v1 fatcat:3gkwtxuahrghhmhz4nmkpqe7we

Runtime configurable deep neural networks for energy-accuracy trade-off

Hokchhay Tann, Soheil Hashemi, R. Iris Bahar, Sherief Reda
2016 Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis - CODES '16  
We evaluate our techniques using both an ASIC-based hardware accelerator and a low-power embedded GPGPU and show that our approach leads to only a small or negligible loss in the final network accuracy  ...  We present a novel dynamic configuration technique for deep neural networks that permits step-wise energy-accuracy trade-offs during runtime.  ...  We would like to thank NVIDIA Corporation for their generous GPU donation. We also thank Professor Pedro Felzenszwalb for the discussions and his helpful inputs.  ...
doi:10.1145/2968456.2968458 dblp:conf/codes/TannHBR16 fatcat:5mpfyjxt5fdltg6q2quacd7fw4
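
The snippet does not detail the configuration mechanism; one way step-wise energy-accuracy trade-offs are often realized at runtime is to evaluate only a leading fraction of each layer's units and widen on demand. A hypothetical sketch of such a runtime knob (layer shapes, effort levels, and the assumption that leading units form a usable sub-network are all illustrative):

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 64))   # input -> hidden
W2 = rng.normal(size=(64, 10))   # hidden -> classes

def forward(x, effort=1.0):
    """Evaluate only the leading `effort` fraction of hidden units.

    Fewer active units means fewer MACs (lower energy) at some accuracy cost.
    Assumes training made the leading units self-sufficient, which is an
    assumption of this sketch rather than the paper's documented scheme.
    """
    k = max(1, int(W1.shape[1] * effort))
    h = np.maximum(x @ W1[:, :k], 0.0)       # ReLU over the active sub-layer
    return h @ W2[:k, :]

x = rng.normal(size=(16,))
for effort in (0.25, 0.5, 1.0):              # step-wise operating points
    print(effort, np.argmax(forward(x, effort)))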

Exploring Model Stability of Deep Neural Networks for Reliable RRAM-based In-Memory Acceleration

Gokul Krishnan, Li Yang, Jingbo Sun, Jubin Hazra, Xiaocong Du, Maximilian Liehr, Zheng Li, Karsten Beckmann, Rajiv Joshi, Nathaniel C Cady, Deliang Fan, Yu Cao
2022 IEEE Transactions on Computers  
RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs).  ...  Finally, we propose a novel variation-aware training method to improve model stability, in which there exists the most stable model for the best post-mapping accuracy of compressed DNNs.  ...
doi:10.1109/tc.2022.3174585 fatcat:h2ablod6hnhknkzrz4h6i4ytom
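
The snippet names a variation-aware training method without giving its form; a common baseline is to perturb weights with device-variation noise in the forward pass during training so the learned model tolerates post-mapping drift. A toy sketch under that assumption (the multiplicative log-normal device model, noise level, and logistic-regression task are illustrative, not the paper's calibrated setup):

import numpy as np

rng = np.random.default_rng(0)

def noisy(w, sigma=0.1):
    """Simulate RRAM conductance variation as multiplicative log-normal noise
    (an assumed device model for illustration only)."""
    return w * rng.lognormal(mean=0.0, sigma=sigma, size=w.shape)

# Toy variation-aware training: logistic regression whose forward pass uses
# perturbed weights, while updates are applied to the clean weights.
X = rng.normal(size=(256, 8))
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(float)
w = np.zeros(8)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ noisy(w))))    # forward with injected variation
    w -= 0.1 * X.T @ (p - y) / len(y)            # update the clean weights
print("train acc under variation:", np.mean((1.0 / (1.0 + np.exp(-(X @ noisy(w)))) > 0.5) == y))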

A Methodology for Automatic Selection of Activation Functions to Design Hybrid Deep Neural Networks [article]

Alberto Marchisio, Muhammad Abdullah Hanif, Semeen Rehman, Maurizio Martina, Muhammad Shafique
2018 arXiv   pre-print
In this paper, we propose a novel methodology to automatically select the best-possible activation function for each layer of a given DNN, such that the overall DNN accuracy, compared to considering only one type of activation function for the whole DNN, is improved.  ...  Alternatively, our Hybrid DNN can easily be implemented and integrated in a hardware accelerator for Deep Learning Inference.  ...
arXiv:1811.03980v1 fatcat:cmua3ebnczdonpdkcwmcanbcta
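
A sketch of the kind of per-layer activation selection the abstract describes, written as a greedy pass that keeps, for each layer, whichever candidate activation scores best on held-out data. The candidate set, the greedy order, and the evaluate() helper are placeholders; the paper's actual selection criterion may differ:

import numpy as np

CANDIDATES = {
    "relu":  lambda x: np.maximum(x, 0.0),
    "tanh":  np.tanh,
    "leaky": lambda x: np.where(x > 0, x, 0.01 * x),
}

def greedy_select(evaluate, num_layers):
    """Greedily pick one activation per layer.

    `evaluate(choices)` is assumed to return validation accuracy for a DNN
    built with the given per-layer activation names (hypothetical helper).
    """
    choices = ["relu"] * num_layers
    for layer in range(num_layers):
        scores = {}
        for name in CANDIDATES:
            trial = choices.copy()
            trial[layer] = name
            scores[name] = evaluate(trial)
        choices[layer] = max(scores, key=scores.get)
    return choices

# Toy evaluator: pretends layer 1 prefers tanh and penalizes leaky elsewhere.
toy = lambda c: 0.9 + 0.05 * (c[1] == "tanh") - 0.01 * c.count("leaky")
print(greedy_select(toy, num_layers=3))   # -> ['relu', 'tanh', 'relu']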

MoRS: An Approximate Fault Modelling Framework for Reduced-Voltage SRAMs [article]

İsmail Emir Yüksel, Behzad Salami, Oğuz Ergin, Osman Sabri Ünsal, Adrian Cristal Kestelman
2022 arXiv   pre-print
Modern workloads such as Deep Neural Networks (DNNs) running on these heterogeneous fabrics are highly dependent on the on-chip memory architecture for efficient acceleration.  ...  We inject the faults generated by MoRS into the on-chip memory of the DNN accelerator to evaluate the resilience of the system under test.  ...  The prior work [78] on Near-Threshold Voltage FinFET SRAMs presents a fault model for SRAMs based on a uniform random distribution. Givaki et al.  ...
arXiv:2110.05855v2 fatcat:sitwqc7lczdtnp4neva7sqoigm
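
The snippet mentions injecting MoRS-generated faults into the accelerator's on-chip memory and contrasts this with a uniform-random fault model. The sketch below implements only that simple uniform-random baseline as bit flips in quantized weights; the bit width, error rate, and memory layout are illustrative assumptions, and the MoRS fault maps themselves are not reproduced here:

import numpy as np

def inject_uniform_faults(weights_q, bit_error_rate, bits=8, seed=0):
    """Flip random bits of 8-bit quantized weights, uniformly at random."""
    rng = np.random.default_rng(seed)
    w = weights_q.astype(np.uint8)
    flips = rng.random((w.size, bits)) < bit_error_rate   # one draw per stored bit
    wf = w.reshape(-1)                                     # flat view of the copy
    for b in range(bits):
        idx = np.where(flips[:, b])[0]
        wf[idx] ^= np.uint8(1 << b)                        # XOR flips bit b
    return w

w_q = np.array([0, 127, 200, 255], dtype=np.uint8)
print(inject_uniform_faults(w_q, bit_error_rate=0.05))

Resilience can then be estimated by running inference with the faulty weights and comparing accuracy against the fault-free model.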

Editorial: Special Issue on Compact Deep Neural Networks With Industrial Applications

Lixin Fan, Diana Marculescu, Werner Bailer, Yurong Chen
2020 IEEE Journal on Selected Topics in Signal Processing  
Javaheripi et al. introduce an adaptive sampling methodology for automated compression of DNNs for accelerated inference on resource-constrained platforms in "AdaNS: Adaptive Non-uniform Sampling for Automated  ...  "Accelerating Convolutional Neural Network via Structured Gaussian Scale Mixture Models: a Joint Grouping and Pruning Approach" by Huang et al. proposes a hybrid network compression technique for exploiting  ... 
doi:10.1109/jstsp.2020.3006323 fatcat:d75ni7ocajb4pemovq2l3ton4i

ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models

Matthias Wess, Matvey Ivanov, Christoph Unger, Anvesh Nookala, Alexander Wendt, Axel Jantsch
2020 IEEE Access  
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.  ...  With new accelerator hardware for DNNs, the computing power for AI applications has increased rapidly.  ...  Therefore, we propose a methodology for generating stacked mapping and layer execution time models for hardware accelerators and systematically compare the prediction accuracy of different modeling approaches  ...
doi:10.1109/access.2020.3047259 fatcat:bncypj4ji5dwrmt46nercjph3m
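
A minimal sketch of layer-wise execution-time estimation in the spirit of the snippet: per-layer-type models are fitted from benchmark measurements and then summed over a network description. The linear-in-MACs model and the example numbers are assumptions for illustration, not ANNETTE's actual stacked models:

import numpy as np

def fit_latency_model(macs, measured_ms):
    """Fit t = a * MACs + b from micro-kernel benchmark points (least squares)."""
    A = np.stack([macs, np.ones_like(macs)], axis=1)
    coef, *_ = np.linalg.lstsq(A, measured_ms, rcond=None)
    return coef                                   # (a, b)

# Hypothetical benchmark measurements per layer type: (MACs, milliseconds).
conv_model = fit_latency_model(np.array([1e6, 4e6, 16e6]), np.array([0.2, 0.7, 2.6]))
fc_model   = fit_latency_model(np.array([1e5, 1e6, 4e6]),  np.array([0.05, 0.3, 1.1]))

def estimate_network(layers):
    """Sum per-layer predictions; `layers` is a list of (type, MACs) pairs."""
    models = {"conv": conv_model, "fc": fc_model}
    return sum(float(np.dot(models[t], [macs, 1.0])) for t, macs in layers)

print(estimate_network([("conv", 8e6), ("conv", 2e6), ("fc", 5e5)]), "ms (estimated)")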

Benchmarking Delay and Energy of Neural Inference Circuits

Dmitri E. Nikonov, Ian A. Young
2019 IEEE Journal on Exploratory Solid-State Computational Devices and Circuits  
A consistent and transparent methodology is proposed and used to benchmark this comprehensive set of options across several application cases.  ...  Neural network circuits and architectures are currently under active research for applications to artificial intelligence and machine learning.  ...  Treatment of interconnects: the benchmarks for neural network elements, neural gates, and larger DNNs are built up hierarchically, from benchmarks for a synapse and a neuron obtained in Section IV.  ...
doi:10.1109/jxcdc.2019.2956112 fatcat:m3cgaqljnfdnvflqp6arq6tulq
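
The hierarchical build-up described in the snippet can be illustrated with a toy roll-up from per-synapse and per-neuron figures to a layer and then a network. The energy and delay numbers below are placeholders, not values from the paper, and the serialized-accumulate delay model is an assumption:

# Illustrative per-element figures (placeholders, not the paper's numbers).
E_SYNAPSE_FJ, E_NEURON_FJ = 2.0, 50.0       # energy per operation, femtojoules
T_SYNAPSE_NS, T_NEURON_NS = 0.1, 1.0        # delay contributions, nanoseconds

def layer_cost(fan_in, neurons):
    """Roll per-synapse/per-neuron benchmarks up to one fully connected layer."""
    energy_fj = neurons * (fan_in * E_SYNAPSE_FJ + E_NEURON_FJ)
    delay_ns = fan_in * T_SYNAPSE_NS + T_NEURON_NS   # serialized accumulation (assumed)
    return energy_fj, delay_ns

def network_cost(layer_shapes):
    """Sum layer energies; sum delays assuming layers run back to back."""
    energy = delay = 0.0
    for fan_in, neurons in layer_shapes:
        e, d = layer_cost(fan_in, neurons)
        energy, delay = energy + e, delay + d
    return energy, delay

print(network_cost([(784, 256), (256, 10)]))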

Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale [article]

Forrest Iandola
2016 arXiv   pre-print
1. Judiciously choosing benchmarks and metrics. 2. Rapidly training CNN models. 3. Defining and describing the CNN design space. 4. Exploring the design space of CNN architectures.  ...  Taken together, these four themes comprise an effective methodology for discovering the "right" CNN architectures to meet the needs of practical applications.  ...  This benchmarking methodology appears to be catching on.  ...
arXiv:1612.06519v1 fatcat:jwo2gyfjvfh3lbkfdntctx24o4

An acceleration strategy for randomize-then-optimize sampling via deep neural networks [article]

Liang Yan, Tao Zhou
2021 arXiv   pre-print
We present a Bayesian inverse problem governed by a benchmark elliptic PDE to demonstrate the computational accuracy and efficiency of our new algorithm (i.e., DNN-RTO).  ...  In particular, the training points for the DNN surrogate are drawn from a locally approximated posterior distribution, and it is shown that the resulting algorithm can provide a flexible and efficient sampling  ...  DNN surrogate for RTO-MH: in this section, we present a DNN-based surrogate model to accelerate the RTO-MH approach (Section 4.1, feedforward DNN-based surrogate modeling).  ...
arXiv:2104.06285v1 fatcat:gjgw4sl2yzaqdie3zprvidvkne
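
A compressed sketch of the surrogate idea in the snippet: an expensive forward model inside a Metropolis-Hastings loop is replaced by a cheap learned approximation trained on a handful of forward evaluations. For brevity the "surrogate" here is a polynomial fit rather than a DNN, and the target is a toy 1-D posterior; both are stand-ins, not the paper's DNN-RTO construction:

import numpy as np

rng = np.random.default_rng(1)

def forward_expensive(theta):
    """Stand-in for a costly PDE solve mapping parameter -> observable."""
    return np.sin(3.0 * theta) + 0.1 * theta ** 2

# "Train" a cheap surrogate on a few expensive forward evaluations.
train_theta = np.linspace(-2, 2, 15)
surrogate = np.poly1d(np.polyfit(train_theta, forward_expensive(train_theta), deg=9))

def log_post(theta, y_obs, fwd, noise=0.1):
    return -0.5 * ((fwd(theta) - y_obs) / noise) ** 2 - 0.5 * theta ** 2

# Metropolis-Hastings using the surrogate instead of the expensive model.
y_obs, theta, chain = forward_expensive(0.7), 0.0, []
for _ in range(2000):
    prop = theta + 0.3 * rng.normal()
    if np.log(rng.random()) < log_post(prop, y_obs, surrogate) - log_post(theta, y_obs, surrogate):
        theta = prop
    chain.append(theta)
print("posterior mean ~", np.mean(chain[500:]))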

LoRD-Net: Unfolded Deep Detection Network with Low-Resolution Receivers [article]

Shahin Khobahi, Nir Shlezinger, Mojtaba Soltanalian, Yonina C. Eldar
2021 arXiv   pre-print
We numerically evaluate the proposed receiver architecture for one-bit signal recovery in wireless communications and demonstrate that the proposed hybrid methodology outperforms both data-driven and model-based  ...  Our method is a model-aware data-driven architecture based on deep unfolding of first-order optimization iterations.  ...  We generate the channel matrices for the COST-2100 model for a narrow-band indoor scenario with closely spaced users at the 2.6 GHz band, where the BS is equipped with a uniform linear array (ULA) that has  ...
arXiv:2102.02993v1 fatcat:bzzbnqshdjdetpboyve4xlkk64
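
Deep unfolding, as referenced in the snippet, turns a fixed number of first-order optimization iterations into network layers with learnable parameters. The sketch below unfolds plain gradient steps for a least-squares recovery problem with per-iteration step sizes as the would-be trainable parameters; the one-bit observation model and LoRD-Net's specific architecture are not reproduced:

import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(32, 8))                 # known measurement/channel matrix
x_true = rng.normal(size=(8,))
y = H @ x_true + 0.01 * rng.normal(size=(32,))

def unfolded_estimator(y, H, step_sizes):
    """K 'layers', each one gradient step on ||y - Hx||^2 with its own step size.

    In a real unfolded network the `step_sizes` (and possibly per-layer
    matrices) would be learned from data; here they are simply fixed.
    """
    x = np.zeros(H.shape[1])
    for mu in step_sizes:                    # layer k: x <- x - mu_k * grad
        x = x - mu * (H.T @ (H @ x - y))
    return x

x_hat = unfolded_estimator(y, H, step_sizes=[0.02] * 10)
print("recovery error:", np.linalg.norm(x_hat - x_true))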

DyVEDeep: Dynamic Variable Effort Deep Neural Networks [article]

Sanjay Ganapathy, Swagath Venkataramani, Balaraman Ravindran, Anand Raghunathan
2017 arXiv   pre-print
We build DyVEDeep versions for 5 popular image recognition benchmarks - one for CIFAR-10 and four for ImageNet (AlexNet, OverFeat, VGG-16, and weight-compressed AlexNet).  ...  Previous efforts propose specialized hardware implementations for DNNs, statically prune the network, or compress the weights.  ...  To evaluate DyVEDeep, we utilized pre-trained DNN models available publicly on the Caffe Model Zoo (BVLC (a)) benchmark repository.  ...
arXiv:1704.01137v1 fatcat:o5xkilbt6zdetizgdo576wdowi
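
The snippet names dynamic variable effort but does not describe the mechanisms; one family of such techniques terminates a neuron's accumulation early when a partial sum suggests its post-ReLU output will not matter. The sketch below shows that heuristic in isolation; the two-phase split, threshold, and layer shapes are illustrative assumptions, not DyVEDeep's documented mechanisms:

import numpy as np

def dynamic_effort_layer(x, W, peek=0.25, margin=0.0):
    """ReLU layer that skips the tail of the dot product for 'unpromising' neurons.

    Phase 1 accumulates the first `peek` fraction of the inputs; neurons whose
    partial sum is below `margin` are predicted to be clipped by ReLU and skipped.
    """
    k = max(1, int(len(x) * peek))
    partial = x[:k] @ W[:k, :]                     # cheap first phase
    out = np.zeros(W.shape[1])
    for j in np.where(partial > margin)[0]:        # finish only promising neurons
        out[j] = max(0.0, partial[j] + x[k:] @ W[k:, j])
    return out

rng = np.random.default_rng(0)
x, W = rng.normal(size=64), rng.normal(size=(64, 32))
print(np.sum(dynamic_effort_layer(x, W) > 0), "active neurons of", W.shape[1])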

Training DNN IoT Applications for Deployment On Analog NVM Crossbars [article]

Fernando García-Redondo, Shidhartha Das, Glen Rosendale
2020 arXiv   pre-print
We make two contributions: Firstly, we propose a training algorithm that eliminates the need for tuning individual layers of a DNN, ensuring uniformity across layer weights and activations.  ...  A trend towards energy efficiency, security, and privacy has led to a recent focus on deploying DNNs on microcontrollers.  ...  Second, on the CIFAR-10 benchmark, the solution leads to up to 80% area savings (0.22 mm^2 vs. 1.1 mm^2 for 4-bit accelerators). For the HAR benchmark, up to 20% area savings are achieved.  ...
arXiv:1910.13850v3 fatcat:ljjlbrjqwjdorfyjgzoojegqpq
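
The first contribution in the snippet, avoiding per-layer tuning by keeping weights and activations uniform across layers, can be illustrated with quantization to a single shared range used by every layer, as one would apply inside a quantization-aware forward pass. The shared range, bit width, and layer structure below are assumptions for the sketch, not the paper's exact recipe:

import numpy as np

def fake_quant(t, shared_range=1.0, bits=4):
    """Quantize to a symmetric grid shared by every layer (no per-layer scale)."""
    levels = 2 ** (bits - 1) - 1
    t = np.clip(t, -shared_range, shared_range)
    return np.round(t / shared_range * levels) / levels * shared_range

class UniformQuantLayer:
    """Toy dense layer whose weights and activations use one global range,
    so mapping onto identical analog crossbar tiles needs no per-layer tuning."""
    def __init__(self, w):
        self.w = w
    def __call__(self, x):
        wq = fake_quant(self.w)                 # same range and bits for every layer
        return fake_quant(np.maximum(x @ wq, 0.0))

rng = np.random.default_rng(0)
net = [UniformQuantLayer(0.3 * rng.normal(size=(8, 8))) for _ in range(3)]
x = rng.normal(size=(8,))
for layer in net:
    x = layer(x)
print(x)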

SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [article]

Xiaohan Chen, Yang Zhao, Yue Wang, Pengfei Xu, Haoran You, Chaojian Li, Yonggan Fu, Yingyan Lin, Zhangyang Wang
2021 arXiv   pre-print
We also design a dedicated hardware accelerator to fully utilize the SD structure to improve the real energy efficiency and latency.  ...  The record-breaking performance of deep neural networks (DNNs) comes with heavy parameterization, leading to external dynamic random-access memory (DRAM) for storage.  ...  DNN accelerators in terms of energy consumption and latency when processing representative DNN models and benchmark datasets.  ... 
arXiv:2101.01163v2 fatcat:z6aywk7qr5hbngigv65fyfrpja
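
The snippet refers to an "SD structure" without detailing it; one way to re-model weights for storage efficiency, sketched here as an assumption rather than the paper's exact decomposition, is to factor each weight matrix into a small dense basis and a sparse coefficient matrix so that most stored data is sparse and cheap to move:

import numpy as np

def sd_like_decompose(W, basis_size=4, keep_frac=0.5):
    """Factor W (n x m) ~= C @ B with a small dense basis B and a sparse C.

    Illustrative re-modeling: B comes from the top right-singular vectors,
    C is least-squares fitted and then hard-thresholded to enforce sparsity.
    """
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    B = Vt[:basis_size, :]                       # small dense basis (k x m)
    C = W @ np.linalg.pinv(B)                    # dense coefficients (n x k)
    cutoff = np.quantile(np.abs(C), 1.0 - keep_frac)
    C_sparse = np.where(np.abs(C) >= cutoff, C, 0.0)
    return C_sparse, B

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))   # compressible toy weights
C, B = sd_like_decompose(W)
err = np.linalg.norm(W - C @ B) / np.linalg.norm(W)
print("nonzero coefficient fraction:", np.mean(C != 0), "relative error:", round(err, 3))

The reconstruction error depends on how compressible W is and how aggressively C is sparsified; the point of the sketch is only the storage structure, not a tuned compression result.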