Synergy

Guanwen Zhong, Akshat Dubey, Cheng Tan, Tulika Mitra
2019 ACM Transactions on Embedded Computing Systems  
Synergy achieves 7.3X speedup, averaged across seven CNN models, over a well-optimized software-only solution.  ...  There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments.  ...  Network-independent Frameworks leverage a fixed optimized hardware architecture for various CNN networks.  ... 
doi:10.1145/3301278 fatcat:hf5vn42mvnaqbktmqfpaajsnrm

Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing

Georgios Flamis, Stavros Kalapothas, Paris Kitsos
2021 Electronics  
The technology as frameworks and procedures are presented to the order of execution for a complete design cycle with guaranteed success.  ...  The number of Artificial Intelligence (AI) and Machine Learning (ML) designs is rapidly increasing and certain concerns are raised on how to start an AI design for edge systems, what are the steps to follow  ...  Tensorflow [88] CNN framework with GPU (CUDA) performance optimization PyTorch [89] Optimized tensor library for DL using GPU (CUDA) DeepSense [90] Mobile GPU-based CNN optimization using OpenCL  ... 
doi:10.3390/electronics10161912 fatcat:3ywb6inqzvbfxb2vjve6ffvmiq

FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10 [article]

Ke He, Bo Liu, Yu Zhang, Andrew Ling, Dian Gu
2019 arXiv   pre-print
FPGA device, viewed as a potential heterogeneous platform, still cannot provide a comprehensive support for CNN development in popular frameworks, in particular to the training phase.  ...  FPGA-enabled Caffe, a hierarchical software and hardware design methodology based on the Caffe to enable FPGA to support mainline deep learning development features, e.g. training and inference with Caffe  ...  However, it is a positive trend and direction for FPGA designs in CNN applications.  ... 
arXiv:1911.08905v1 fatcat:k727mudp3neutbj7nnwbzvfk6a

3D Generative Adversarial Networks inference implementation on FPGAs

Chao Jiang, Herman Lam, Dave Ojika, Federico Carminati, Gul Rukh Khattak, Sofia Vallecorsa, Francisco Perez, Shawn Slocker
2019 Zenodo  
In this context, CERN openlab has a collaboration with the researchers at SHREC at the University of Florida and with Intel to accelerate the 3DGAN inferencing stage using FPGAs.  ...  A number of details of this work and preliminary results will be presented, specifically in terms of speedup, stimulating a discussion for future development.  ...  Model Optimizer • Convert mainstream deep learning framework model (TensorFlow, Caffe, etc.)  ... 
doi:10.5281/zenodo.3599552 fatcat:utno2adxxfddbehed5al2fhtmy

Accelerate Scientific Deep Learning Models on Heterogeneous Computing Platform with FPGA

Chao Jiang, David Ojika, Sofia Vallecorsa, Thorsten Kurth, Prabhat, Bhavesh Patel, Herman Lam, C. Doglioni, D. Kim, G.A. Stewart, L. Silvestris, P. Jackson (+1 others)
2020 EPJ Web of Conferences  
Using the Intel Deep Learning Acceleration (DLA) development suite to optimize existing FPGA primitives and develop new ones, we were able to accelerate the scientific DNN models under study with a speedup  ...  from 2.46x to 9.59x for a single Arria 10 FPGA against a single core (single thread) of a server-class Skylake CPU.  ...  The input to the Model Optimizer is a network model trained using one of the supported frameworks.  ... 
doi:10.1051/epjconf/202024509014 fatcat:sqb2cqag6ng2njpgtjouhe6adu

A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared with Titan X GPU

Shuai Li, Yukui Luo, Kuangyuan Sun, Nandakishor Yadav, Ken Choi
2020 IEEE Access  
In this work, we have analyzed in detail the data dependency in the CNN accelerator and propose specific pipelined operations and data organized manner to design a high throughput CNN accelerator on FPGA  ...  Besides, we have optimized the kernel operations to obtain a high power efficiency. The proposed CNN accelerator supports image classification and real-time object detection with high accuracy.  ...  ACKNOWLEDGMENT The authors thank their colleagues from KETI and KEIT who provided insight and expertise that greatly assisted the research and greatly improved the manuscript.  ... 
doi:10.1109/access.2020.3000009 fatcat:v3i4wliwdbcmfplfn4uxki74ri

Guest Editors' Introduction to the Special Issue on Machine Learning Architectures and Accelerators

Xuehai Qian, Yanzhi Wang, Avinash Karanth
2020 IEEE transactions on computers  
Deep learning models are both computation and storage-intensive since it is necessary to extract high-level features for optimization.  ...  Prior research has investigated software/algorithm optimization which includes DNN model architecture search for computation/storage reduction (e.g., depthwise-separate convolutions), model compression  ...  Further on, it is our pleasure to thank the Editor-in-Chief Ahmed Louri and Associate Editors Tao Li and James Hoe for their continuous help and support with all our organizational questions in connection  ... 
doi:10.1109/tc.2020.2997574 fatcat:vfng262tlvagrmfudtv44x75ly

2020 Index IEEE Transactions on Computers Vol. 69

2020 IEEE transactions on computers  
., +, TC Aug. 2020 1143-1158 Neuromorphic System for Spatial and Temporal Information Processing. Modeling Framework for Reliability of Erasure Codes in SSD Arrays.  ...  Vasselle, A., +, TC Oct. 2020 1449-1459 Internet of Things A Hardware-Based Architecture-Neutral Framework for Real-Time IoT Workload Forensics.  ... 
doi:10.1109/tc.2020.3042405 fatcat:htwgwc6gtbcfdkcpj6dcfbuwhq

Table of contents

2018 2018 28th International Conference on Field Programmable Logic and Applications (FPL)  
University), Anthony Skjellum (University of Tennessee at Chattanooga), and Martin Herbordt (Boston University) Session M3B: Machine Learning Frameworks A Collaborative Framework for FPGA-based CNN Design  ...  Sergey Shumarayev (Intel Corporation) University of Manchester), and Mikel Luján (The University of Manchester) FlueNT10G: A Programmable FPGA-based Network Tester for Multi-10-Gigabit Ethernet Andreas  ... 
doi:10.1109/fpl.2018.00004 fatcat:lnd5nf3yczamnkv4y4h27pefnu

2L-3W: 2-Level 3-Way Hardware-Software Co-Verification for the Mapping of Deep Learning Architecture (DLA) onto FPGA Boards [article]

Tolulope A. Odetola and Katie M. Groves and Syed Rafay Hasan
2019 arXiv   pre-print
FPGAs have become a popular choice for deploying deep learning architectures (DLA). There are many researchers that have explored the deployment and mapping of DLA on FPGA.  ...  The 3-Way co-verification provides a cross-paradigm (software, design and hardware) layer-by-layer parameter check to assure the correct implementation and mapping of the DLA onto FPGA boards.  ...  Jiandong et al. [27] proposes a collaborative framework to optimize the OpenCL based CNN design for CNN applications.  ... 
arXiv:1911.05944v1 fatcat:jjfh4o3rb5fgvbbh4poqzwr2mq

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices [article]

Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin (+18 others)
2021 arXiv   pre-print
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains.  ...  Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.  ...  Majumder et al. (2019) propose an FPGA-based accelerator design to execute CNNs that leverages TensorFlow for model description and exploits reuse along all dimensions with a 1D systolic array of processing  ... 
arXiv:2103.05579v3 fatcat:5zsggdpmfng6bnfxrnv72tw7q4

A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA

Kamel Abdelouahab, Cédric Bourrasset, Maxime Pelcat, François Berry, Jean-Charles Quinton, Jocelyn Serot
2016 Proceedings of the 10th International Conference on Distributed Smart Camera - ICDSC '16  
This work presents a holistic method relying on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on an FPGA.  ...  Deep Neural Networks are becoming the de-facto standard models for image understanding, and more generally for computer vision tasks.  ...  Section III provides CNN background and links it to dataflow Model of Computation (MoC). Section VI introduces design space exploration for CNNs on FPGAs and our method for holistic optimizing.  ... 
doi:10.1145/2967413.2967430 dblp:conf/icdsc/AbdelouahabBPBQ16 fatcat:qguj7rqa55de5gb7t4hwwmuipm

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration [article]

Behzad Salami, Erhan Baturay Onural, Ismail Emir Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal Kestelman, Osman S. Unsal, Hamid Sarbazi-Azad, Onur Mutlu
2020 arXiv   pre-print
minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning.  ...  Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators.  ...  The work done for this paper was partially supported by a HiPEAC Collaboration Grant funded by the H2020 HiPEAC Project under grant agreement No. 779656.  ... 
arXiv:2005.03451v2 fatcat:ctsyegwkebdezdrnohurlrtm3a

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

Behzad Salami, Erhan Baturay Onural, Ismail Emir Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal Kestelman, Osman Unsal, Hamid Sarbazi-Azad, Onur Mutlu
2020 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)  
minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning.  ...  Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators.  ...  The work done for this paper was partially supported by a HiPEAC Collaboration Grant funded by the H2020 HiPEAC Project under grant agreement No. 779656.  ... 
doi:10.1109/dsn48063.2020.00032 dblp:conf/dsn/0001OYKEKUSM20 fatcat:y3czmzbemvhp3o26q2x42wsnia

A Review of FPGA‐Based Custom Computing Architecture for Convolutional Neural Network Inference

Peng Xiyuan, Yu Jinxiang, Yao Bowen, Liu Liansheng, Peng Yu
2021 Chinese journal of electronics  
In this paper, the mainstream methods of CNN structure design, hardware-oriented model compression and FPGA-based custom architecture design are summarized, and the improvement of CNN inference performance  ...  Field-programmable gate array (FPGA)-based custom computing architecture is a promising solution to further enhance the CNN inference performance.  ...  Section II gives a briefly review for the evolution of CNN model, introduces the latest progress on light-weight CNN model and its design methods, which provides base model for Section III.  ... 
doi:10.1049/cje.2020.11.002 fatcat:vt4n4x67k5g6bhkywe7rhm7tda