Design and Applications of Approximate Circuits by Gate-Level Pruning

Jeremy Schlachter, Vincent Camus, Krishna V. Palem, Christian Enz
2017 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
Energy-efficiency is a critical concern for many systems, ranging from IoT objects and mobile devices to high-performance computers.  ...  This significant saving is achieved thanks to the pruned arithmetic circuits, which set some nodes to constant values, enabling the synthesis tool to further simplify the circuit and memory.  ...  A good solution to narrow the design space is to apply the same level of pruning p_i to each adder and subtractor inside a given stage i.  ... 
doi:10.1109/tvlsi.2017.2657799 fatcat:5gblp5efzrdcbndlqelcvjhsy4

Efficient Enumeration of Unidirectional Cuts for Technology Mapping of Boolean Networks [article]

Niranjan Kulkarni, Sarma Vrudhula
2016 arXiv   pre-print
In technology mapping, enumeration of subcircuits or cuts to be replaced by a standard cell is an important step that decides both the quality of the solution and execution speed.  ...  We propose an efficient enumeration method based on a novel graph pruning algorithm that utilizes network flow to approximate minimum strong line cut.  ...  Wu, and Y. Ding. Cut ranking and pruning: enabling a general and efficient FPGA mapping solution. In Proc. FPGA’99, pages 29–35, New York, 21–23, Feb. 1999. ACM. [8] T. Cormen, C.  ... 
arXiv:1603.07371v1 fatcat:sk3gvl33gjg3thmm654kweydcu

A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration

Deepak Ghimire, Dayoung Kil, Seong-heum Kim
2022 Electronics  
Recent advances in light-weight deep learning models and network architecture search (NAS) algorithms are reviewed, starting with simplified layers and efficient convolution and including new architectural  ...  The learning capability of convolutional neural networks (CNNs) originates from a combination of various feature extraction layers that fully utilize a large amount of data.  ...  In surveying efficient CNN architectures and hardware acceleration, we are deeply grateful again for all the researchers and their contributions to our science.  ... 
doi:10.3390/electronics11060945 fatcat:bxxgccwkujatzh4onkzh5lgspm

Design, synthesis and evaluation of heterogeneous FPGA with mixed LUTs and macro-gates

Yu Hu, Satyaki Das, Steve Trimberger, Lei He
2007 Computer-Aided Design (ICCAD), IEEE International Conference on  
area-efficient packing, and a SAT-based packing.  ...  The flow includes a cut-based delay-optimal technology mapping, a mixed binary integer and linear programming based area recovery algorithm to balance the resource utilization of macro-gates and LUTs for  ...  Effective and efficient synthesis tools are key enablers for the exploration of different architecture options.  ... 
doi:10.1109/iccad.2007.4397264 dblp:conf/iccad/HuDTH07 fatcat:jbik2mtxxff2xm6icfd2xto57i

Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars [article]

Alexandros Kouris, Stylianos I. Venieris, Michail Rizakis, Christos-Savvas Bouganis
2019 arXiv   pre-print
In this paper, we introduce a progressive inference computing scheme that combines model pruning and computation restructuring leading to the best possible approximation of the result given the available  ...  and robustness.  ...  The goal is to generate an optimised hardware mapping of a given LSTM on a target FPGA, tailored to The concept of progressive inference: Conventional and target behaviour of time-constrained AI systems  ... 
arXiv:1905.00689v2 fatcat:wdx5cbijrfcifpf2hzeqmnx4hy

Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors [article]

Lukas Baischer, Matthias Wess, Nima TaheriNejad
2021 arXiv   pre-print
Deep neural networks (DNNs) have the advantage that they can take into account a large number of parameters, which enables them to solve complex tasks.  ...  Their strengths and weaknesses are shown and a recommendation of suitable applications is given.  ...  Recent advances in sparsity and quantization enabled FPGAs to achieve a throughput comparable to general-purpose GPUs while having a higher power efficiency.  ... 
arXiv:2104.09252v1 fatcat:625wtuskhff3lbswhwmj7decni

Towards an extensible efficient event processing kernel

Mohammad Sadoghi
2012 Proceedings of the on SIGMOD/PODS 2012 PhD Symposium - PhD '12  
The efficient processing of large collections of patterns (Boolean expressions, XPath expressions, or continuous SQL queries) over data streams plays a central role in major data intensive applications  ...  We achieve real-time data analysis requirements by leveraging reconfigurable hardware (FPGAs) to sustain line-rate processing by exploiting unprecedented degrees of parallelism and potential for pipelining  ...  In short, partitioning space based on a high-ranking attribute enables the pruning of search space more efficiently while coping with the curse of dimensionality by considering a single attribute for each partitioning  ... 
doi:10.1145/2213598.2213602 dblp:conf/sigmod/Sadoghi12 fatcat:dqbmivxweva77nth3vkepgs2iy

Efficient ASIP design for configurable processors with fine-grained resource sharing

Quang Dinh, Deming Chen, Martin D. F. Wong
2008 Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays - FPGA '08  
Firstly, we efficiently generate custom instructions with multi-cycle IO (which allows multi-outputs), thus removing the constraint imposed by the ports of the register file.  ...  In this paper, we investigate two techniques to improve these flows, so that ASIP can be efficiently applied to simple computer architectures in embedded applications.  ...  To keep the problem tractable, we use a greedy pruning technique: we keep only a fixed number of top-ranked patterns during the generation process.  ... 
doi:10.1145/1344671.1344687 dblp:conf/fpga/DinhCW08 fatcat:baxyb7dazbf3hke3lwjbvebnj4

A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

Arnab Neelim Mazumder, Jian Meng, Hasib-Al Rashid, Utteja Kallakuri, Xin Zhang, Jae-sun Seo, Tinoosh Mohsenin
2021 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
In this work, we aim to provide a comprehensive survey about the recent developments in the domain of energy-efficient deployment of DNNs on micro-AI platforms.  ...  The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy.  ...  In feature map reuse, a single feature map is selected and partial sums are generated by processing the single feature with multiple filters.  ... 
doi:10.1109/jetcas.2021.3129415 fatcat:nknpy4eernaeljz2hpqafe7sja

Compression of Convolutional Neural Network for Natural Language Processing

Krzysztof Wróbel, Michał Karwatowski, Maciej Wielgosz, Marcin Pietroń, Kazimierz Wiatr
2020 Computer Science  
The main steps involve pruning and quantization. The process of mapping the compressed network to FPGA and the results of this implementation are described.  ...  Due to CNNs' memory and computing requirements, they need to be compressed before being mapped to hardware. This paper presents the results of compression of the efficient CNNs for sentiment analysis.  ...  the Ministry of Science and Higher Education.  ... 
doi:10.7494/csci.2020.21.1.3375 fatcat:dubp5svjpvh2bpkcj4nlpj2aey

CoCoPIE: Making Mobile AI Sweet As PIE – Compression-Compilation Co-Design Goes a Long Way [article]

Shaoshan Liu, Bin Ren, Xipeng Shen, Yanzhi Wang
2020 arXiv   pre-print
solutions in terms of energy efficiency and/or performance.  ...  on off-the-shelf mobile devices that have been previously regarded possible only with special hardware support; making off-the-shelf mobile devices outperform a number of representative ASIC and FPGA  ...  outperform a number of ASIC and FPGA solutions in performance and energy efficiency.  ... 
arXiv:2003.06700v3 fatcat:5m3tjrdw2nahzpnckpmi6kuvgq

Accelerating Neural Network Inference on FPGA-Based Platforms—A Survey

Ran Wu, Xinmin Guo, Jian Du, Junbao Li
2021 Electronics  
Based on the analysis, we generalize the acceleration strategies into five aspects: computing complexity, computing parallelism, data reuse, pruning and quantization.  ...  Neural networks, one of the representative applications of deep learning, have been widely used, and many efficient models have been developed.  ...  An algorithm combining unstructured pruning and structured pruning is proposed in [85]. A hardware-friendly compact model is generated.  ... 
doi:10.3390/electronics10091025 doaj:92e7eb4228a44c6387f846a1203529d0 fatcat:2xa7dv5hsjbczpvc4w6acdehwu

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices [article]

Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa, Dylan Rankin, Manuel Blanco Valentin (+18 others)
2021 arXiv   pre-print
Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.  ...  Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains.  ...  Here we focus specifically on parameter pruning: the selective removal of weights based on a particular ranking [58, 71–75].  ... 
arXiv:2103.05579v3 fatcat:5zsggdpmfng6bnfxrnv72tw7q4

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
2022 ACM Transactions on Design Automation of Electronic Systems  
To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization.  ...  Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications.  ...  It computes a low-rank CP-decomposition of the 4D convolution kernel tensor into a sum of a small number of rank-one tensors.  ... 
doi:10.1145/3486618 fatcat:h6xwv2slo5eklift2fl24usine

Improvements to technology mapping for LUT-based FPGAs

Alan Mishchenko, Satrajit Chatterjee, Robert Brayton
2006 Proceedings of the international symposium on Field programmable gate arrays - FPGA'06  
subset of cuts for a node and generates other cuts from that subset as needed.  ...  Two cut factorization schemes are presented and a new algorithm is proposed that uses cut factorization for delay-oriented mapping for FPGAs with large LUTs. (3) Improved area recovery leads to mappings  ...  The authors are grateful to Jason Cong and Deming Chen for providing the set of pre-optimized benchmarks from [4], which allowed for a comparison with DAOmap in Table 6.  ... 
doi:10.1145/1117201.1117208 dblp:conf/fpga/MishchenkoCB06 fatcat:x2pwpj3taza7bbl2llpgga4cvm