
ChipNet: Real-Time LiDAR Processing for Drivable Region Segmentation on an FPGA

Yecheng Lyu, Lin Bai, Xinming Huang
2018 IEEE Transactions on Circuits and Systems Part 1: Regular Papers  
This paper presents a field-programmable gate array (FPGA) design of a segmentation algorithm based on convolutional neural network (CNN) that can process light detection and ranging (LiDAR) data in real-time  ...  In this paper, a convolutional neural network model is proposed and trained to perform semantic segmentation using data from the LiDAR sensor.  ...  However, we cannot simply quantize all variables and parameters of a pre-trained neural network from floating-point into fixed-point.  ... 
doi:10.1109/tcsi.2018.2881162 fatcat:zr4fiqp7rfdmnpa5karg47b6fi

FxpNet: Training a deep convolutional neural network in fixed-point representation

Xi Chen, Xiaolin Hu, Hucheng Zhou, Ningyi Xu
2017 2017 International Joint Conference on Neural Networks (IJCNN)  
These primal parameters are usually represented in full resolution of floating-point values in previous binarized and quantized neural networks.  ...  neural networks.  ...  ACKNOWLEDGMENT We would like to express our appreciation to Hao Liang, Wenqiang Wang and Fangzhou Liao for their continuous support and valuable feedback.  ... 
doi:10.1109/ijcnn.2017.7966159 dblp:conf/ijcnn/ChenHZX17 fatcat:wd4rataswvc4po76j6yh7wp5ea

Pruning and Quantization for Deep Neural Network Acceleration: A Survey [article]

Tailin Liang, John Glossner, Lei Wang, Shaobo Shi, Xiaotong Zhang
2021 arXiv   pre-print
This paper provides a survey on two types of network compression: pruning and quantization. Pruning can be categorized as static if it is performed offline or dynamic if it is performed at run-time.  ...  We compare current techniques, analyze their strengths and weaknesses, present compressed network accuracy results on a number of frameworks, and provide practical guidance for compressing networks.  ...  This paper focuses primarily on network optimization for convolutional neural networks.  ... 
arXiv:2101.09671v3 fatcat:a34q7ca24zbylmjrddlkt3ggai

hls4ml: deploying deep learning on FPGAs for L1 trigger and Data Acquisition

Nhan Viet Tran, Vladimir Loncar
2019 Zenodo  
We also report on recent progress in the past year on newer neural network architectures and networks with orders of magnitude more parameters.  ...  We present hls4ml, a user-friendly software, based on High-Level Synthesis (HLS), designed to deploy network architectures on FPGAs.  ...  Run inference in 160 ns on currently used boards (Virtex 7)  ...  Conclusions: hls4ml, a software package for translation of trained neural networks into synthesizable FPGA firmware  ... 
doi:10.5281/zenodo.3598988 fatcat:jaqavik2cjbkfic4q2ydho3dyq

FPGA-Based Hybrid-Type Implementation of Quantized Neural Networks for Remote Sensing Applications

Xin Wei, Wenchao Liu, Lei Chen, Long Ma, He Chen, Yin Zhuang
2019 Sensors  
Recently, extensive convolutional neural network (CNN)-based methods have been used in remote sensing applications, such as object detection and classification, and have achieved significant improvements  ...  Then, a training approach for the quantized network is introduced to reduce accuracy degradation.  ...  Integer/Floating-Point Hybrid-Type Inference Most of the current state-of-the-art convolutional neural networks adopt BN to accelerate training and LeakyRelu Activation to solve the problem of the vanishing  ... 
doi:10.3390/s19040924 fatcat:dze4mvoyjjfs7g45wbkz4gv2rm

Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing

Georgios Flamis, Stavros Kalapothas, Paris Kitsos
2021 Electronics  
The training phase is executed with the use of 32-bit floating point arithmetic as this is the convenient format for GPU platforms.  ...  The inference phase on the other hand, uses a trained network with new data. The sensitive optimization and back propagation phases are removed and forward propagation is only used.  ...  , with Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN).  ... 
doi:10.3390/electronics10161912 fatcat:3ywb6inqzvbfxb2vjve6ffvmiq

Cheetah: Mixed Low-Precision Hardware Software Co-Design Framework for DNNs on the Edge [article]

Hamed F. Langroudi, Zachariah Carmichael, David Pastuch, Dhireesha Kudithipudi
2019 arXiv   pre-print
Additionally, the framework is amenable for different quantization approaches and supports mixed-precision floating point and fixed-point numerical formats.  ...  Cheetah is evaluated on three datasets: MNIST, Fashion MNIST, and CIFAR-10. Results indicate that 16-bit posits outperform 16-bit floating point in DNN training.  ...  convolution neural networks on various datasets.  ... 
arXiv:1908.02386v1 fatcat:szxgn75itvgplggolli56kop4u

Fast Convolutional Neural Networks in Low Density FPGAs Using Zero-Skipping and Weight Pruning

Mário P. Véstias, Rui Policarpo Duarte, José T. de Sousa, Horácio C. Neto
2019 Electronics  
Deep learning and, in particular, convolutional neural networks (CNN) are more efficient than previous algorithms for several computer vision applications such as security and surveillance, where image  ...  The developed architecture supports the execution of large CNNs in FPGA devices with reduced on-chip memory and computing resources.  ...  The big advantage of binarized neural networks is that they are faster. The work also proposes a framework to map a trained binarized neural network on FPGA.  ... 
doi:10.3390/electronics8111321 fatcat:3dql2oqbs5evnbxc4xz4vdj5ja

Configurable Hardware Core for IoT Object Detection

Pedro R. Miranda, Daniel Pestana, João Daniel Lopes, Rui Policarpo Duarte, Mário P. Véstias, Horácio C. Neto, José T. de Sousa
2021 Future Internet  
It achieved a performance from 7 to 14 FPS in a low-cost ZYNQ7020 FPGA, depending on the quantization, with an accuracy reduction from 2.1 to 1.4 points of mAP50.  ...  Therefore, it is necessary to provide low-cost, fast solutions for object detection. This work proposes a configurable hardware core on a field-programmable gate array (FPGA) for object detection.  ...  Table 2 presents an accuracy comparison between floating-point and DFP implementations for two known neural networks.  ... 
doi:10.3390/fi13110280 fatcat:qre2qx7k4benbpedx5fivtame4

FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit

Mannhee Cho, Youngmin Kim
2021 Electronics  
Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy.  ...  Our accelerator model achieves 66% less memory usage and approximately 50% reduced network latency, compared to a floating point design and its resource utilization is optimized to use 78% fewer DSP blocks  ...  Introduction In many modern applications, convolutional neural networks (CNNs) are adopted for image classification based on their high versatility and accuracy.  ... 
doi:10.3390/electronics10222859 fatcat:ie3kqrb5tjhg3cnu7o6bpmlsfm

Efficient Design of Pruned Convolutional Neural Networks on FPGA

Mário Véstias
2020 Journal of Signal Processing Systems  
In this paper, we proposed an architecture for the inference of pruned convolutional neural networks in any density FPGAs.  ...  Convolutional Neural Networks (CNNs) have improved several computer vision applications, like object detection and classification, when compared to other machine learning algorithms.  ...  Convolutional Neural Network A convolutional neural network is a type of deep neural network used for image classification and object recognition [48].  ... 
doi:10.1007/s11265-020-01606-2 fatcat:qjcknnkjhrhx7fjvk3aloab7kq

CNN implementation in Resource Limited FPGAs - Key Concepts and Techniques

José Rosa, Monica Figueiredo, Luis Bento
2021 Zenodo  
Convolutional Neural Network (CNN) is a type of algorithm used to solve complex problems with a superior performance when compared to traditional computational methods.  ...  Field Programmable Gate Array (FPGA) is a good option for implementing CNN in the edge, since even the lowest cost FPGAs have a good energy efficiency and a sufficient throughput to enable real-time applications  ...  Quantization After training, CNN parameters are traditionally represented with a 32-bit floating point format (FP-32).  ... 
doi:10.5281/zenodo.5080239 fatcat:s2r3zgus7jfdnfupin3dtbmeym

An FPGA-Based Hardware Accelerator for CNNs Using On-Chip Memories Only: Design and Benchmarking with Intel Movidius Neural Compute Stick

Gianmarco Dinelli, Gabriele Meoni, Emilio Rapuano, Gionata Benelli, Luca Fanucci
2019 International Journal of Reconfigurable Computing  
In this article, we propose a full on-chip field-programmable gate array hardware accelerator for a separable convolutional neural network, which was designed for a keyword spotting application.  ...  For such reasons, commercial hardware accelerators have become popular, thanks to their architecture designed for the inference of general convolutional neural network models.  ...  In addition, fixed-point arithmetic requires simpler calculation than floating-point arithmetic, with advantages in terms of complexity and power consumption [23]. The quantization of the original floating-point  ... 
doi:10.1155/2019/7218758 fatcat:dqekhocx6zac5ldx2jhm4zxmsm

A Survey of FPGA-Based Neural Network Accelerator [article]

Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, Huazhong Yang
2018 arXiv   pre-print
On the other hand, FPGA-based neural network inference accelerator is becoming a research topic.  ...  In this paper, we give an overview of previous work on neural network inference accelerators based on FPGA and summarize the main techniques used.  ...  Various network models, like convolutional neural network (CNN), recurrent neural network (RNN), have been proposed for image, video, and speech process.  ... 
arXiv:1712.08934v3 fatcat:vbrf3s27e5gdtcr7uzg3smbpli


Adam Page, Ali Jafari, Colin Shea, Tinoosh Mohsenin
2017 ACM Journal on Emerging Technologies in Computing Systems  
In particular, deep convolutional neural networks have been shown to dominate on several popular public benchmarks such as the ImageNet database.  ...  In the second contribution, we propose SPARCNet, a hardware accelerator for efficient deployment of SPARse Convolutional NETworks.  ...  Quantization Evaluation Initial experiments were performed on the baseline network to investigate the effect of limited precision fixed-point and floating-point formats of the network weights.  ... 
doi:10.1145/3005448 fatcat:quxiy72jtrfipdpeup75mhiizm