41 Hits in 8.3 sec

Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator [article]

A. Rios-Navarro, R. Tapiador-Morales, A. Jimenez-Fernandez, M. Dominguez-Morales, C. Amaya, A. Linares-Barranco
2018 arXiv   pre-print
In this paper we analyse the performance of exhaustive data transfers between PS and PL for a Xilinx Zynq FPGA in a real co-design scenario for a Convolutional Neural Network (CNN) accelerator, which processes  ...  We present and evaluate several data partitioning techniques to improve the balance between RX and TX transfers, and two different ways of managing transfers: through a polling routine at the user level  ...  In this paper a performance evaluation of Xilinx PSoC memory transfers is presented and tested for a CNN accelerator application [10] .  ... 
arXiv:1806.01106v1 fatcat:mme5js2gnfgczkrjj2ml5kdzaq
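The entry above measures PS↔PL transfer management on a Zynq SoC and the balance between RX and TX. As a rough, purely illustrative sketch of why partitioned, overlapped transfers help, the model below charges each DMA transfer a fixed setup cost plus a bandwidth-limited term; all constants are assumptions, not measurements from the paper.

```python
# Toy cost model (illustrative constants, not measured values):
# each DMA transfer pays a fixed setup cost plus size / bandwidth.
def transfer_time(size_bytes, setup_s=5e-6, bw_bytes_per_s=400e6):
    return setup_s + size_bytes / bw_bytes_per_s

def sequential(total_bytes, chunks):
    """TX then RX for every chunk, with no overlap between directions."""
    per_chunk = total_bytes / chunks
    return 2 * chunks * transfer_time(per_chunk)

def pipelined(total_bytes, chunks):
    """RX of chunk i overlaps TX of chunk i+1: critical path is ~chunks+1 slots."""
    per_chunk = total_bytes / chunks
    return (chunks + 1) * transfer_time(per_chunk)

# Overlapping approaches a 2x wall-time reduction as the chunk count grows,
# until the per-chunk setup overhead starts to dominate.
speedup = sequential(1 << 20, 8) / pipelined(1 << 20, 8)
```

Under this model, the trade-off between chunk count and per-transfer setup overhead is exactly what data-partitioning experiments on real hardware have to resolve.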


Synergy: An HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC

Guanwen Zhong, Akshat Dubey, Cheng Tan, Tulika Mitra
2019 ACM Transactions on Embedded Computing Systems  
In this context, we present Synergy, an automated, hardware-software co-designed, pipelined, high-throughput CNN inference framework on embedded heterogeneous system-on-chip (SoC) architectures (Xilinx  ...  Synergy achieves 7.3X speedup, averaged across seven CNN models, over a well-optimized software-only solution.  ...  Heterogeneous HW/SW Acceleration: Synergy leverages all the compute resources available on a heterogeneous SoC for maximum performance.  ... 
doi:10.1145/3301278 fatcat:hf5vn42mvnaqbktmqfpaajsnrm

Sparse convolutional neural network acceleration with lossless input feature map compression for resource‐constrained systems

Jisu Kwon, Joonho Kong, Arslan Munir
2021 IET Computers & Digital Techniques  
However, the effects of data transfer between main memory and the CNN accelerator have been largely overlooked.  ...  With several design optimization techniques, the authors have implemented their technique in a field-programmable gate array (FPGA) system-on-chip platform and evaluated their technique for six different  ...  We have implemented our HW/SW co-designed CNN accelerator on an FPGA-SoC platform (Xilinx ZCU106).  ... 
doi:10.1049/cdt2.12038 fatcat:e4iuorfhgfaldba63oyneualrm
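The entry above losslessly compresses sparse input feature maps before transferring them to the accelerator. A minimal, hypothetical sketch of one such lossless scheme is zero run-length encoding (not necessarily the authors' exact format):

```python
def zrle_encode(values):
    """Zero run-length encoding: emit (zero_run, nonzero_value) pairs."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            out.append((run, v))  # `run` zeros precede this nonzero value
            run = 0
    if run:
        out.append((run, None))   # trailing zeros with no value after them
    return out

def zrle_decode(pairs):
    """Exact inverse of zrle_encode: expand the runs back into zeros."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v is not None:
            out.append(v)
    return out
```

Because most feature-map values in sparse CNNs are zero, the (run, value) stream is typically far smaller than the raw map, which is what cuts the main-memory traffic the paper identifies as the overlooked bottleneck.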

Multimodal Neural Network Acceleration on a Hybrid CPU-FPGA Architecture: A Case Study

Mehdi Trabelsi Ajili, Yuko Hara-Azumi
2022 IEEE Access  
We implemented the accelerator on two FPGA boards and performed a quantitative evaluation by varying the DPU parameter settings to support our design approach.  ...  We present a field-programmable gate array (FPGA)-based acceleration for DeepSense incorporated into a hardware/software co-design approach to achieve better latency and energy efficiency using the Xilinx  ...  In summary, the contributions of this study are as follows: • We present an FPGA-based acceleration of a time-series multimodal DL framework, DeepSense, in a HW/SW co-design approach on a single SoC to  ... 
doi:10.1109/access.2022.3144977 fatcat:fc4vmltiyzdkjhjqcgn5qs3nje

Accelerating Deep Neural Networks implementation: A survey

Meriam Dhouibi, Ahmed Karim Ben Salem, Afef Saidi, Slim Ben Saoud
2021 IET Computers & Digital Techniques  
However, it is necessary to guarantee the best performance when designing hardware accelerators for DL applications to run at full speed under the constraints of low power, high accuracy and high throughput  ...  Given that the number of operations and parameters increases with the complexity of the model architecture, the performance will strongly depend on the hardware target resources, and basically on the memory  ...  Zhang et al. designed and implemented Caffeine [99] , a HW/SW co-designed library which reduced underutilisation of memory bandwidth.  ... 
doi:10.1049/cdt2.12016 fatcat:3kl4j5ztl5eahmgv7vetu2egay

A scalable and efficient convolutional neural network accelerator using HLS for a System on Chip design [article]

Kim Bjerge, Jonathan Horsted Schougaard, Daniel Ejnar Larsen
2020 arXiv   pre-print
This paper presents a configurable Convolutional Neural Network Accelerator (CNNA) for a System on Chip (SoC) design.  ...  The presented CNNA has a scalable architecture which uses High Level Synthesis (HLS) and SystemC for the hardware accelerator.  ...  Acknowledgments We would like to thank Freia Martensen for language editing and proofreading of the article.  ... 
arXiv:2004.13075v2 fatcat:jwiidxknengirfjxbjh6ttt3py

He-P2012: Performance and Energy Exploration of Architecturally Heterogeneous Many-Cores

Francesco Conti, Andrea Marongiu, Chuck Pilkington, Luca Benini
2015 Journal of Signal Processing Systems  
Our work provides three contributions towards a scalable and effective methodology for design space exploration in embedded MC-SoCs.  ...  Second, we propose a novel methodology for the semi-automatic definition and instantiation of shared-memory HWPEs from a C source, supporting both simple and structured data types.  ...  Evolution towards heterogeneity in multi- and many-core SoCs also stimulates an evolution in traditional HW/SW co-design techniques [?] [?] .  ... 
doi:10.1007/s11265-015-1056-7 fatcat:w6kpxdeyurfvnnkjvamsoq4meu

Domain Adaptive Processor Architectures [chapter]

Florian Fricke, Safdar Mahmood, Javier Hoffmann, Muhammad Ali, Keyvan Shahin, Michael Hübner, Diana Göhringer
2020 Technologien für die intelligente Automation  
A novel class of processors that provide higher data throughput at tremendously reduced energy consumption is required as a backbone for these "Things".  ...  This paper shows a brief overview of novel processor architectures providing high flexibility to adapt during design- and runtime to changing requirements of the application and the internal and external  ...  [FLO18] proposed a hardware/software co-design technique to explore the DPR technique to accelerate Convolutional Neural Networks (CNNs).  ... 
doi:10.1007/978-3-662-59895-5_23 fatcat:c3rzfowftfh4ncjxsjsgzbpy4a

Deep Neural Network Augmented Wireless Channel Estimation on System on Chip [article]

Syed Asrar ul haq, Abdul Karim Gizzini, Shakti Shrey, Sumit J. Darak, Sneh Saurabh, Marwa Chafii
2022 arXiv   pre-print
Via software-hardware co-design, word-length optimization, and reconfigurable architectures, we demonstrate the superiority of the LSDNN architecture over the LS and LMMSE for a wide range of SNR, number  ...  Further, we evaluate the performance, power, and area (PPA) of the LS and LSDNN application-specific integrated circuit (ASIC) implementations in 45 nm technology.  ...  different HW/SW co-design approaches for DNN-augmented LS estimation  ... 
arXiv:2209.02213v1 fatcat:wrdkygserzhczohohpa2ykzr5q
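Word-length optimization, as mentioned in the entry above, amounts to choosing the smallest fixed-point format whose rounding and saturation error is tolerable. A generic sketch of the quantization step (not the paper's actual bit allocation):

```python
def quantize_fixed(x, frac_bits, word_bits=16):
    """Round x onto a signed fixed-point grid with `frac_bits` fractional bits,
    saturating at the limits of a `word_bits`-wide two's-complement word."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))          # most negative representable code
    hi = (1 << (word_bits - 1)) - 1       # most positive representable code
    q = max(lo, min(hi, round(x * scale)))
    return q / scale                      # back to real-valued units
```

Sweeping `frac_bits` (and `word_bits`) against an error metric such as MSE on representative data is the usual way to pick a word length before committing to an RTL datapath.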

A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [article]

Leonardo Ravaglia, Manuele Rusci, Davide Nadalini, Alessandro Capotondi, Francesco Conti, Luca Benini
2021 arXiv   pre-print
In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor.  ...  On an advanced 22nm prototype of our platform, called VEGA, the proposed solution performs on average 65x faster than a low-power STM32 L4 microcontroller, being 37x more energy efficient, enough for a lifetime  ...  ACKNOWLEDGEMENT We thank Vincenzo Lomonaco and Lorenzo Pellegrini for the insightful discussions.  ... 
arXiv:2110.10486v1 fatcat:5iiuz42bjfaubmxfbi7wpzcnny

Design, Development and Evaluation of an Intelligent Animal Repelling System for Crop Protection based on Embedded Edge-AI

Davide Adami, Mike O. Ojo, Stefano Giordano
2021 IEEE Access  
In addition, for each HW/SW platform, the experimental study provides a cost/performance analysis, as well as measurements of the average and peak CPU temperature.  ...  (YOLO and Tiny-YOLO) with custom-trained models to identify the most suitable animal recognition HW/SW platform to be integrated with the ultrasound generator.  ...  ACKNOWLEDGMENT The authors thank Alessandro Vaselli for his assistance in the preparation of the software used in this work.  ... 
doi:10.1109/access.2021.3114503 fatcat:dh5mhup4gvcalbkidbt5lpdrji

GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware [article]

Ananda Samajdar, Parth Mannan, Kartikay Garg, Tushar Krishna
2018 arXiv   pre-print
To address this need, we present GENESYS, an HW-SW prototype of an EA-based learning system, that comprises a closed loop learning engine called EvE and an inference engine called ADAM.  ...  Modern deep learning systems rely on (a) a hand-tuned neural network topology, (b) massive amounts of labeled training data, and (c) extensive training over large-scale compute resources to build a system  ...  However, these demonstrations have still relied on big compute and memory (challenge #4), which we attempt to solve in this work via clever HW-SW co-design.  ... 
arXiv:1808.01363v2 fatcat:fqizvyhwyzeqpdtwkhg4teikgq
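GeneSys, described above, builds hardware around evolutionary-algorithm (EA) learning. The core loop an EA engine must execute is small; the toy version below is purely illustrative and bears no relation to the EvE/ADAM internals:

```python
import random

def evolve(fitness, dim=4, pop=20, gens=30, sigma=0.2, seed=0):
    """Minimal elitist evolutionary loop: mutate the best genome,
    evaluate all candidates, keep the fittest."""
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in range(dim)]
    for _ in range(gens):
        # Gaussian mutations around the current best genome
        cands = [[g + rng.gauss(0, sigma) for g in best] for _ in range(pop)]
        cands.append(best)  # elitism: fitness never decreases
        best = max(cands, key=fitness)
    return best

# Toy fitness: maximize -sum(x^2), so the optimum is the origin.
sol = evolve(lambda x: -sum(g * g for g in x))
```

Each generation is embarrassingly parallel across candidates, which is the property a dedicated learning engine can exploit in hardware.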

Energy Efficient Computing Systems: Architectures, Abstractions and Modeling to Techniques and Standards [article]

Rajeev Muralidhar and Renata Borovica-Gajic and Rajkumar Buyya
2020 arXiv   pre-print
used for implementing energy efficiency at different levels of the stack, (d) verification techniques used to provide guarantees that the functionality of complex designs are preserved, and (e) energy  ...  This survey aims to bring these domains together and is composed of a systematic categorization of key aspects of building energy efficient systems - (a) specification - the ability to precisely specify  ...  In [85] , the authors talk about HW-SW co-design and verifying energy efficiency features in pre-silicon, and the need for simulating end-to-end use cases in such verification methodologies.  ... 
arXiv:2007.09976v2 fatcat:enrfj2qgerhyteapwykxcb5pni

Embedded Brain Computer Interface: State-of-the-Art in Research

Kais Belwafi, Sofien Gannouni, Hatim Aboalsamh
2021 Sensors  
those for full PCs.  ...  There is a wide range of applications that use cerebral activity to restore capabilities for people with severe motor disabilities, and the number of such systems keeps growing.  ...  Hardware/software architecture co-design (HW/SW architecture) is based on the system specification, architectural design, and hardware/software partitioning.  ... 
doi:10.3390/s21134293 pmid:34201788 fatcat:nr5gptgj5zadzfd365nlruvi44

Machine Learning for Microcontroller-Class Hardware – A Review [article]

Swapnil Sayan Saha, Sandeep Singh Sandha, Mani Srivastava
2022 arXiv   pre-print
Researchers use a specialized model development workflow for resource-limited applications to ensure the compute and latency budget is within the device limits while still maintaining the desired performance  ...  We characterize a closed-loop, widely applicable workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt a specific instance  ...  (iii) Support from HW/SW: Not all microcontrollers and TinyML software suites support, or can reap the benefits of, quantization at intermediate or sub-byte bitwidths [81] .  ... 
arXiv:2205.14550v3 fatcat:y272riitirhwfgfiotlwv5i7nu
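The review above cautions that not every microcontroller stack benefits from intermediate or sub-byte quantization. As a generic illustration of what symmetric low-bit weight quantization does, here is a textbook scheme (not any particular TinyML suite's implementation):

```python
def quantize_sym(weights, bits=4):
    """Symmetric uniform quantization to signed `bits`-bit integer codes."""
    qmax = (1 << (bits - 1)) - 1                     # e.g. 7 for int4
    scale = max(abs(w) for w in weights) / qmax or 1.0  # 1.0 guards all-zero input
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to real values on the quantization grid."""
    return [v * scale for v in q]
```

At 4 bits each weight maps to one of 16 levels; whether that actually pays off depends on the target's support for packed sub-byte arithmetic, which is precisely the HW/SW-support caveat the review raises.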
Showing results 1 — 15 out of 41 results