An Algorithm-Hardware Co-Design for Bayesian Neural Network Utilizing SOT-MRAM's Inherent Stochasticity
2022
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits
Until now, there have been few hardware considerations that address the intensive computation and true random number generation for Bayesian neural networks (BayesNN), whose weights are represented by probability ...
The evaluation on the CIFAR-10 dataset suggests that BayesNN could achieve accuracy comparable to a conventional deep neural network (DNN) with acceptable hardware overhead, while providing much better uncertainty ...
ALGORITHM-HARDWARE CO-DESIGN OVERVIEW A. ...
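As a hedged, software-only illustration of the idea above that BayesNN weights are probability distributions (the paper realizes the sampling with SOT-MRAM's inherent stochasticity, which is not modeled here), the sketch below draws Monte Carlo weight samples from assumed Gaussian posteriors and reports the spread of the predictions as the uncertainty estimate; all shapes and parameters are illustrative.

```python
# Hypothetical sketch: each weight has a Gaussian posterior; repeated
# sampling yields a predictive distribution whose spread is the uncertainty.
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy posterior for a single linear layer: per-weight mean and std.
w_mu = rng.normal(size=(16, 10))
w_sigma = 0.1 * np.ones_like(w_mu)

def sample_forward(x, n_samples=32):
    """Monte Carlo forward passes, re-sampling the weights each time."""
    probs = []
    for _ in range(n_samples):
        w = rng.normal(w_mu, w_sigma)                 # one weight sample ~ N(mu, sigma)
        logits = x @ w
        logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
        p = np.exp(logits)
        probs.append(p / p.sum(axis=-1, keepdims=True))
    probs = np.stack(probs)                           # (n_samples, batch, classes)
    return probs.mean(axis=0), probs.std(axis=0)      # predictive mean, uncertainty

x = rng.normal(size=(4, 16))
mean_prob, uncertainty = sample_forward(x)
print(mean_prob.shape, uncertainty.max())
```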
doi:10.1109/jxcdc.2022.3177588
fatcat:jbeu6uxzcfhtpoqrlhgn6e6nxq
SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network
[article]
2021
arXiv
pre-print
Resistive Random-Access-Memory (ReRAM) crossbar is a promising technique for deep neural network (DNN) accelerators, thanks to its in-memory and in-situ analog computing abilities for Vector-Matrix Multiplication-and-Accumulations ...
Second, we propose a novel weight mapping mechanism to slice the bits of a weight across the crossbars and splice the activation results in the peripheral circuits. ...
We propose an algorithm-hardware co-design framework called SME that uses novel weight mapping schemes and data-path design to squeeze out the bit-wise sparsity. ...
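As a hedged, functional sketch of the bit slicing and splicing described above (not the ReRAM circuit itself), the snippet below splits an assumed unsigned 4-bit weight matrix into bit-planes, computes one partial product per plane as a separate crossbar would, and shift-adds the partials in a software stand-in for the peripheral splicing step; all-zero planes are exactly the bit-level sparsity such a design can skip.

```python
# Hypothetical sketch of bit-wise weight slicing across crossbars.
import numpy as np

rng = np.random.default_rng(1)
BITS = 4

w_int = rng.integers(0, 2**BITS, size=(8, 6))   # assumed unsigned 4-bit weights
x = rng.integers(0, 16, size=(3, 8))            # integer input activations

# Slice the weights into bit-planes, one per simulated crossbar.
bit_planes = [(w_int >> b) & 1 for b in range(BITS)]

# Each "crossbar" computes x @ plane; the peripheral logic shift-adds
# (splices) the partial results back into the full-precision product.
partials = [x @ plane for plane in bit_planes]
y_spliced = sum(p << b for b, p in enumerate(partials))

assert np.array_equal(y_spliced, x @ w_int)     # splicing recovers x @ W
```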
arXiv:2103.01705v1
fatcat:uzoslqgegzewhiuso5pj6sddzq
An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM
[article]
2019
arXiv
pre-print
The high computation and memory storage of large deep neural networks (DNNs) models pose intensive challenges to the conventional Von-Neumann architecture, incurring substantial data movements in the memory ...
Experimental results show that our proposed framework achieves 29.81X (20.88X) weight compression ratio, with 98.38% (96.96%) and 98.29% (97.47%) power and area reduction on VGG-16 (ResNet-18) network ...
We thank all anonymous reviewers for their feedback. ...
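As a hedged sketch of the constraint sets behind the structured pruning and quantization reported above (not the paper's exact ADMM formulation or training schedule), the snippet below implements the two Euclidean projection steps such frameworks typically alternate with during training: keeping the k columns with the largest L2 norm and rounding the surviving weights onto a uniform grid; the matrix size, k, and number of levels are illustrative.

```python
# Hypothetical sketch of the projection steps used in ADMM-based
# structured pruning and quantization.
import numpy as np

def project_column_sparse(w, k):
    """Euclidean projection onto 'at most k non-zero columns'."""
    norms = np.linalg.norm(w, axis=0)
    keep = np.argsort(norms)[-k:]                 # columns with the largest L2 norm
    z = np.zeros_like(w)
    z[:, keep] = w[:, keep]
    return z

def project_quantized(w, n_levels=16):
    """Euclidean projection onto a uniform quantization grid."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (n_levels - 1)
    return np.round((w - lo) / step) * step + lo

rng = np.random.default_rng(2)
w = rng.normal(size=(64, 32))
w_projected = project_quantized(project_column_sparse(w, k=8))
```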
arXiv:1908.11691v1
fatcat:krezi3afofdjxfiv5eohaffxay
FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator
[article]
2021
arXiv
pre-print
Instead of trying to represent the positive/negative weights, our key design principle is to enforce exactly what is assumed in the in-situ computation -- ensuring that all weights in the same column of ...
intensive and key computation in DNNs. ...
CONCLUSION We propose FORMS, a fine-grained ReRAM-based DNN accelerator with algorithm/hardware co-design optimizations. ...
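As a hedged illustration of the polarity principle quoted above (FORMS itself enforces it at a finer, sub-column granularity inside the crossbars), the snippet below splits a signed weight matrix into non-negative positive and negative parts, computes each product separately as a polarized array would, and takes the difference afterwards; the sizes are arbitrary.

```python
# Hypothetical sketch: separate positive and negative weights so that each
# in-situ array only ever holds non-negative values.
import numpy as np

rng = np.random.default_rng(3)
w = rng.normal(size=(8, 4))
x = rng.normal(size=(2, 8))

w_pos = np.maximum(w, 0.0)    # array holding only the positive weights
w_neg = np.maximum(-w, 0.0)   # array holding the magnitudes of negative weights

y = x @ w_pos - x @ w_neg     # difference taken in the peripheral circuitry
assert np.allclose(y, x @ w)
```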
arXiv:2106.09144v1
fatcat:qsn6nmh6u5entbk5qjdm6fkcoe
A Survey of Near-Data Processing Architectures for Neural Networks
[article]
2021
arXiv
pre-print
and in/near-memory computation/search engine. ...
, and especially neural network (NN)-based accelerators has grown significantly. ...
it may be important to investigate novel algorithm-hardware co-designs ...
to perform accumulations, and although the technology pro- ...
arXiv:2112.12630v1
fatcat:drkwrztkazd3hlblxc7i4kgn2a
Roadmap on emerging hardware and technology for machine learning
2020
Nanotechnology
Recent progress in artificial intelligence is largely attributed to the rapid development of machine learning, especially in the algorithm and neural network models. ...
Data-centric computing requires a revolution in hardware systems, since traditional digital computers based on transistors and the von Neumann architecture were not purposely designed for neuromorphic ...
Acknowledgments We thank Dr Zhongrui Wang, Dr Peng Lin, and Dr Can Li for their help in preparing this section of the roadmap.
ORCID iDs ...
doi:10.1088/1361-6528/aba70f
pmid:32679577
fatcat:t6me4pfxgfhdvbdqjnyjuksf2e
Closed-Loop Neural Prostheses with On-Chip Intelligence: A Review and A Low-Latency Machine Learning Model for Brain State Detection
[article]
2021
arXiv
pre-print
Next, we review state-of-the-art neural prostheses with on-chip machine learning, focusing on application-specific integrated circuits (ASIC). ...
The application of closed-loop approaches in systems neuroscience and therapeutic stimulation holds great promise for revolutionizing our understanding of the brain and for developing novel neuromodulation ...
techniques, accommodating high channel counts and the need for online learning. • Novel algorithm-hardware co-design techniques for next-generation energy-efficient neural prostheses. ...
arXiv:2109.05848v1
fatcat:m6ib42vqpngb5mal4wkkmuydfm
P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications
[article]
2022
arXiv
pre-print
Our solution includes a holistic algorithm-circuit co-design approach, and the resulting P2M paradigm can be used as a drop-in replacement for embedding the memory-intensive first few layers of convolutional neural network (CNN) models within foundry-manufacturable CMOS image sensor platforms. ...
Acknowledgements We would like to acknowledge the DARPA HR00112190120 award for supporting this work. ...
arXiv:2203.04737v2
fatcat:xcwvrkayljaujeygdcnsstf334
Survey on Near-Data Processing: Applications and Architectures
2021
Journal of Integrated Circuits and Systems
It occurred together with the appearance of 3D-stacked chips with logic and memory stacked layers. ...
One of the main challenges for modern processors is the data transfer between processor and memory. Such data movement implies high latency and high energy consumption. ...
Mondrian [79, 80] implements an algorithm-hardware co-design using 3D-stacked memory for NDP of data analytics operators, which also apply to the data partitioning and shuffling phase of MapReduce ...
doi:10.29292/jics.v16i2.502
fatcat:3uiswd6z65djpjgvsxclutthxu
Applications and Techniques for Fast Machine Learning in Science
[article]
2021
arXiv
pre-print
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing ...
training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. ...
of peripheral circuits [576] . ...
arXiv:2110.13041v1
fatcat:cvbo2hmfgfcuxi7abezypw2qrm
FPGA-Based Inter-layer Pipelined Accelerators for Filter-Wise Weight-Balanced Sparse Fully Convolutional Networks with Overlapped Tiling
2021
Journal of Signal Processing Systems
Convolutional neural networks (CNNs) exhibit state-of-the-art performance while performing computer-vision tasks. ...
A different method minimizes the input image size for real-time processing, but it causes a considerable drop in accuracy. ...
Acknowledgements This research is supported in part by the Grants-in-Aid for Scientific Research of JSPS, Industry-academia collaborative R&D programs centre of innovation (COI) program, Core Research ...
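As a hedged, single-layer sketch of the overlapped tiling named in the title above (NumPy/SciPy only, not the FPGA inter-layer pipeline itself), the snippet below gives each tile a halo equal to the kernel radius so that a 'valid' convolution on the tile reproduces the corresponding region of the full-image result, letting tiles be processed independently without a loss in accuracy.

```python
# Hypothetical sketch of overlapped tiling for a single convolution layer.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(4)
img = rng.normal(size=(64, 64))
kernel = rng.normal(size=(3, 3))
halo = kernel.shape[0] // 2                 # overlap = receptive-field radius
tile = 32

reference = convolve2d(img, kernel, mode="same", boundary="fill")

out = np.zeros_like(img)
padded = np.pad(img, halo)                  # zero-pad once; interior halos are real pixels
for r in range(0, img.shape[0], tile):
    for c in range(0, img.shape[1], tile):
        patch = padded[r:r + tile + 2 * halo, c:c + tile + 2 * halo]
        out[r:r + tile, c:c + tile] = convolve2d(patch, kernel, mode="valid")

assert np.allclose(out, reference)          # overlapped tiling introduces no seams
```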
doi:10.1007/s11265-021-01642-6
fatcat:vfcfmfjvyfahphf6xiuy5zagqu
2022 Roadmap on Neuromorphic Computing and Engineering
[article]
2022
arXiv
pre-print
built-in capabilities to learn or deal with complex data as our brain does. ...
This data transfer is responsible for a large part of the power consumption. The next generation of computer technology is expected to solve problems at the exascale, with 10^18 calculations each second.
Concluding Remarks Integrating event-based vision sensing and processing with neuromorphic computation techniques is expected to yield solutions that will be able to penetrate the artificial vision market ...
arXiv:2105.05956v3
fatcat:pqir5infojfpvdzdwgmwdhsdi4
Correlation Power Analysis Attack Resisted Cryptographic RISC-V SoC with Random Dynamic Frequency Scaling Countermeasure
2021
IEEE Access
In these systems, cryptographic accelerators are integrated with processor cores to provide users with the software's flexibility and hardware's high performance. ...
The proposed RDFS countermeasure improved the power analysis resistance while maintaining low performance overhead and hardware costs by generating more than 219,000 distinct frequencies for driving only ...
In [23] , Benadjila et al. introduced a profiled DL-SCA attack utilizing a convolutional neural network (CNN) that is suitable for attacking highly desynchronized power traces. ...
doi:10.1109/access.2021.3126703
fatcat:nirhzlv2q5hyvaeq3fybs7sffi
Improving Reliability, Security, and Efficiency of Reconfigurable Hardware Systems
[article]
2018
arXiv
pre-print
In the area of reliability, countermeasures against radiation-induced faults and aging effects for long mission times were investigated and applied to SRAM-FPGA-based satellite systems. ...
This technique was applied to the acceleration of SQL queries for large database applications as well as for image and signal processing applications. ...
The research has been carried out in collaboration with several doctoral researchers, master and bachelor students from my research group Reconfigurable Computing. In ...
arXiv:1809.11156v1
fatcat:6ttulp2tancyvds7fk2coxoptq
Advancing Neuromorphic Computing With Loihi: A Survey of Results and Outlook
2021
Proceedings of the IEEE
This is now changing with the advent of Intel's Loihi, a neuromorphic research processor designed to support a broad range of spiking neural networks with sufficient scale, performance, and features to ...
KEYWORDS | Computer architecture; neural network hardware; neuromorphics. I. ...
We believe that these will be well matched for the unique features of neuromorphic architectures and will come in time with ongoing algorithm-hardware codevelopment. ...
doi:10.1109/jproc.2021.3067593
fatcat:krqdmy3u6jdvfl7btjglek5ag4