PAMS: Pattern Aware Memory System for embedded systems

Tassadaq Hussain, Nehir Sonmez, Oscar Palomar, Osman Unsal, Adrian Cristal, Eduard Ayguade, Mateo Valero, S. A. Gursal
2014 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)  
When compared with a Baseline Memory System, PAMS consumes between 3 and 9 times less program memory for static data structures and between 1.13 and 2.66 times less for dynamic data structures.  ...  In this paper, we propose a hardware mechanism for embedded multi-core memory systems called the Pattern Aware Memory System (PAMS).  ...  The concept of scratch-pad memory [4] is an important architectural consideration in modern HPC embedded systems, where advanced technologies have made it possible to combine it with DRAM.  ...
doi:10.1109/reconfig.2014.7032544 dblp:conf/reconfig/HussainSPUCAVG14 fatcat:4fytjqvrrzhcnggo4sdnkzueqe

Compiler-decided dynamic memory allocation for scratch-pad based embedded systems

Sumesh Udayakumaran, Rajeev Barua
2003 Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03  
In this research we propose a highly predictable, low overhead and yet dynamic, memory allocation strategy for embedded systems with scratch-pad memory.  ...  run-time checks (iv) has extremely low overheads, and (v) yields 100% predictable memory access times.  ...  ARM968E-S The ARM968E-S [9] , which belongs to ARM9E family, is targeted for embedded real-time applications. Its key characteristic is that it is small and low power.  ... 
doi:10.1145/951746.951747 fatcat:e5ml6sw54nhcdc6pq6nnij5hqa
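
The compiler-decided allocation above chooses, at compile time, which variables occupy the scratch-pad over which program regions, so no run-time checks are needed. A minimal C sketch of the kind of copy-in/copy-out code such a compiler could emit around a hot loop follows; SPM_BASE, SPM_SIZE, coeff and filter are hypothetical names, and the fixed placement offset stands in for a decision the compiler would make.

#include <stdint.h>
#include <string.h>

#define SPM_BASE ((uint8_t *)0x20000000u)   /* assumed scratch-pad base address */
#define SPM_SIZE 4096u                       /* assumed scratch-pad capacity     */

static int16_t coeff[512];                   /* hot, reused table that normally lives in DRAM */

int32_t filter(const int16_t *samples, int n)
{
    /* Compile-time decision: coeff is heavily reused in this region, so it is
     * copied into the scratch-pad before the loop. No run-time check is needed
     * because the compiler has already proven it fits and fixed the offset.   */
    int16_t *coeff_spm = (int16_t *)SPM_BASE;
    memcpy(coeff_spm, coeff, sizeof coeff);

    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)samples[i] * coeff_spm[i % 512];  /* table accesses hit the SPM */

    /* coeff is read-only here, so no copy-back is required; a modified
     * structure would be copied back to DRAM at this program point.     */
    return acc;
}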

SODA: A High-Performance DSP Architecture for Software-Defined Radio

Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, Krisztian Flautner
2007 IEEE Micro  
His research interests include architecture and compiler design for embedded and signal processing applications. Lin has a BS in electrical engineering from Cornell University.  ...  Her research interests are in VLSI architectures and algorithms for signal processing and communications.  ...  They have shown that scratch-pad memories, instead of caches, are best suited for streaming applications. We find that streaming computation is also suitable for wireless protocols.  ...
doi:10.1109/mm.2007.22 fatcat:6rnb4gqbknfl3htdgstmrt5xgy

Dynamic allocation for scratch-pad memory using compile-time decisions

Sumesh Udayakumaran, Angel Dominguez, Rajeev Barua
2006 ACM Transactions on Embedded Computing Systems  
In this research we propose a highly predictable, low overhead and yet dynamic, memory allocation strategy for embedded systems with scratch-pad memory.  ...  run-time checks (iv) has extremely low overheads, and (v) yields 100% predictable memory access times.  ...  Given the power, cost, performance and real time advantages of scratch-pad, it is not surprising that scratch-pads are the most common form of SRAM in embedded CPUs today.  ... 
doi:10.1145/1151074.1151085 fatcat:6wqwhzbgkfbjzljrgim7cpod4m

Efficient dynamic heap allocation of scratch-pad memory

Ross McIlroy, Peter Dickman, Joe Sventek
2008 Proceedings of the 7th international symposium on Memory management - ISMM '08  
While there has been promising work in compile-time allocation of scratch-pad memory, there will always be applications that require run-time allocation.  ...  Scratch-pad memory provides low-latency data storage, like on-chip caches, but under explicit software control.  ...  Acknowledgments We would like to thank Doug Lea for his open-source implementation of DLmalloc. We thank the anonymous reviewers for their insightful comments.  ...
doi:10.1145/1375634.1375640 dblp:conf/iwmm/McIlroyDS08 fatcat:vn27q242kvbbzjkmyb2grcixga
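
For the run-time allocation this entry targets, the scratch-pad can be managed as a small heap. The following is a generic first-fit free-list sketch over a reserved SPM region, not the allocator actually described in the paper; spm_pool, spm_alloc and related names are illustrative.

#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SPM_POOL_SIZE 4096u

typedef struct block {
    size_t size;                 /* payload bytes that follow this header */
    int    used;
    struct block *next;
} block_t;

/* Assumed to be placed in the scratch-pad address range by the linker script. */
static alignas(max_align_t) uint8_t spm_pool[SPM_POOL_SIZE];
static block_t *free_list;

void spm_heap_init(void)
{
    free_list = (block_t *)spm_pool;
    free_list->size = SPM_POOL_SIZE - sizeof(block_t);
    free_list->used = 0;
    free_list->next = NULL;
}

void *spm_alloc(size_t n)
{
    n = (n + 7u) & ~(size_t)7u;                      /* keep payloads 8-byte aligned */
    for (block_t *b = free_list; b != NULL; b = b->next) {
        if (!b->used && b->size >= n) {
            if (b->size >= n + sizeof(block_t) + 8) {  /* split if the remainder is usable */
                block_t *rest = (block_t *)((uint8_t *)(b + 1) + n);
                rest->size = b->size - n - sizeof(block_t);
                rest->used = 0;
                rest->next = b->next;
                b->next = rest;
                b->size = n;
            }
            b->used = 1;
            return b + 1;
        }
    }
    return NULL;    /* caller falls back to an ordinary DRAM allocation */
}

void spm_free(void *p)
{
    if (p != NULL)
        ((block_t *)p - 1)->used = 0;   /* coalescing of neighbours omitted for brevity */
}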

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression [article]

Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, Zhongfeng Wang
2021 arXiv   pre-print
To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps.  ...  The on-chip memory allocation scheme is designed to support dynamic configuration of the feature map buffer size and scratch pad size according to different network-layer requirements.  ...  However, it will result in a low memory utilization ratio if the scratch pad size cannot be changed.  ... 
arXiv:2110.06155v1 fatcat:xxwaszuurnavxdxrhz3cxiebqy
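
The allocation scheme mentioned above re-partitions a fixed on-chip memory between the feature-map buffer and the scratch pad on a per-layer basis. A hedged sketch of such a split follows; ONCHIP_BUDGET_BYTES, plan_layer_split and the structure fields are illustrative assumptions, not taken from the paper's accelerator.

#include <stddef.h>

#define ONCHIP_BUDGET_BYTES (512u * 1024u)   /* assumed total on-chip SRAM budget */

typedef struct {
    size_t fmap_bytes;    /* reserved for the compressed inter-layer feature map */
    size_t spad_bytes;    /* remainder handed to the scratch pad for this layer  */
} layer_split_t;

int plan_layer_split(size_t compressed_fmap_bytes, layer_split_t *out)
{
    if (compressed_fmap_bytes > ONCHIP_BUDGET_BYTES)
        return -1;                                    /* this layer must spill to DRAM */
    out->fmap_bytes = compressed_fmap_bytes;
    out->spad_bytes = ONCHIP_BUDGET_BYTES - compressed_fmap_bytes;
    return 0;
}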

Compiling high throughput network processors

Maysam Lavasani, Larry Dennison, Derek Chiou
2012 Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays - FPGA '12  
A 40Gbps version of that network processor was run with an embedded test rig on a Xilinx Virtex-6 FPGA and verified for performance and correctness.  ...  Gorilla achieves high performance and low power through the use of FPGA-tailored parallelization techniques and application-specific hardwired accelerators, processing engines, and communication mechanisms  ...  The packet processing engines, accelerators, packet splitter and reassembly are written by domain experts in a subset of sequential C, automatically compiled to hardware, and then merged with parameterized  ...
doi:10.1145/2145694.2145709 dblp:conf/fpga/LavasaniDC12 fatcat:xdotup5uavc6hh5sqkkarttj6a

Aristotle: A performance Impact Indicator for the OpenCL Kernels Using Local Memory

Jianbin Fang, Henk Sips, Ana Lucia Varbanescu
2014 Scientific Programming  
Due to the increasing complexity of multi/many-core architectures (with their mix of caches and scratch-pad memories) and applications (with different memory access patterns), the performance of many workloads  ...  To do so, we systematically describe memory access patterns (MAPs) in an application-agnostic manner.  ...  In [2], Sumesh Udayakumaran et al. propose a highly predictable, low overhead, and dynamic memory-allocation strategy for embedded systems with scratch-pad memory.  ...
doi:10.1155/2014/623841 fatcat:dxsqk6vjpjh4bmgjnzkb4gwpvy

An energy-efficient coarse grained spatial architecture for convolutional neural networks AlexNet

Boya Zhao, Mingjiang Wang, Ming Liu
2017 IEICE Electronics Express  
In this paper, we propose a CGSA (Coarse Grained Spatial Architecture) which processes different kinds of convolution with high performance and low energy consumption.  ...  We evaluated the architecture by comparing it with some recent CNN accelerators.  ...  Using a register array instead of a scratch pad in the PE units reduces data redundancy and the corresponding power consumption.  ...
doi:10.1587/elex.14.20170595 fatcat:mujgvpftarhaxnmmwrkfhbf2we

Energy savings through compression in embedded Java environments

G. Chen, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, W. Wolf
2002 Proceedings of the tenth international symposium on Hardware/software codesign - CODES '02  
Our results show that compression is effective in reducing energy even when considering the runtime decompression overheads for most applications.  ...  Limited energy and memory resources are important constraints in the design of an embedded system.  ...  This architecture has a CPU core, a scratch-pad memory (SPM), and two main memory modules. The processor in our SoC is a microSPARC-IIep embedded core.  ... 
doi:10.1145/774789.774823 dblp:conf/codes/ChenKVIW02 fatcat:oxqclgji3beijibkpjpkgyljru
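
To make the runtime decompression overhead mentioned above concrete, the sketch below decompresses a trivially run-length-encoded object from main memory into the scratch-pad; the paper's actual compression scheme is different, and the function and buffer names are illustrative.

#include <stddef.h>
#include <stdint.h>

/* Toy (count, value) run-length decoder: src holds the compressed object in main
 * memory, spm_dst is the scratch-pad region that receives the decompressed data. */
size_t rle_decompress(const uint8_t *src, size_t src_len,
                      uint8_t *spm_dst, size_t dst_cap)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        uint8_t count = src[i];
        uint8_t value = src[i + 1];
        if (out + count > dst_cap)
            return 0;                     /* object does not fit in the scratch-pad */
        for (uint8_t k = 0; k < count; k++)
            spm_dst[out++] = value;
    }
    return out;                           /* decompressed bytes now resident in SPM */
}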

Compiler-decided dynamic memory allocation for scratch-pad based embedded systems

Sumesh Udayakumaran, Rajeev Barua
2003 Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03  
This paper presents a highly predictable, low overhead and yet dynamic, memory allocation strategy for embedded systems with scratch-pad memory.  ...  requires no run-time checks (iv) has extremely low overheads, and (v) yields 100% predictable memory access times.  ...  Their results are startling: a scratch pad memory has 34% smaller area and 40% lower power consumption than a cache memory of the same capacity.  ... 
doi:10.1145/951710.951747 dblp:conf/cases/UdayakumaranB03 fatcat:2dokuzx4gnfullqxs75q47ramu

Multiprocessor System-on-Chip (MPSoC) Technology

W. Wolf, A.A. Jerraya, G. Martin
2008 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
The multiprocessor system-on-chip (MPSoC) uses multiple CPUs along with other hardware subsystems to implement a system. A wide range of MPSoC architectures have been developed over the past decade.  ...  This paper surveys the history of MPSoCs to argue that they represent an important and distinct category of computer architecture.  ...  Dutta for the helpful discussions of their MPSoCs.  ... 
doi:10.1109/tcad.2008.923415 fatcat:p37pvh5iezfdjd4acepney4zmy

Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim [article]

Farzad Farshchi, Qijing Huang, Heechul Yun
2019 arXiv   pre-print
We then identify that sharing the memory system with the accelerator can result in unpredictable execution time for the real-time tasks running on this platform.  ...  We believe this is an important issue that must be addressed in order for on-chip DNN accelerators to be incorporated in real-time embedded systems.  ...  the capability of running computationally-intensive tasks in real-time on low-cost platforms with limited power and size.  ... 
arXiv:1903.06495v2 fatcat:zzsqp6nfd5cbdln4i6mvtfjfli

Data-reuse exploration under an on-chip memory constraint for low-power FPGA-based systems

Q. Liu, G.A. Constantinides, K. Masselos, P.Y.K. Cheung
2009 IET Computers & Digital Techniques  
Exploiting data reuse can introduce significant power savings, but also introduces the extra requirement for on-chip memory.  ...  memory happens in the code, in order to minimize power consumption for a fixed on-chip memory size.  ...  Cache (hardware-controlled on-chip memory) is suitable to accelerate general applications, where potentially reused data are determined at run-time.  ... 
doi:10.1049/iet-cdt.2008.0039 fatcat:eqpgfd75cbgltcqho26op6d63e
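
The data-reuse transformation explored above amounts to copying data that the inner loops re-read from off-chip memory into an on-chip buffer sized within the memory constraint. A minimal sketch, with illustrative array names and sizes, is shown below.

#include <stdint.h>

#define N 256
#define M 64

void reuse_buffered_mac(const int16_t x[N][M], const int16_t w[M], int32_t y[N])
{
    int16_t w_onchip[M];                      /* on-chip reuse buffer within the fixed budget */

    for (int j = 0; j < M; j++)               /* w is read from off-chip memory only once */
        w_onchip[j] = w[j];

    for (int i = 0; i < N; i++) {
        int32_t acc = 0;
        for (int j = 0; j < M; j++)
            acc += (int32_t)x[i][j] * w_onchip[j];   /* reused coefficients served on-chip */
        y[i] = acc;
    }
}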