A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2006; you can also visit the original URL. The file type is application/pdf.
Compiler-directed array interleaving for reducing energy in multi-bank memories
Proceedings of ASP-DAC/VLSI Design 2002. 7th Asia and South Pacific Design Automation Conference and 15th International Conference on VLSI Design
We propose a source-level data space transformation technique called array interleaving that co-locates simultaneously used array elements in a small set of memory modules. ...
We validate the effectiveness of this transformation using a set of array-dominated benchmarks and observe significant savings in memory energy. ...
Based on these results, we conclude that array interleaving is very beneficial from an energy viewpoint and should be supported by compilers targeting multi-bank memory systems. ...
doi:10.1109/aspdac.2002.994936
dblp:conf/vlsid/DelaluzKVISK02
fatcat:rhliy7w5ojaz7lk2rixxlkweke
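As a rough illustration of the transformation described in this abstract (a sketch, not the authors' implementation), the fragment below interleaves two arrays that are always accessed together so that simultaneously used elements end up adjacent in memory and therefore in the same small set of banks; the arrays a, b and the size N are made-up examples.

```c
#include <stddef.h>

#define N 1024

/* Before: a[i] and b[i] are used together, but may live in different banks. */
void scale_before(const double *a, const double *b, double *out)
{
    for (size_t i = 0; i < N; i++)
        out[i] = a[i] * b[i];
}

/* After array interleaving: elements used together are adjacent, so each
 * iteration touches a single memory region (and thus fewer banks). */
typedef struct { double a; double b; } pair_t;

void scale_after(const pair_t *ab, double *out)
{
    for (size_t i = 0; i < N; i++)
        out[i] = ab[i].a * ab[i].b;
}
```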
Exploiting Both Pipelining and Data Parallelism with SIMD Reconfigurable Architecture
[chapter]
2012
Lecture Notes in Computer Science
In this paper we introduce SIMD reconfigurable architecture, which allows for SIMD mapping at multiple levels of granularity, and investigate ways to minimize bank conflicts in a SIMD reconfigurable architecture ...
Reconfigurable Architecture (RA), which provides extremely high energy efficiency for certain domains of applications, has one problem: current mapping algorithms for it do not scale well with the ...
In macrocore mapping with interleaving, there can be inevitable array elements that are never accessed; this will effectively reduce the number of banks and increase bank conflicts. ...
doi:10.1007/978-3-642-28365-9_4
fatcat:n2mauiwx65a2vic6wripfqjkm4
Code Transformations for TLB Power Reduction
2010
International journal of parallel programming
In this paper, we propose compiler techniques (specifically, instruction and operand reordering, array interleaving, and loop unrolling) to reduce the page switchings in data accesses. ...
The Translation Look-aside Buffer (TLB) is an important part of the hardware support for virtual memory management in high-performance embedded systems. ...
Compiler based Approaches A compiler-directed array interleaving technique [17] was proposed to save energy in multi-bank memory architectures with power control features. ...
doi:10.1007/s10766-009-0123-8
fatcat:yyfwuk2qvjdg3g55mudtllezqe
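A minimal sketch of the access-reordering idea behind the loop-unrolling transformation mentioned in this abstract, under the assumption that arrays a and b live on different pages: unrolling the loop and grouping the references to each array reduces how often consecutive accesses alternate between the two pages. The arrays, the unroll factor, and the loop bound are illustrative only, not taken from the paper.

```c
#define N 1024  /* assumed to be a multiple of the unroll factor */

/* Before: each iteration alternates a's page, b's page, a's page, ... */
void sum_alternating(const int *restrict a, const int *restrict b, int *restrict c)
{
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* After unrolling and operand reordering: reads to a's page are grouped,
 * then reads to b's page, roughly halving the page alternations. */
void sum_grouped(const int *restrict a, const int *restrict b, int *restrict c)
{
    for (int i = 0; i < N; i += 2) {
        int a0 = a[i], a1 = a[i + 1];   /* stay on a's page */
        int b0 = b[i], b1 = b[i + 1];   /* then switch once to b's page */
        c[i]     = a0 + b0;
        c[i + 1] = a1 + b1;
    }
}
```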
Code Transformations for TLB Power Reduction
2009
2009 22nd International Conference on VLSI Design
In this paper, we propose compiler techniques (specifically, instruction and operand reordering, array interleaving, and loop unrolling) to reduce the page switchings in data accesses. ...
The Translation Look-aside Buffer (TLB) is an important part of the hardware support for virtual memory management in high-performance embedded systems. ...
Compiler based Approaches A compiler-directed array interleaving technique [17] was proposed to save energy in multi-bank memory architectures with power control features. ...
doi:10.1109/vlsi.design.2009.39
dblp:conf/vlsid/JeyapaulMS09
fatcat:gcc2rwucjvchdajm2rcqczk5oe
I-cache multi-banking and vertical interleaving
2007
Proceedings of the 17th great lakes symposium on Great lakes symposium on VLSI - GLSVLSI '07
Unlike previous multi-banking and interleaving techniques to increase cache bandwidth, the proposed vertical interleaving further divides memory banks in a cache into vertically arranged sub-banks, which ...
This research investigates the impact of a microarchitectural technique called vertical interleaving in multi-banked caches. ...
In addition to implementing multiple ports for higher bandwidth, cache multi-banking has been used to reduce access time, to reduce energy consumption, or to facilitate aspect ratio fitting [20, 21]. ...
doi:10.1145/1228784.1228794
dblp:conf/glvlsi/Cho07
fatcat:d7mbnjscarepxanuvq7rdunwxa
Automatic data migration for reducing energy consumption in multi-bank memory systems
2002
Proceedings - Design Automation Conference
An architectural solution to reducing memory energy consumption is to adopt a multi-bank memory system instead of a monolithic (single-bank) memory system. ...
Some recent multi-bank memory architectures help reduce memory energy by allowing an unused bank to be placed into a low-power operating mode. ...
In a multi-bank memory architecture, unused memory banks (idle banks) can be disabled, thereby saving energy. ...
doi:10.1145/513918.513973
dblp:conf/dac/LuzKK02
fatcat:wxb2ku2rxzbwjhooxw3qnshefe
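The idle-bank idea in this abstract can be sketched as follows; this is not the paper's migration algorithm, and bank_set_low_power, NUM_BANKS, and BANK_SIZE are hypothetical placeholders for whatever the memory controller actually exposes.

```c
#include <stdint.h>

#define NUM_BANKS  8
#define BANK_SIZE  (1u << 20)          /* 1 MiB per bank, assumed */

/* Hypothetical memory-controller hook: enable = 1 puts the bank to sleep. */
void bank_set_low_power(unsigned bank, int enable);

static unsigned bank_of(const void *p)
{
    return ((uintptr_t)p / BANK_SIZE) % NUM_BANKS;
}

/* Before a loop nest that only touches arrays a and b, put every bank
 * outside the working set into a low-power operating mode. */
void enter_loop_nest(const void *a, const void *b)
{
    int used[NUM_BANKS] = {0};
    used[bank_of(a)] = 1;
    used[bank_of(b)] = 1;
    for (unsigned k = 0; k < NUM_BANKS; k++)
        if (!used[k])
            bank_set_low_power(k, 1);  /* idle banks sleep */
}
```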
Automatic data migration for reducing energy consumption in multi-bank memory systems
2002
Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324)
An architectural solution to reducing memory energy consumption is to adopt a multi-bank memory system instead of a monolithic (single-bank) memory system. ...
Some recent multi-bank memory architectures help reduce memory energy by allowing an unused bank to be placed into a low-power operating mode. ...
In a multi-bank memory architecture, unused memory banks (idle banks) can be disabled, thereby saving energy. ...
doi:10.1109/dac.2002.1012622
fatcat:hvmjx6bfzffcdlkoguv6r725tm
Automatic data migration for reducing energy consumption in multi-bank memory systems
2002
Proceedings - Design Automation Conference
An architectural solution to reducing memory energy consumption is to adopt a multi-bank memory system instead of a monolithic (single-bank) memory system. ...
Some recent multi-bank memory architectures help reduce memory energy by allowing an unused bank to be placed into a low-power operating mode. ...
In a multi-bank memory architecture, unused memory banks (idle banks) can be disabled, thereby saving energy. ...
doi:10.1145/513972.513973
fatcat:7hhssns2z5f4pi7vsuocjxd4ti
A Co-Design Framework with OpenCL Support for Low-Energy Wide SIMD Processor
2014
Journal of Signal Processing Systems
This compiler can analyze the static memory access patterns in OpenCL kernels, generate efficient mappings, and schedule the code to fully utilize the explicit datapath. ...
In this paper, we propose a design framework for a configurable wide SIMD architecture that utilizes an explicit datapath to achieve high energy efficiency. ...
In addition, further changes in the architecture, e.g., clustering PE memory banks to reduce memory energy, also require adapting the compiler. ...
doi:10.1007/s11265-014-0957-1
fatcat:bydv4yarnjcuhgmjehzj4r2i4e
Energy-oriented compiler optimizations for partitioned memory architectures
2000
Proceedings of the international conference on Compilers, architectures, and synthesis for embedded systems - CASES '00
This paper presents a compiler-based optimization framework that targets reducing the energy consumption in a partitioned off-chip memory architecture that contains multiple memory banks by organizing ...
The optimizations considered in this work take advantage of low-power operating modes and the partitioned (multi-bank) structure of the off-chip memory. ...
Specifically, we make the following contributions: We summarize the operation of a multi-bank memory system and explain how low-power operating modes can reduce its energy consumption. ...
doi:10.1145/354880.354900
dblp:conf/cases/DelaluzKVI00
fatcat:httjchxxnzhixeprmihjlyoc6y
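One plausible (and deliberately simplified) reading of "organizing data across banks" in this abstract is a greedy placement that packs arrays into as few banks as possible so the remaining banks can stay in a low-power operating mode; the sketch below assumes a made-up bank geometry and is not the framework's actual optimization.

```c
#include <stddef.h>

#define NUM_BANKS 8
#define BANK_SIZE (1u << 20)   /* bytes per off-chip bank, assumed */

typedef struct { const char *name; size_t bytes; int bank; } array_info_t;

/* Greedy first-fit placement: pack arrays into the lowest-numbered banks
 * that still have room, so higher-numbered banks remain idle and can sleep. */
int place_arrays(array_info_t *arrs, size_t n)
{
    size_t free_bytes[NUM_BANKS];
    for (int b = 0; b < NUM_BANKS; b++)
        free_bytes[b] = BANK_SIZE;

    int banks_used = 0;
    for (size_t i = 0; i < n; i++) {
        int placed = 0;
        for (int b = 0; b < NUM_BANKS && !placed; b++) {
            if (free_bytes[b] >= arrs[i].bytes) {
                free_bytes[b] -= arrs[i].bytes;
                arrs[i].bank = b;
                if (b + 1 > banks_used)
                    banks_used = b + 1;
                placed = 1;
            }
        }
        if (!placed)
            return -1;           /* data set does not fit in this simple model */
    }
    return banks_used;           /* banks beyond this count can stay in low power */
}
```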
An Energy-Efficient Integrated Programmable Array Accelerator and Compilation flow for Near-Sensor Ultra-low Power Processing
2018
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
To optimize the performance and energy efficiency, we explore the IPA architecture with special focus on shared memory access, with the help of the flexible compilation flow presented in this paper. ...
In this paper we give a fresh look to Coarse Grained Reconfigurable Arrays (CGRAs) as ultra-low power accelerators for near-sensor processing. ...
We also thank STMicroelectronics for granting access to the FDSOI 28nm technology libraries. ...
doi:10.1109/tcad.2018.2834397
fatcat:hf6fekk4ivdkhncpycvpabsswq
Synergistic Architecture and Programming Model Support for Approximate Micropower Computing
2015
2015 IEEE Computer Society Annual Symposium on VLSI
Energy consumption is a major constraining factor for embedded multi-core systems. Using aggressive voltage scaling can reduce power consumption, but memory operations become unreliable. ...
Several embedded applications exhibit inherent tolerance to computation approximation, for which indicating parts that can tolerate errors has proven a viable way to reduce energy consumption. ...
This allows concurrent access to memory locations mapped on different banks, via a one-cycle-latency logarithmic interconnect implementing word-level interleaving to reduce contention. ...
doi:10.1109/isvlsi.2015.64
dblp:conf/isvlsi/TagliaviniRBM15
fatcat:do3naxkiibh33lxwddhvyhwiwq
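The word-level interleaving mentioned in the last snippet maps consecutive words round-robin across banks so that threads touching neighbouring words hit different banks; a minimal sketch of that address-to-bank mapping, with an assumed word size and bank count, is shown below.

```c
#include <stdint.h>

#define WORD_BYTES 4u     /* 32-bit words, assumed */
#define NUM_BANKS  16u    /* number of shared-memory banks, assumed */

/* Word-level interleaving: consecutive word addresses map to consecutive banks. */
static inline unsigned bank_index(uintptr_t addr)
{
    return (unsigned)((addr / WORD_BYTES) % NUM_BANKS);
}

/* Word offset within the selected bank. */
static inline unsigned bank_offset(uintptr_t addr)
{
    return (unsigned)((addr / WORD_BYTES) / NUM_BANKS);
}
```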
Interconnect synthesis of heterogeneous accelerators in a shared memory architecture
2015
2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
The second layer of interconnect tries to interleave potentially conflicting long-burst memory requests for prefetching data from off-chip memory. ...
Furthermore, the performance of an ARA can be improved by 36%-52% with a well-designed interleaved network in a real ARA prototype for medical imaging applications. ...
Inputs to the paper's bank-assignment pseudocode include: d, the array recording the memory bank demands for all accelerators; n, the number of accelerators; m, the number of memory banks; and c, the number of simultaneously powered-on accelerators. ...
doi:10.1109/islped.2015.7273540
dblp:conf/islped/ChenC15
fatcat:cnjlygfbtzeihg34ldmeyskmqm
RowClone
2013
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-46
Our results show that RowClone can significantly improve both single-core and multi-core system performance, while also significantly reducing main memory bandwidth and energy consumption. ...
In this work, we propose RowClone, a new and simple mechanism to perform bulk copy and initialization completely within DRAM -eliminating the need to transfer any data over the memory channel to perform ...
This research was partially supported by NSF (CCF-0953246, CCF-1147397, CCF-1212962), Intel University Research Office Memory Hierarchy Program, Intel Science and Technology Center for Cloud Computing, and ...
doi:10.1145/2540708.2540725
dblp:conf/micro/SeshadriKFLAPLMGKM13
fatcat:raul5gjizzan3knaladyxquvze
Data access optimization in a processing-in-memory system
2015
Proceedings of the 12th ACM International Conference on Computing Frontiers - CF '15
In this paper, we describe a combination of programming language features, compiler techniques, operating system interfaces, and hardware design that can effectively hide memory latency for the processing ...
The AMC architecture includes general-purpose host processors and specially designed in-memory processors (processing lanes) that would be integrated in a logic layer within 3D DRAM memory. ...
Further, the proximity to memory reduces the latency of memory accesses. • No caches or scratchpad memory for in-memory processors: This design choice saves area and power that would have been spent on ...
doi:10.1145/2742854.2742863
dblp:conf/cf/SuraJCRSBABPON15
fatcat:yelf7omghbc2jmfia7kcv73yeq
Showing results 1 — 15 out of 1,085 results