730 Hits in 8.4 sec

Shrinking L1 Instruction Caches to Improve Energy–Delay in SMT Embedded Processors [chapter]

Alexandra Ferrerón-Labari, Marta Ortín-Obón, Darío Suárez-Gracia, Jesús Alastruey-Benedé, Víctor Viñals-Yúfera
2013 Lecture Notes in Computer Science  
Instruction caches are responsible for a high percentage of the chip energy consumption, becoming a critical issue for battery-powered embedded devices.  ...  We can potentially reduce the energy consumption of the first level instruction cache (L1-I) by decreasing its size and associativity.  ...  The authors would like to thank Manolis Katevenis for his suggestions about the transport network.  ... 
doi:10.1007/978-3-642-36424-2_22 fatcat:f57hsdiahjgwzfb6mxpodjuzgq

Exploring MRAM Technologies for Energy Efficient Systems-On-Chip

Sophiane Senni, Lionel Torres, Gilles Sassatelli, Abdoulaye Gamatie, Bruno Mussard
2016 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
A significant proportion of total power is spent on memory systems due to the increasing trend of embedding volatile memory into systems-on-chip devices.  ...  This paper describes an approach to obtain large, fine-grained exploration of how magnetic memory can be included in the memory hierarchy of processor-based systems by analyzing both performance and energy  ...  ACKNOWLEDGMENT The authors wish to acknowledge all people from ADAC team at LIRMM and people from Crocus technology for their support in this work.  ... 
doi:10.1109/jetcas.2016.2547680 fatcat:pra4mg4qrvamrfm2b5rlqctcqq

Efficient Multilevel Cache Design for Solid State Drive's

Bakhtiar Kasi, Mumraiz Kasi, Riaz UlAmin
2019 Journal of Applied and Emerging Sciences  
We found a significant reduction in number of writes with multi-level caching. The overhead was comparable.  ...  The cache memory has multiple novel features including advanced support for performance monitoring, data pre-fetching, and coherency.  ...  In this paper, we have turned our focus on to the write-cache buffer management system. We have tried to implement a two-level cache buffer management system on top of the FTL layer of a typical SSD.  ... 
doi:10.36785/jaes.91274 fatcat:32cpkolym5giznsa5g6adnidw4

Embedded memory hierarchy exploration based on magnetic RAM

Luis Vitorio Cargnini, Lionel Torres, Raphael Martins Brum, Sophiane Senni, Gilles Sassatelli
2013 2013 IEEE Faible Tension Faible Consommation  
We demonstrate that adopting STT-MRAM in L1 and L2 caches mitigates the impact of higher write latencies and increased current draw due to the use of MRAM.  ...  Through our experiments, we demonstrate that STT-MRAM is a candidate for the memory hierarchy of embedded systems, due to the higher densities and reduced leakage of MRAM.  ...  He also supervised the work previously done using SuperScalar for the "L1 Cache Exploration for a Low Performance System" section, reviewing this entire work.  ... 
doi:10.1109/ftfc.2013.6577780 fatcat:iwjqa6knkvbghgi5y4udyykhk4

Embedded Memory Hierarchy Exploration Based on Magnetic Random Access Memory

Luís Cargnini, Lionel Torres, Raphael Brum, Sophiane Senni, Gilles Sassatelli
2014 Journal of Low Power Electronics and Applications  
We demonstrate that adopting STT-MRAM in L1 and L2 caches mitigates the impact of higher write latencies and increased current draw due to the use of MRAM.  ...  Through our experiments, we demonstrate that STT-MRAM is a candidate for the memory hierarchy of embedded systems, due to the higher densities and reduced leakage of MRAM.  ...  He also supervised the work previously done using SuperScalar for the "L1 Cache Exploration for a Low Performance System" section, reviewing this entire work.  ... 
doi:10.3390/jlpea4030214 fatcat:4cj7oo7s2bcltj6aophgtv2mpu

Efficient Embedded Software Migration towards Clusterized Distributed-Memory Architectures

Rafael Garibotti, Anastasiia Butko, Luciano Ost, Abdoulaye Gamatie, Gilles Sassatelli, Chris Adeniyi-Jones
2016 IEEE transactions on computers  
However, with the growing number of cores in modern manycore embedded architectures, they present a bottleneck related to their centralized memory accesses.  ...  It shows how performance, area and energy consumption are significantly improved thanks to the scalability of these architectures.  ...  ACKNOWLEDGMENTS The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under the Mont-Blanc Project: www.montblanc-project.eu  ... 
doi:10.1109/tc.2015.2485202 fatcat:v2lbaqig5zd6zdr3h5afrhit3u

IBM POWER7 multicore server processor

B. Sinharoy, R. Kalla, W. J. Starke, H. Q. Le, R. Cargnoni, J. A. Van Norstrand, B. J. Ronchetti, J. Stuecheli, J. Leenstra, G. L. Guthrie, D. Q. Nguyen, B. Blaner (+3 others)
2011 IBM Journal of Research and Development  
The memory subsystem contains three levels of on-chip cache, with SOI embedded dynamic random access memory (DRAM) devices used as the last level of cache.  ...  A new memory interface using buffered double-data-rate-three DRAM and improvements in reliability, availability, and serviceability are discussed.  ...  Acknowledgments This paper is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.  ... 
doi:10.1147/jrd.2011.2127330 fatcat:kztcasllyvgs5cuvzyf54myeyy

Exploiting Data Compression for Adaptive Block Placement in Hybrid Caches

Beomjun Kim, Yongtae Kim, Prashant Nair, Seokin Hong
2022 Electronics  
Metadata embedded in the cache block are then extracted and used to determine the block's write intensity when it is fetched from main memory.  ...  In order to store as many write-intensive blocks in the SRAM region as possible in hybrid caches, an intelligent block placement policy is essential.  ...  The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.  ... 
doi:10.3390/electronics11020240 fatcat:e6niqpn6unhrhkz2uivaff7h2i

Texture Filter Memory — a power-efficient and scalable texture memory architecture for mobile graphics processors

B. V. N. Silpa, Anjul Patney, Tushar Krishna, Preeti Ranjan Panda, G. S. Visweswaran
2008 2008 IEEE/ACM International Conference on Computer-Aided Design  
With increasing interest in sophisticated graphics capabilities in mobile systems, energy consumption of graphics hardware is becoming a major design concern in addition to the traditional performance  ...  We argue that a standard cache hierarchy, commonly used by researchers and commercial graphics processors for texture mapping, is wasteful of energy, and propose the Texture Filter Memory, an energy efficient  ...  We have also used CACTI models [22] for estimating the energy of caches and SRAMs in the designs.  ... 
doi:10.1109/iccad.2008.4681631 dblp:conf/iccad/SilpaPKPV08 fatcat:fvoo5jek6rf2xpyn3goilncoh4

Logic filter cache for wide-VDD-range processors

Alen Bardizbanyan, Oskar Andersson, Joachim Rodrigues, Per Larsson-Edefors
2016 2016 IEEE International Conference on Electronics, Circuits and Systems (ICECS)  
We implement a data and instruction filter cache, using logic cells located in the CPU VDD domain, to permit the level-1 (L1) cache to be reliably powered at a higher SRAM VDD.  ...  Wide-VDD-range processors offer high energy efficiency for varying embedded workloads.  ...  we do not consider the energy of the level shifters in the evaluation in Sec.  ... 
doi:10.1109/icecs.2016.7841211 dblp:conf/icecsys/BardizbanyanARL16 fatcat:s44zq2wq5ram3bzx6iimsq7v4e

A high level implementation and performance evaluation of level-I asynchronous cache on FPGA

Mansi Jhamb, R.K. Sharma, A.K. Gupta
2017 Journal of King Saud University: Computer and Information Sciences  
The implemented architecture comprises two direct-mapped, write-through caches for data and instruction.  ...  Cache is responsible for a major part of energy consumption (approx. 50%) of processors. This paper presents a high level implementation of a micropipelined asynchronous architecture of L1 cache.  ...  the high level implementation of L1-Cache system based on asynchronous communication oriented design styles.  ... 
doi:10.1016/j.jksuci.2015.06.003 fatcat:mo4is2qi4jg57czzh2prajralu

Memory power optimization of Java-based embedded systems exploiting garbage collection information

Jose Manuel Velasco, David Atienza, Katzalin Olcoz
2012 Journal of systems architecture  
Thus, in this paper we present an exploration, from an energy viewpoint, of the different possibilities of memory hierarchies for high-performance embedded systems when used by state-of-the-art GCs.  ...  which means, in terms of JVM execution, a global reduction of 29% and 17% for energy and cycles, respectively.  ...  For example, the addition of a 64 KB scratchpad to a system with 32 KB of direct mapped L1 for instruction and data produces a 25% reduction in the number of cycles of the collector and a 40% reduction  ... 
doi:10.1016/j.sysarc.2011.11.002 fatcat:afs4ppcmx5bvzfaab47t4u5khy

Memory Considerations for Low Energy Ray Tracing

D. Kopta, K. Shkurko, J. Spjut, E. Brunvand, A. Davis
2014 Computer graphics forum (Print)  
First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering.  ...  We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing.  ...  Sibenik Cathedral is from Marko Dabrovic, Fairy Forest is from the University of Utah, Crytek Sponza is from Frank Meinl at Crytek and Marko Dabrovic, Conference is from Anat Grynberg and Greg Ward, Dragon  ... 
doi:10.1111/cgf.12458 fatcat:txct2nsoq5fzpnuwsoxbbmdefq

Stack Caching Using Split Data Caches

Carsten Nielsen, Martin Schoeberl
2015 2015 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops  
In most embedded and general purpose architectures, stack data and non-stack data are cached together, meaning that writing to or loading from the stack may expel non-stack data from the data cache.  ...  The performance of the stack cache architectures was evaluated using the SimpleScalar toolset, where the window and prefilling stack cache without tag resulted in an execution speedup of up to 3.5% for  ... 
doi:10.1109/isorcw.2015.59 dblp:conf/isorc/NielsenS15 fatcat:u6jsz5og6vf2tm7texz6wnbbpe

A Systematic Approach to Reduce the System Bus Load and Power in Multimedia Algorithms

Koen Danckaert, Chidamber Kulkarni, Francky Catthoor, Hugo De Man, Vivek Tiwari
2001 VLSI design (Print)  
Multimedia algorithms deal with enormous amounts of data transfers and storage, resulting in huge bandwidth requirements at the off-chip memory and system bus level.  ...  As a result the related energy consumption becomes critical. Even for execution time the bottleneck can shift from the CPU to the external bus load.  ...  For the cavity detection example, two levels of data reuse can be identified from the above figures which illustrate the loop transformations: line buffers and pixel buffers.  ... 
doi:10.1155/2001/61965 fatcat:hs42ojo2urfhlbwhrgdml7bdka
Showing results 1 — 15 out of 730 results