Osman S. Unsal, Raksit Ashok, Israel Koren, C. Mani Krishna, Csaba Andras Moritz
2003 ACM Transactions on Embedded Computing Systems  
Our goal is exploring energy savings for embedded/multimedia workloads without sacrificing performance.  ...  Across a wide range of cache and architectural configurations we obtain up to 77% energy savings, while the performance varies from 14% improvement to 4% degradation depending on the application.  ...  [3] discuss an optimal SRAM partitioning scheme for an embedded system-on-a-chip. Panda et al. [25] propose use of a scratchpad memory in embedded processor applications. Kin et al.  ... 
doi:10.1145/860176.860182 fatcat:uan6sea2nnhgrlmvapyy6evuva

Integrating cache coherence protocols for heterogeneous multiprocessor systems. 1

T. Suh, H.-H.S. Lee, D.M. Blough
2004 IEEE Micro  
With advances in lithography, an entire system fits on a single chip, in what the industry calls a system on chip (SoC).  ...  Even though this ever-increasing chip capacity offers embedded-system designers much more flexibility, the requirements of a contemporary embedded system: high performance, low power, real-time constraints  ... 
doi:10.1109/mm.2004.33 fatcat:ltfbpcs4dzd3noxh4443pv7tn4

Thread-Local Scope Caching for Real-time Java

Andy Wellings, Martin Schoeberl
2009 2009 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing  
We show how to enforce access to the memory area to a single realtime thread. We implement the model on the JOP multiprocessor system and report on our experiences.  ...  In order to promote scalability and obtain predictability, on-chip processor-local private memory subsystems will be used.  ...  For the thread-local scope cache an array, mapped to the on-chip memory address, is requested from a system internal factory.  ... 
doi:10.1109/isorc.2009.13 dblp:conf/isorc/WellingsS09 fatcat:idzxkoihifa4jfxt74pk227gaq

On-chip communication and synchronization mechanisms with cache-integrated network interfaces

Stamatis G. Kavadias, Manolis G.H. Katevenis, Michail Zampetakis, Dimitrios S. Nikolopoulos
2010 Proceedings of the 7th ACM international conference on Computing frontiers - CF '10  
We implemented these mechanisms in a four-core FPGA prototype, and evaluated the on-chip communication performance on the prototype as well as on a CMP simulator with up to 128 cores.  ...  We have designed cache-integrated network interfaces (NIs), appropriate for scalable multicores, that combine the best of two worlds: the flexibility of caches and the efficiency of scratchpad memories  ...  We also thank, for their assistance in designing the architecture and in implementing the prototype: Vassilis Papaefstathiou, Giorgos Kalokairinos, George Nikiforos, Dionisios Pnevmatikatos, Dimitris Nikolopoulos  ... 
doi:10.1145/1787275.1787328 dblp:conf/cf/KavadiasKZN10 fatcat:f2lt5p2fnbctjok5ima7syxtie

Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level

Abu Asaduzzaman, Fadi N. Sibai, Manira Rani
2010 Journal of systems architecture  
Herein, we introduce a miss table (MT) based cache locking scheme at level-2 (L2) cache to further improve the timing predictability and system performance/power ratio.  ...  The MT holds information of block addresses related to the application being processed which cause most cache misses if not locked.  ...  Also, our proposed MT-based cache locking scheme can be flexibly implemented at either level-1 or level-2 cache.  ... 
doi:10.1016/j.sysarc.2010.02.002 fatcat:ceqhuqyqpzcu7ec7q7x23xv5a4

Location-aware cache management for many-core processors with deep cache hierarchy

Jongsoo Park, Richard M. Yoo, Daya S. Khudia, Christopher J. Hughes, Daehyun Kim
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
As cache hierarchies become deeper and the number of cores on a chip increases, managing caches becomes more important for performance and energy.  ...  Our instructions provide a 1.07× speedup and a 1.24× energy efficiency boost, on average, according to simulations on a 64-core system with private L1 and L2 caches.  ...  Van der Wijngaart for discussion during the initial stage of our project.  ... 
doi:10.1145/2503210.2503224 dblp:conf/sc/ParkYKHK13 fatcat:yvtqvwtg3rbnbcfgdbamqq5dy4

Exploring DMA-assisted prefetching strategies for software caches on multicore clusters

Christian Pinto, Luca Benini
2014 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors  
Software prefetching for irregular memory references: based on compiler hints. Regular memory references: resolved using direct buffering [1]. + performance, - flexibility; + flexibility, - performance  ...  (a); /* some more computation on the same line, further lookups: only hits */ int a = cache_lookup(address + K, &y); ... } Software caches are great, but… Cache miss!  ... 
doi:10.1109/asap.2014.6868666 dblp:conf/asap/PintoB14 fatcat:dnl52emyqzfpnpj4l22dyq7xt4

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs

Stamatis Kavadias, Manolis Katevenis, Michail Zampetakis, Dimitrios S. Nikolopoulos
2011 International journal of parallel programming  
We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds: the flexibility of caches and the efficiency of scratchpad memories: on-chip  ...  The proposed architecture allows sharing on-chip SRAM, at cache-line granularity, for caching, scratchpad, and network interface (NI) communication functions, all mapped in the application's virtual address  ...  We also thank, for their assistance in designing the architecture and in implementing the prototype: Vassilis Papaefstathiou, Giorgos Kalokairinos, George Nikiforos, Dionisios Pnevmatikatos, Dimitris Nikolopoulos  ... 
doi:10.1007/s10766-011-0173-6 fatcat:7dkwtkt6ivgiplorfjfhvjvfbu

Extending Magny-Cours Cache Coherence

A. Ros, B. Cuesta, R. Fernandez-Pascual, M. E. Gomez, M. E. Acacio, A. Robles, J. M. Garcia, J. Duato
2012 IEEE transactions on computers  
One cost-effective way to meet the increasing demand for larger high-performance shared-memory servers is to build clusters with off-the-shelf processors connected with low-latency point-to-point interconnections  ...  Unfortunately, HyperTransport addressing limitations prevent building systems with more than 8 nodes.  ...  memory system for each run.  ... 
doi:10.1109/tc.2011.65 fatcat:k5hq6i6q2zghzdgorognjhixnq

A Survey of Techniques for Cache Locking

Sparsh Mittal
2016 ACM Transactions on Design Automation of Electronic Systems  
Cache memory, although important for boosting application performance, is also a source of execution time variability, and this makes its use difficult in systems requiring worst case execution time (WCET  ...  Cache locking is a promising approach for simplifying WCET estimation and providing predictability and hence, several commercial processors provide ability for locking cache.  ...  [2012] use CLT for saving energy in µC/OS-II real-time kernel on an ARM7TDMI-based embedded system.  ... 
doi:10.1145/2858792 fatcat:ki7vo5hlqvaurn3e63gpv764fa

Hardware Architectural Support for Caching Partitioned Reconfigurations in Reconfigurable Systems

Juan Antonio Clemente, Ruben Gran, Abel Chocano, Carlos del Prado, Javier Resano
2016 IEEE Transactions on Very Large Scale Integration (vlsi) Systems  
When a reconfiguration must be carried out, our controller provides the blocks stored on-chip and looks for the remaining blocks by accessing the off-chip configuration memory.  ...  This paper presents a hardware implementation of an on-chip configuration memory controller that efficiently manages run-time reconfigurations.  ...  This limits their applicability in modern embedded systems, where the on-chip memory capacities are usually very restricted.  ... 
doi:10.1109/tvlsi.2015.2417595 fatcat:2x2zwebabbcy3hhrt2kc3xsttu

A Survey of Emerging Architectural Techniques for Improving Cache Energy Consumption

Washington Bhebhe, Michael Opoku
2016 Communications on Applied Electronics  
A lot of research effort has been put on finding techniques that can improve the energy efficiency of cache architectures.  ...  in material negative system performance.  ...  The cache locking mechanism is a dynamic one and can decrease the cache miss rate of the system by only locking the appropriate memory blocks.  ... 
doi:10.5120/cae2016652443 fatcat:hvi6m63qaredfeg3dzecvjws2e

Random Fill Cache Architecture

Fangfei Liu, Ruby B. Lee
2014 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture  
We show that our random fill cache does not degrade performance, and in fact, improves the performance for some types of applications.  ...  We propose a novel random fill cache architecture that replaces demand fetch with random cache fill within a configurable neighborhood window.  ...  ACKNOWLEDGMENT We thank the reviewers for their helpful comments. This work was supported in part by DHS/AFRL FA8750-12-2-0295 and NSF CNS-1218817.  ... 
doi:10.1109/micro.2014.28 dblp:conf/micro/LiuL14 fatcat:p6cbk2ad2ratdhrxcmvzwyoop4

A Survey on Cache Management Mechanisms for Real-Time Embedded Systems

Giovani Gracioli, Ahmed Alhammad, Renato Mancuso, Antônio Augusto Fröhlich, Rodolfo Pellizzoni
2015 ACM Computing Surveys  
One of the main factors for unpredictability in a multicore processor is the cache memory hierarchy.  ...  In this article, we present a survey of cache management techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research published in 2014.  ...  The authors proposed a genetic algorithm to select the best lines to be locked.  ... 
doi:10.1145/2830555 fatcat:nckhashqprghfnbcaqqu7vk5vi

A low power unified cache architecture providing power and performance flexibility

A. Malik, B. Moyer, D. Cermak
2000 ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)  
In this paper, we focus on the features of the M340 cache sub-system and illustrate the effect on power and performance through benchmark analysis and actual silicon measurements.  ...  The M•CORE M3 architecture was developed specifically for these embedded applications.  ...  CACHE WRITE MODES The M340 cache sub-system supports two write modes: copyback and writethrough. The M340 cache supports both methods for integration flexibility based on the given application.  ... 
doi:10.1109/lpe.2000.155290 fatcat:dixfp5aznbhybbetn76jpoylwm