Achieving Predictable Performance with On-Chip Shared L2 Caches for Manycore-Based Real-Time Systems

Sangyeun Cho, Lei Jin, Kiyeon Lee
2007 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007)  
We observe that both the aspects have to do with where, among many cache slices, a cache block is mapped to, and present an OS-based approach to managing the on-chip L2 cache memory by carefully mapping  ...  This paper focuses on the problem of sharing on-chip caching capacity among multiple programs scheduled together, especially at the L2 cache level.  ...  Accordingly, the focus of this paper is on achieving low performance variability when a program with real-time constraints is running on a manycore processor employing a distributed shared L2 cache.  ... 
doi:10.1109/rtcsa.2007.16 dblp:conf/rtcsa/ChoJL07 fatcat:hymlichla5fn3czxfjztawbulq

Run-time energy management of manycore systems through reconfigurable interconnects

Jie Meng, Chao Chen, Ayse Kivilcim Coskun, Ajay Joshi
2011 Proceedings of the 21st edition of the Great Lakes Symposium on VLSI - GLSVLSI '11
The experimental results show that our policy reduces EDP by 49.3% and 23.9% for private L2 cache and 65.5% and 20.6% for distributed L2 cache on average in systems with bus and crossbar, respectively,  ...  The active on-chip network channel width has a direct impact on the cache and memory access latency in manycore processors.  ...  The manycore system uses MESI cache coherence protocol for shared caches.  ... 
doi:10.1145/1973009.1973019 dblp:conf/glvlsi/MengCCJ11 fatcat:y2nlu5v7lzdjfglf26zxk5gde4

McSimA+: A manycore simulator with application-level+ simulation and detailed microarchitecture modeling

Jung Ho Ahn, Sheng Li, Seongil O, Norman P. Jouppi
2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
The uncore subsystems have also become unprecedentedly powerful and complex with deeper cache hierarchies, advanced on-chip interconnects, and high-performance memory controllers.  ...  Manycore processors are highly integrated complex system-on-chips with complicated core and uncore subsystems. The core subsystems can consist of a large number of traditional and asymmetric cores.  ...  Each fat core is assumed to have a 2MB L2 cache based on the Nehalem [24] design, while each thin core is assumed to have a 512KB L2 cache based on the Pineview Atom [16] design.  ... 
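doi:10.1109/ispass.2013.6557148 dblp:conf/ispass/AhnLSJ13 fatcat:ywqys7o75ndkfjdqal5j4vvu4e

McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures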

Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, Norman P. Jouppi
2009 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42  
At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated  ...  Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when  ...  ACKNOWLEDGMENTS The authors would like to thank Victor Zyuban and Shyamkumar Thoziyoor at IBM for answering our questions on circuit implementation and the anonymous reviewers for their constructive comments  ... 
doi:10.1145/1669112.1669172 dblp:conf/micro/LiASBTJ09 fatcat:grtv5brsxzgwxdiqjcdhkfkqwa

The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches

David Tarjan, Kevin Skadron
2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Unlike hardware cache coherence, a sharing tracker only needs to track cache lines in the private caches imprecisely, because it is only a performance hint.  ...  The sharing tracker is motivated by but not specific to the GPU and could be used in other manycore organizations.  ...  We would also like to thank the anonymous reviewers for their helpful comments.  ... 
doi:10.1109/sc.2010.54 dblp:conf/sc/TarjanS10 fatcat:owg3nmzkbfbo7hbuqupznpfsvy

Comparing Separate and Statically-Partitioned Caches for Time-Predictable Multicore Processors

Lan Wu, Yiqiang Ding, Wei Zhang
2014 Journal of Computing Science and Engineering  
Current research trends primarily focus on partitioned-cache architectures in order to achieve time predictability for hard real-time multicore based systems, and our experiments reveal that separate caches  ...  actually lead to much better performance and energy efficiency when compared to statically-partitioned caches, and both of them are adequate for timing analysis for real-time multicore applications.  ...  on developing elegant partitioned schemes of shared caches for hard real-time multicore systems, it is perhaps more meaningful to revisit multicore chips with separate caches in order to achieve time  ... 
doi:10.5626/jcse.2014.8.1.25 fatcat:73x7doozm5gajijqxvwsujauay

Partially Shared Cache and Adaptive Replacement Algorithm for NoC-based Many-core Systems

Pengfei Yang, Quan Wang, Hongwei Ye, Zhiqiang Zhang
2019 Journal of systems architecture  
To increase the cache performance of NoC-based manycore systems, this paper constructs a partially shared cache structure and proposes the core-aware re-reference interval prediction (CA-RRIP) algorithm  ...  the average data transmission time is T_2, and the time to access data in off-chip memory is T_3, the access time for system T is shown in Formula 5.  ... 
doi:10.1016/j.sysarc.2019.05.002 fatcat:v2ftf4k3x5ceple2tdcdx44ixm

TPTS: A Novel Framework for Very Fast Manycore Processor Architecture Simulation

Sangyeun Cho, Socrates Demetriades, Shayne Evans, Lei Jin, Hyunjin Lee, Kiyeon Lee, Michael Moeng
2008 37th International Conference on Parallel Processing
We design and implement tsim, an event-driven manycore processor simulator that models detailed memory hierarchy, interconnect, and coherence protocol models based on the proposed TPTS framework.  ...  This paper proposes and evaluates a fast manycore processor simulation framework called Two-Phase Trace-driven Simulation (TPTS), which splits detailed timing simulation into a trace generation phase and  ...  Authors thank the anonymous reviewers for their constructive comments.  ... 
doi:10.1109/icpp.2008.7 dblp:conf/icpp/ChoDEJLLM08 fatcat:nv3ezljbwfgqjcyr44yb3fsi5a

Can Manycores Support the Memory Requirements of Scientific Applications? [chapter]

Milan Pavlovic, Yoav Etsion, Alex Ramirez
2011 Lecture Notes in Computer Science  
∼100 cores on a single chip, but may become a performance bottleneck for manycores consisting of more than 200 cores.  ...  Manycores are very effective in scaling parallel computational performance. However, it is not clear if current memory technologies can scale to support such highly parallel processors.  ...  Compared with the off-chip bandwidth, the observed L2 cache bandwidth is an order of magnitude higher.  ... 
doi:10.1007/978-3-642-24322-6_7 fatcat:ghrx7in44ncs7ieyerxdyiaav4

Hosting an object heap on manycore hardware

David Ungar, Sam S. Adams
2009 Proceedings of the 5th symposium on Dynamic languages - DLS '09  
In order to construct a test-bed for investigating new programming paradigms for future "manycore" systems (i.e. those with at least a thousand cores), we are building a Smalltalk virtual machine that  ...  attempts to efficiently use a collection of 56-on-chip caches of 64KB each to host a multi-megabyte object heap.  ...  for its passionate advancement and preservation of the original Smalltalk IDE; Leo Ungar for his editing; and Richard Schooler, VP SW Engineering at Tilera, and his team for their excellent support during  ... 
doi:10.1145/1640134.1640149 dblp:conf/dls/UngarA09 fatcat:jzt632nnfzh7jcv7urjibhk3by

The McPAT Framework for Multicore and Manycore Architectures

Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, Norman P. Jouppi
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
At microarchitectural level, McPAT includes models for the fundamental components of a complete chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches  ...  Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology shows that 8-core clustering gives the best energy-delay  ...  The total capacity of on-chip L2 caches of the OOO manycore is the same as the in-order manycore. The L2 caches are shared within a cluster and are coherent among different clusters.  ... 
doi:10.1145/2445572.2445577 fatcat:3gid2zqgefd2xhdtdtavvbmw2q

Cache-aware Parallel Programming for Manycore Processors [article]

Ashkan Tousimojarad, Wim Vanderbauwhede
2014 arXiv pre-print
Instead, we provide the programmer with a novel technique on how to program future Non-Uniform Cache Architecture (NUCA) manycore systems, bearing in mind their caching organisation.  ...  The TILEPro64 is a manycore accelerator, composed of 64 tiles interconnected via multiple 8x8 mesh networks. It contains per-tile caches and supports cache-coherent shared memory by default.  ...  CONCLUSION The use of a distributed shared caching hierarchy is a necessity in emerging manycore systems. Improving the data locality is key to optimising performance.  ... 
arXiv:1403.8006v1 fatcat:cnoxlp3zrjc5rjykzle2bolwou

Hierarchical Dataflow Model for efficient programming of clustered manycore processors

Julien Hascoet, Karol Desnos, Jean-Francois Nezan, Benoit Dupont de Dinechin
2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Programming Multiprocessor Systems-on-Chips (MPSoCs) with hundreds of heterogeneous Processing Elements (PEs), complex memory architectures, and Networks-on-Chips (NoCs) remains a challenge for embedded  ...  system designers.  ...  in real-time.  ... 
doi:10.1109/asap.2017.7995270 dblp:conf/asap/HascoetDND17 fatcat:zrl6ffctonhrzixhzhh35nbzdy

Evaluating the Cost of Atomic Operations on Modern Architectures [article]

Hermann Schweizer, Maciej Besta, Torsten Hoefler
2020 arXiv pre-print
Yet, performance tradeoffs between these operations and various characteristics of such systems, such as the structure of caches, are unclear and have not been thoroughly analyzed.  ...  In this paper we establish an evaluation methodology, develop a performance model, and present a set of detailed benchmarks for latency and bandwidth of different atomics.  ...  We thank Gregorz Kwaśniewski for his help with Xeon Phi.  ... 
arXiv:2010.09852v1 fatcat:s3sejpapknarraqe3optk6g6ym

Characterization and modeling of multicast communication in cache-coherent manycore processors

Sergi Abadal, Raúl Martínez, Josep Solé-Pareta, Eduard Alarcón, Albert Cabellos-Aparicio
2016 Computers & electrical engineering  
The scalability of Network-on-Chip (NoC) designs has become a rising concern as we enter the manycore era.  ...  In Section 2, we provide background on cache coherence and on-chip networking which may further motivate this work and be useful to understand its results.  ...  of cache blocks on a shared write.  ... 
doi:10.1016/j.compeleceng.2015.12.018 fatcat:wbyxjkqw45hxzbicwmwictm3yq