
Modeling performance variation due to cache sharing

A. Sandberg, A. Sembrant, E. Hagersten, D. Black-Schaffer
2013 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)  
Shared cache contention can cause significant variability in the performance of co-running applications from run to run.  ...  This paper introduces a method for efficiently investigating the performance variability due to cache contention.  ...  variations due to shared cache contention on modern hardware by combining a cache sharing model [16] with phase optimizations [19, 25]. • A comparison with previous cache-sharing methods [16] demonstrating  ... 
doi:10.1109/hpca.2013.6522315 dblp:conf/hpca/SandbergSHB13 fatcat:wvbja2ttgrawvfopyxcim4nvau

Understanding the interplay between task scheduling, memory and performance

Germán Ceballos, Erik Hagersten, David Black-Schaffer
2017 Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity - SPLASH Companion 2017  
TaskInsight provides high-level information that can be correlated with tasks' performance variation over time to understand data reuse through the caches due to scheduling choices.  ...  We demonstrate how TaskInsight can diagnose cases where poor scheduling caused over 60% difference on average (and up to 7x slowdowns) due to changes in the tasks' data reuse through the caches.  ...  Results and Conclusion TaskInsight goes beyond previous work which typically used aggregate metrics to look at overall memory system behavior or ignored the task-level performance variation due to cache  ... 
doi:10.1145/3135932.3135942 dblp:conf/oopsla/CeballosHB17 fatcat:6apytah7ojglzlejtdhxcdamze

Modeling cache coherence overhead with geometric objects [chapter]

R. Kattner, M. Eger, C. Müller-Schloer
1994 Lecture Notes in Computer Science  
Subsequently, we incorporate these parameters (i.e., the cache coherence protocol, the cache coherence block size and the sharing of data inherent in the parallel workload) into a model by using a finite  ...  This model also allows a fast and thorough evaluation of the sharing behavior of parallel applications.  ...  in each cache and due to the active sharing of writable data.  ... 
doi:10.1007/3-540-58430-7_39 fatcat:ndvtot4cizaipcpmmtw4fivqea

Variations in cache behavior

Chris Roadknight, Ian Marshall
1998 Computer networks and ISDN systems  
In order to optimise the design of the cache network this variation needs to be understood.  ...  HTTP cache servers reduce network traffic by storing popular files nearer to the client and have been implemented worldwide. Their reported performance on key metrics such as hit rate varies greatly.  ...  In order to construct accurate models the variation needs to be better understood. Inter-cache differences.  ... 
doi:10.1016/s0169-7552(98)00034-8 fatcat:bn26fgxsd5etbkj7snun6iyddi

Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis

Meng-Ju Wu, Donald Yeung
2012 Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness - MSPC '12  
In today's hierarchies, performance is determined by complicated thread interactions, such as interference in shared caches and replication and communication in private caches.  ...  Understanding multicore memory behavior is crucial, but can be challenging due to the complex cache hierarchies employed in modern CPUs.  ...  Acknowledgment The authors would like to thank the anonymous reviewers for their helpful comments, and Abdel-Hameed Badawy for insightful discussions about the AMAT models and for reviewing the paper.  ... 
doi:10.1145/2247684.2247687 dblp:conf/pldi/WuY12 fatcat:bdtfalx5lnbuxcgqgxupjqadse

Reliability improvement in multicore architectures through computing in embedded memory

Hadi Hajimiri, Somnath Paul, Anandaroop Ghosh, Swarup Bhunia, Prabhat Mishra
2011 2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS)  
The private as well as shared caches are used to perform computation on demand using a lookup table.  ...  When a functional unit fails, temporarily due to temperature induced variations or permanently, the associated computation is transferred to caches.  ...  This is due to the fact that tasks on different cores compete for the shared resources (memory access or shared L2 cache) and as the number of cores increases, the overall performance decreases.  ... 
doi:10.1109/mwscas.2011.6026672 fatcat:pgzjfeh6zfb7vo4r5fceboytt4

Power efficiency for variation-tolerant multicore processors

James Donald, Margaret Martonosi
2006 Proceedings of the 2006 international symposium on Low power electronics and design - ISLPED '06  
We validate our analytical model using Turandot to simulate an 8-core PowerPC™ processor.  ...  This work introduces an analytical approach for ensuring timing reliability while meeting the appropriate performance and power demands in spite of process variation.  ...  ACKNOWLEDGEMENTS We thank the Architecture/Performance Group at Apple and Jonathan Chang for their assistance with various tools.  ... 
doi:10.1145/1165573.1165645 dblp:conf/islped/DonaldM06 fatcat:vfcos6qzwrciho4cgudgu6w3xa

Power Efficiency for Variation-Tolerant Multicore Processors

James Donald, Margaret Martonosi
2006 ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design  
We validate our analytical model using Turandot to simulate an 8-core PowerPC™ processor.  ...  This work introduces an analytical approach for ensuring timing reliability while meeting the appropriate performance and power demands in spite of process variation.  ...  ACKNOWLEDGEMENTS We thank the Architecture/Performance Group at Apple and Jonathan Chang for their assistance with various tools.  ... 
doi:10.1109/lpe.2006.4271854 fatcat:spfo44njvre4rmhfe6urwzicdq

Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages

Alexandra Ferreron, Dario Suarez-Gracia, Jesus Alastruey-Benede, Teresa Monreal, Victor Vinals
2014 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing  
A microarchitectural technique to cope with cache reliability at ultra-low voltages is block disabling; however, in many cases, the savings in on-chip caches do not compensate for the consumption in the  ...  Understanding the effects of operating below Vdd min requires complex modeling, so we introduce an updated probability failure model of SRAM cells at 22nm and explore the reliability impact of lowering  ...  shared caches.  ... 
doi:10.1109/sbac-pad.2014.12 dblp:conf/sbac-pad/Ferreron-LabariGAAV14 fatcat:cpzzpah3xfae5c7kzkwho3r2ju

An Analytical Performance Model for Co-management of Last-Level Cache and Bandwidth Sharing

Taecheol Oh, Kiyeon Lee, Sangyeun Cho
2011 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems  
Processor cores in a chip multiprocessor (CMP) typically share a large last-level cache and the off-chip memory bandwidth.  ...  This paper develops a hybrid analytical model that takes into account the two partitioning problems together in order to capture their inter-dependence.  ...  The CPIs are normalized to the CPI with 4MB cache size in (a) and to the CPI with slot count 10 in (b). Fig. 3. Shared resources (cache and off-chip bandwidth) variation on 3-dimensional graph.  ... 
doi:10.1109/mascots.2011.17 dblp:conf/mascots/OhLC11 fatcat:fx4sgbcmr5eljfpshi66kvygky


Germán Ceballos, Thomas Grass, Andra Hugo, David Black-Schaffer
2017 Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM'17  
TaskInsight provides high-level, quantitative information that can be correlated with tasks' performance variation over time to understand data reuse through the caches due to scheduling choices.  ...  and shared caches, in single and multi-threaded executions of the same application.  ...  Finally, we show how to extend TaskInsight to explain the effects of shared caches in performance variation, by modeling also spatial data locality of each schedule (Section 5.3).  ... 
doi:10.1145/3026937.3026943 dblp:conf/ppopp/CeballosGHB17 fatcat:izksbvfacvdgzj3kxmlwkwwvp4

Performance and energy trade-offs analysis of L2 on-chip cache architectures for embedded MPSoCs

Mohamed M. Sabry, Martino Ruggiero, Pablo G. Del Valle
2010 Proceedings of the 20th symposium on Great lakes symposium on VLSI - GLSVLSI '10  
Cache architectures that work for high performance computers turn out to be inefficient for embedded systems (mainly due to power-efficiency issues).  ...  performs better than the shared one.  ...  Due to the flexibility and accuracy of the MPARM simulator for MPSoC modeling, we decided it was the perfect candidate to incorporate our L2 cache model.  ... 
doi:10.1145/1785481.1785552 dblp:conf/glvlsi/SabryRV10 fatcat:whq5r6zfmnhojimxs4q3qm5wwq

Performance and Energy Evaluation of Memory Organizations in NoC-Based MPSoCs under Latency and Task Migration [chapter]

Gustavo Girão, Daniel Barcelos, Flávio Rech Wagner
2011 IFIP Advances in Information and Communication Technology  
In addition, the nDMA memory model presents a smaller overhead when compared to the shared memory models and tends to reduce the traffic in the migration process due to the concentration of all memory  ...  Shared and distributed shared memory models are shown to present lower tolerance to high latencies.  ...  The shared memory and distributed shared memory models seem to suffer less from this communication workload variation, although, in absolute numbers, the overall decrease of performance is higher than  ... 
doi:10.1007/978-3-642-23120-9_4 fatcat:ayblacdnsbcfdkofcbe7e2s5yy

SHARP control

Shekhar Srikantaiah, Mahmut Kandemir, Qian Wang
2009 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42  
While sharing resources like on-chip last level cache is generally beneficial due to increased resource utilization, lack of control over management of these resources can lead to loss of determinism,  ...  Shared resources in chip multiprocessors (CMPs) pose unique challenges to the seamless adoption of CMPs in virtualization environments and high performance computing systems.  ...  Cache Performance Modeling Partitioned shared cache lends itself to accurate and better modeling as cache behavior of each application can be modeled independently in the absence of inter-thread interferences  ... 
doi:10.1145/1669112.1669177 dblp:conf/micro/SrikantaiahKW09 fatcat:cv4u5vlotfcczhye67lflxjqke

A Cache coherence protocol for MIN-based multiprocessors

Mazin S. Yousif, Chita R. Das, Matthew J. Thazhuthaveetil
1994 Journal of Supercomputing  
Assuming homogeneity of all nodes, a single-node queuing model (phase 3) is developed to analyze system performance.  ...  private to a process and shared-blocks caches (SCache) containing data accessible by all processes.  ...  However, in addition to the overhead incurred due to an SCache miss, in-SCache shared data modification leads to the cache coherence problem.  ... 
doi:10.1007/bf01204660 fatcat:ari6h2fbhrcoriz6znxzlsjv2a