Optimal Footprint Symbiosis in Shared Cache

Xiaolin Wang, Yechen Li, Yingwei Luo, Xiameng Hu, Jacob Brock, Chen Ding, Zhenlin Wang
2015 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
FOOTPRINT-BASED OPTIMAL SYMBIOSIS In this section, we introduce the footprint theory, which we use to compute shared-cache locality.  ...  Then we formalize the linearity assumption, which relates shared-cache locality to shared-cache performance. Finally we describe the symbiotic optimization. A.  ...  We conclude that ABF sampling and miss-ratio minimization gives a cost-effective solution for optimal program symbiosis in shared cache.  ... 
doi:10.1109/ccgrid.2015.153 dblp:conf/ccgrid/WangLLHBDW15 fatcat:nyjkfts2ijdrboj25dy56cqfqe

Optimal Cache Partition-Sharing

Jacob Brock, Chencheng Ye, Chen Ding, Yechen Li, Xiaolin Wang, Yingwei Luo
2015 2015 44th International Conference on Parallel Processing  
Finally, the paper evaluates the effect of optimal cache sharing and compares it with conventional solutions for thousands of 4-program co-run groups, with nearly 180 million different ways to share the  ...  When a cache is shared by multiple cores, its space may be allocated either by sharing, partitioning, or both. We call the last case partition-sharing.  ...  Independent validation can be found in the use of the footprint theory in optimal program symbiosis in shared cache [12], optimal memory allocation in Memcached [2], and a study from the OS community  ... 
doi:10.1109/icpp.2015.84 dblp:conf/icpp/BrockYDLWL15 fatcat:cp632yw4kjaszjqlpn5tm6ofkq

Reducing Shared Cache Misses via dynamic Grouping and Scheduling on Multicores

Wael Amr, Hany Mohamed, Ihab ElSayed
2014 International Journal of Advanced Computer Science and Applications  
However, this performance can't be exploited well due to the high miss rate in the second level shared cache among the cores which represents one of the multicore's challenges.  ...  This paper addresses the dynamic co-scheduling of tasks in multicore real-time systems.  ...  Atta for his help in completing the presented work.  ... 
doi:10.14569/ijacsa.2014.050920 fatcat:b3mwjgrzp5gjnilfmj3ri3xaja

Kinetic Modeling of Data Eviction in Cache

Xiameng Hu, Xiaolin Wang, Lan Zhou, Yingwei Luo, Chen Ding, Zhenlin Wang
2016 USENIX Annual Technical Conference  
The reuse distance (LRU stack distance) is an essential metric for performance prediction and optimization of storage and CPU cache.  ...  Furthermore, AET is a composable model that can characterize shared cache behavior through modeling individual programs.  ...  We have developed and evaluated shared cache program symbiosis, which used ABF sampling and footprint composition to co-locate co-run applications to minimize their interference in shared cache [20].  ... 
dblp:conf/usenix/HuWZLDW16 fatcat:kkgr4ljbevb3dblxekuu6elcqa

LAMA: Optimized Locality-aware Memory Allocation for Key-value Cache

Xiameng Hu, Xiaolin Wang, Yechen Li, Lan Zhou, Yingwei Luo, Chen Ding, Song Jiang, Zhenlin Wang
2015 USENIX Annual Technical Conference  
The in-memory cache system is a performance-critical layer in today's web server architecture. Memcached is one of the most effective, representative, and prevalent among such systems.  ...  The new solution is close to optimal, achieving over 98% of the theoretical potential.  ...  Later validation includes optimal program symbiosis in shared cache [41] and a study on server cache performance prediction [42] .  ... 
dblp:conf/usenix/HuWLZLDJW15 fatcat:ycaehmyrnfaftnopg53kshjahe

Exploiting inter-thread temporal locality for chip multithreading

Jiayuan Meng, Jeremy W Sheaffer, Kevin Skadron
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
to improve inter-thread cache sharing.  ...  Threads on a core usually share a single first-level data cache, so thread schedulers must try to minimize cache contention among threads.  ...  ACKNOWLEDGEMENTS This work was supported in part by SRC grant No. 1607, NSF grant nos. IIS-0612049 and CNS-0615277, and a grant from Intel Research.  ... 
doi:10.1109/ipdps.2010.5470465 dblp:conf/ipps/MengSS10 fatcat:6b33ba2lmzcnzjo24mlogswgza

Oversubscription on multicore processors

Costin Iancu, Steven Hofmeyr, Filip Blagojevic, Yili Zheng
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
Rather than "resource" symbiosis, our results indicate that the determining behavioral factor when applications share a system is the granularity of the synchronization operations.  ...  Our results indicate that oversubscription provides beneficial effects for applications running in competitive environments.  ...  Fedorova et al [8] present an OS scheduler able to improve the cache symbiosis of multiprogrammed workloads.  ... 
doi:10.1109/ipdps.2010.5470434 dblp:conf/ipps/IancuHBZ10 fatcat:vomcoi2h7fg43cnu4jxyhqvjbe

Improving IBM POWER8 Performance Through Symbiotic Job Scheduling

Josue Feliu, Stijn Eyerman, Julio Sahuquillo, Salvador Petit, Lieven Eeckhout
2017 IEEE Transactions on Parallel and Distributed Systems  
The proposed models achieve higher accuracy than previous models by predicting job symbiosis from throttled CPI stacks, i.e., CPI stacks of the applications when running in the same SMT mode to consider  ...  SMT cores share most of their microarchitectural components among the co-running applications, which causes performance interference between them.  ...  This work was supported in part by the Spanish  ... 
doi:10.1109/tpds.2017.2691708 fatcat:wjjfycjmv5bb3ajnauy5juemk4

VESPA: VIPT Enhancements for Superpage Accesses [article]

Mayank Parasar, Abhishek Bhattacharjee, Tushar Krishna
2017 arXiv   pre-print
We propose VIPT Enhancements for SuperPage Accesses or VESPA in response.  ...  L1 caches are critical to the performance of modern computer systems.  ...  In this work, we identify the opportunity presented by superpages in virtual memory systems today, to optimize current VIPT L1 caches.  ... 
arXiv:1701.03499v2 fatcat:txbymqphprcdfc2f2a4zpfd55i

Compatible phase co-scheduling on a CMP of multi-threaded processors

A. El-Moursy, R. Garg, D.H. Albonesi, S. Dwarkadas
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
In such an environment, the co-scheduling of phases from different threads plays a significant role in the overall throughput.  ...  In this paper, we devise phase co-scheduling policies for a dual-core CMP of dual-threaded SMT processors.  ...  The subsequent optimization phase uses this information to determine the symbiosis among the threads in order to make the best scheduling decision.  ... 
doi:10.1109/ipdps.2006.1639376 dblp:conf/ipps/El-MoursyGAD06 fatcat:tysu5tinmzb3rc25fxiyv3czum

Characterizing the resource-sharing levels in the UltraSPARC T2 processor

Vladimir Čakarević, Petar Radojković, Javier Verdú, Alex Pajuelo, Francisco J. Cazorla, Mario Nemirovsky, Mateo Valero
2009 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42  
Commonly, processor designs provide two levels of resource sharing: Inter-core in which only the highest levels of the cache hierarchy are shared, and Intra-core in which most of the hardware resources  ...  In this work, we provide the first characterization of a three-level resource sharing processor, the UltraSPARC T2, and we show how multi-level resource sharing affects the operating system design.  ...  Other works, though, propose techniques to co-schedule threads that exhibit a good symbiosis in the shared cache levels and solve problems of cache contention [12, 6, 9].  ... 
doi:10.1145/1669112.1669173 dblp:conf/micro/CakarevicRVPCNV09 fatcat:xephoimwgnbp3azmokxs65o5ta

SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers

Yunqi Zhang, Michael A. Laurenzano, Jason Mars, Lingjia Tang
2014 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture  
For SMT servers, the interference on different shared resources, including private caches, memory ports, as well as integer and floating-point functional units, do not correlate with each other.  ...  In this paper, we demonstrate through a real-system investigation that the fundamental difference between resource sharing behaviors on CMP and SMT architectures calls for a redesign of the way we model  ...  data_chunk[RAND % FOOTPRINT]++; … data_chunk[RAND % FOOTPRINT]++; } (e) MEM (L1, L2 Cache) … first_chunk = data_chunk; second_chunk = data_chunk + FOOTPRINT / 2; while (1) {  ... 
doi:10.1109/micro.2014.53 dblp:conf/micro/ZhangLMT14 fatcat:fyx7gxw7jfcxjktrm7q3rjn7me

The shared-thread multiprocessor

Jeffery A. Brown, Dean M. Tullsen
2008 Proceedings of the 22nd annual international conference on Supercomputing - ICS '08  
This shared thread state allows the system to schedule threads from a shared pool onto individual cores, allowing for rapid movement of threads between cores.  ...  This paper describes initial results for an architecture called the Shared-Thread Multiprocessor (STMP).  ...  This research was supported in part by NSF grant CCF-0702349 and Semiconductor Research Corporation Grant 2005-HJ-1313.  ... 
doi:10.1145/1375527.1375541 dblp:conf/ics/BrownT08 fatcat:whmgnelnu5hczkhhls4q2eqgte

Computation spreading

Koushik Chakraborty, Philip M. Wells, Gurindar S. Sohi
2006 Proceedings of the 12th international conference on Architectural support for programming languages and operating systems - ASPLOS-XII  
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2  ...  ., in our server workloads, 45-65% of all instruction blocks are accessed by all processors).  ...  Acknowledgments This work was supported in part by National Science Foundation grants CCR-0311572 and EIA-0071924, and by donations from Intel Corporation.  ... 
doi:10.1145/1168857.1168893 dblp:conf/asplos/ChakrabortyWS06 fatcat:pxj4eycxjze7hnqkwxx4roidoe

Computation spreading

Koushik Chakraborty, Philip M. Wells, Gurindar S. Sohi
2006 ACM SIGOPS Operating Systems Review  
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2  ...  ., in our server workloads, 45-65% of all instruction blocks are accessed by all processors).  ...  Acknowledgments This work was supported in part by National Science Foundation grants CCR-0311572 and EIA-0071924, and by donations from Intel Corporation.  ... 
doi:10.1145/1168917.1168893 fatcat:uedaf7wegnef3fvilanbobkg6q