
A compiler-directed data prefetching scheme for chip multiprocessors

Seung Woo Son, Mahmut Kandemir, Mustafa Karakoy, Dhruva Chakrabarti
2009 SIGPLAN notices  
a compiler-directed data prefetching scheme for shared on-chip cache based CMPs.  ...  However, data prefetching in multi-threaded applications running on chip multiprocessors (CMPs) can be problematic when multiple cores compete for a shared on-chip cache (L2 or L3).  ...  directed data prefetching scheme that targets shared cache based CMPs.  ... 
doi:10.1145/1594835.1504208 fatcat:frabzunknjc4ze3mtz3jqb4mia

A compiler-directed data prefetching scheme for chip multiprocessors

Seung Woo Son, Mahmut Kandemir, Mustafa Karakoy, Dhruva Chakrabarti
2008 Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '09  
a compiler-directed data prefetching scheme for shared on-chip cache based CMPs.  ...  However, data prefetching in multi-threaded applications running on chip multiprocessors (CMPs) can be problematic when multiple cores compete for a shared on-chip cache (L2 or L3).  ...  directed data prefetching scheme that targets shared cache based CMPs.  ... 
doi:10.1145/1504176.1504208 dblp:conf/ppopp/SonKKC09 fatcat:jgx6pat4gbbhhjydzrc7eguqxi

An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors [chapter]

Matthew Curtis-Maury, Xiaoning Ding, Christos D. Antonopoulos, Dimitrios S. Nikolopoulos
2008 Lecture Notes in Computer Science  
We find that the high level of resource sharing in SMTs results in performance complications, should more than 1 thread be assigned on a single physical processor.  ...  Multiprocessors based on simultaneous multithreaded (SMT) or multicore (CMP) processors are continuing to gain a significant share in both high-performance and mainstream computing markets.  ...  On the CMP-based multiprocessor the L2 cache miss rate generally appears to be uncorrelated to the exploitation of 1 or 2 execution cores per physical processor.  ... 
doi:10.1007/978-3-540-68555-5_11 fatcat:5sdth4krs5b3hhnsht24ymz5k4

Scheduling threads for constructive cache sharing on CMPs

Shimin Chen, Todd C. Mowry, Chris Wilkerson, Phillip B. Gibbons, Michael Kozuch, Vasileios Liaskovitis, Anastassia Ailamaki, Guy E. Blelloch, Babak Falsafi, Limor Fix, Nikos Hardavellas
2007 Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07  
Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set.  ...  In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance.  ...  We run CMP simulation based on the new trace.  ... 
doi:10.1145/1248377.1248396 dblp:conf/spaa/ChenGKLABFFHMW07 fatcat:7zuvfmkmorbzzdwlmkdl5pmwa4

Fast and fair

Thomas Y. Yeh, Glenn Reinman
2005 Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems - CASES '05  
In single thread mode, PDAS, on average, improves by 26%, 27%, and 13% over Private, Shared, and NUCA caches respectively.  ...  On-chip caches for CMPs must be able to handle the increased demand and contention of multiple cores.  ...  ACKNOWLEDGMENTS We would like to thank Bill Mangione-Smith for guidance on this work, Yuval Tamir for constructive feedback, and the anonymous reviewers for providing useful comments on this paper.  ... 
doi:10.1145/1086297.1086328 dblp:conf/cases/YehR05 fatcat:w66xogpsufbuxecoha3krqgnr4

Cache Topology Aware Mapping of Stream Processing Applications onto CMPs

Fang Zheng, Chitra Venkatramani, Rohit Wagle, Karsten Schwan
2013 2013 IEEE 33rd International Conference on Distributed Computing Systems  
Our major idea is to map application threads to CPU cores to facilitate data sharing AND mitigate memory resource contention among threads in a holistic manner.  ...  Since the performance of stream processing applications largely depends on their effective use of the complex cache structure present on CMPs, this paper proposes the StreamMap approach for tuning streaming  ...  On a machine like that shown in Figure 2 (a), if the two threads reside on two cores that share L2 cache (e.g., on core 0 and 1), then the consumer thread may directly read the data from the L2 cache;  ... 
doi:10.1109/icdcs.2013.13 dblp:conf/icdcs/ZhengVWS13 fatcat:kw4bjdf52rdurfzfe4gxwm6wsm

FlexDCP

Miquel Moreto, Francisco J. Cazorla, Alex Ramirez, Rizos Sakellariou, Mateo Valero
2009 ACM SIGOPS Operating Systems Review  
This information allows the OS to convert QoS requirements into resource assignments.  ...  Our results show that FlexDCP is able to force applications in a workload to run at a certain percentage of their maximum performance in 94% of the cases considered, being on average 1.48% under the objective  ...  In particular, FlexDCP focuses on the shared caches as one of the main sources of interaction between threads in CMP architectures.  ... 
doi:10.1145/1531793.1531806 fatcat:e3mn4d5xwne6ngpfdjeuxqyuiy

Computation spreading

Koushik Chakraborty, Philip M. Wells, Gurindar S. Sohi
2006 ACM SIGOPS Operating Systems Review  
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2  ...  , while grouping similar computation fragments from different threads together.  ...  We also thank our anonymous reviewers for their comments on this paper.  ... 
doi:10.1145/1168917.1168893 fatcat:uedaf7wegnef3fvilanbobkg6q

Computation spreading

Koushik Chakraborty, Philip M. Wells, Gurindar S. Sohi
2006 SIGARCH Computer Architecture News  
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2  ...  , while grouping similar computation fragments from different threads together.  ...  We also thank our anonymous reviewers for their comments on this paper.  ... 
doi:10.1145/1168919.1168893 fatcat:cnzzcwz22nfzdjoyqdqvv4gdym

Computation spreading

Koushik Chakraborty, Philip M. Wells, Gurindar S. Sohi
2006 Proceedings of the 12th international conference on Architectural support for programming languages and operating systems - ASPLOS-XII  
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2  ...  , while grouping similar computation fragments from different threads together.  ...  We also thank our anonymous reviewers for their comments on this paper.  ... 
doi:10.1145/1168857.1168893 dblp:conf/asplos/ChakrabortyWS06 fatcat:pxj4eycxjze7hnqkwxx4roidoe

Computation spreading

Koushik Chakraborty, Philip M. Wells, Gurindar S. Sohi
2006 SIGPLAN notices  
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2  ...  , while grouping similar computation fragments from different threads together.  ...  We also thank our anonymous reviewers for their comments on this paper.  ... 
doi:10.1145/1168918.1168893 fatcat:qblb7lbmxrb5rlndvdtqtedns4

A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts

Masayuki SATO, Ryusuke EGAWA, Hiroyuki TAKIZAWA, Hiroaki KOBAYASHI
2013 IEICE transactions on information and systems  
Based on the estimation used for cache partitioning, the thread scheduler decides thread combinations sharing one cache so as to avoid capacity shortage.  ...  This paper focuses on two causes of inter-thread cache conflicts. In shared caches of CMPs, cached data fetched by one thread are frequently evicted by another thread.  ...  Finally, n/m threads are assigned to each group, which represents a thread combination sharing one cache.  ... 
doi:10.1587/transinf.e96.d.2047 fatcat:72xazjs2yze7jp7ibiivtgfcau

The Coming Wave of Multithreaded Chip Multiprocessors

James Laudon, Lawrence Spracklen
2007 International journal of parallel programming  
on-chip shared secondary cache allows for more fine-grain parallelism to be effectively exploited by the CMP.  ...  We examine two multi-threaded CMPs built using a large number of processor cores: Sun's Niagara and Niagara 2 processors. We also explore the programming issues for CMPs with large number of threads.  ...  Each thread group has its own dedicated execution pipeline, although the two thread groups do share access to the data and instruction caches and to the floating-point unit.  ... 
doi:10.1007/s10766-007-0033-6 fatcat:4gzhbtdumvablcjfy62osfb2g4

CPU Accounting in CMP Processors

C. Luque, M. Moreto, F.J. Cazorla, R. Gioiosa, A. Buyuktosunoglu, M. Valero
2009 IEEE computer architecture letters  
Chip-MultiProcessors (CMP) introduce complexities when accounting CPU utilization to processes because the progress done by a process during an interval of time highly depends on the activity of the other  ...  We propose a new hardware accounting mechanism to improve the accuracy when measuring the CPU utilization in CMPs and compare it with the previous accounting mechanisms.  ...  Benchmarks in the memory group (denoted M ) are those presenting a bad L2 cache behavior (mainly art, equake, lucas, mcf and swim), while benchmarks in the ILP group (denoted I) have a low L2 cache miss  ... 
doi:10.1109/l-ca.2009.3 fatcat:hgzgkrfop5c3fcqq67fcchbe3i

Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors

Xiaorui Wang, Kai Ma, Yefu Wang
2010 2010 39th International Conference on Parallel Processing  
threads co-scheduled on the CMP are impacted more uniformly.  ...  In order to enable chip-level power capping, the peak power consumption of on-chip L2 caches in a CMP often needs to be constrained by dynamically transitioning selected cache banks into low-power modes  ...  Acknowledgements We thank Naveen Muralimanohar at HP Labs for providing the source code of SimpleScalar S-NUCA cache implementation and anonymous reviewers for their valuable comments.  ... 
doi:10.1109/icpp.2010.9 dblp:conf/icpp/WangMW10 fatcat:kygunyahe5f3jc7wxua327z7oq