A compiler-directed data prefetching scheme for chip multiprocessors
2009
SIGPLAN Notices
a compiler-directed data prefetching scheme for shared on-chip cache based CMPs. ...
However, data prefetching in multi-threaded applications running on chip multiprocessors (CMPs) can be problematic when multiple cores compete for a shared on-chip cache (L2 or L3). ...
directed data prefetching scheme that targets shared cache based CMPs. ...
doi:10.1145/1594835.1504208
fatcat:frabzunknjc4ze3mtz3jqb4mia
A compiler-directed data prefetching scheme for chip multiprocessors
2009
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '09
a compiler-directed data prefetching scheme for shared on-chip cache based CMPs. ...
However, data prefetching in multi-threaded applications running on chip multiprocessors (CMPs) can be problematic when multiple cores compete for a shared on-chip cache (L2 or L3). ...
directed data prefetching scheme that targets shared cache based CMPs. ...
doi:10.1145/1504176.1504208
dblp:conf/ppopp/SonKKC09
fatcat:jgx6pat4gbbhhjydzrc7eguqxi
An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors
[chapter]
2008
Lecture Notes in Computer Science
We find that the high level of resource sharing in SMTs results in performance complications, should more than 1 thread be assigned on a single physical processor. ...
Multiprocessors based on simultaneous multithreaded (SMT) or multicore (CMP) processors are continuing to gain a significant share in both high-performance and mainstream computing markets. ...
On the CMP-based multiprocessor the L2 cache miss rate generally appears to be uncorrelated to the exploitation of 1 or 2 execution cores per physical processor. ...
doi:10.1007/978-3-540-68555-5_11
fatcat:5sdth4krs5b3hhnsht24ymz5k4
Scheduling threads for constructive cache sharing on CMPs
2007
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07
Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. ...
In chip multiprocessors (CMPs), limiting the number of offchip cache misses is crucial for good performance. ...
We run CMP simulation based on the new trace. ...
doi:10.1145/1248377.1248396
dblp:conf/spaa/ChenGKLABFFHMW07
fatcat:7zuvfmkmorbzzdwlmkdl5pmwa4
Fast and fair
2005
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems - CASES '05
In single thread mode, PDAS, on average, improves by 26%, 27%, and 13% over Private, Shared, and NUCA caches respectively. ...
On-chip caches for CMPs must be able to handle the increased demand and contention of multiple cores. ...
ACKNOWLEDGMENTS We would like to thank Bill Mangione-Smith for guidance on this work, Yuval Tamir for constructive feedback, and the anonymous reviewers for providing useful comments on this paper. ...
doi:10.1145/1086297.1086328
dblp:conf/cases/YehR05
fatcat:w66xogpsufbuxecoha3krqgnr4
Cache Topology Aware Mapping of Stream Processing Applications onto CMPs
2013
2013 IEEE 33rd International Conference on Distributed Computing Systems
Our major idea is to map application threads to CPU cores to facilitate data sharing AND mitigate memory resource contention among threads in a holistic manner. ...
Since the performance of stream processing applications largely depends on their effective use of the complex cache structure present on CMPs, this paper proposes the StreamMap approach for tuning streaming ...
On a machine like that shown in Figure 2 (a), if the two threads reside on two cores that share L2 cache (e.g., on core 0 and 1), then the consumer thread may directly read the data from the L2 cache; ...
doi:10.1109/icdcs.2013.13
dblp:conf/icdcs/ZhengVWS13
fatcat:kw4bjdf52rdurfzfe4gxwm6wsm
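The StreamMap snippet above notes that if a producer and consumer thread run on two cores sharing an L2 cache (e.g., cores 0 and 1), the consumer can read data directly from that cache. As a rough illustration of the placement mechanism (not StreamMap itself), process affinity can be set from user space. This is a minimal Linux-only sketch; the assumption that cores 0 and 1 share an L2, and the helper name `pin_to_l2_group`, are hypothetical:

```python
import os

# Hypothetical topology assumption: cores 0 and 1 share an L2 cache.
# On Linux the real sharing sets can be read from sysfs, e.g.
# /sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list
L2_SHARING_CORES = {0, 1}

def pin_to_l2_group(pid=0, cores=L2_SHARING_CORES):
    """Restrict a process (pid 0 = calling process) to cores assumed to
    share an L2 cache, so producer/consumer data can stay on-chip."""
    available = os.sched_getaffinity(pid)
    target = cores & available
    if not target:
        raise ValueError("requested cores are not available on this machine")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)
```

Pinning cooperating threads this way lets shared data be served from the common L2 instead of going off-chip, which is the effect StreamMap aims to achieve automatically from the cache topology.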
FlexDCP
2009
ACM SIGOPS Operating Systems Review
This information allows the OS to convert QoS requirements into resource assignments. ...
Our results show that FlexDCP is able to force applications in a workload to run at a certain percentage of their maximum performance in 94% of the cases considered, being on average 1.48% under the objective ...
In particular, FlexDCP focuses on the shared caches as one of the main sources of interaction between threads in CMP architectures. ...
doi:10.1145/1531793.1531806
fatcat:e3mn4d5xwne6ngpfdjeuxqyuiy
Computation spreading
2006
ACM SIGOPS Operating Systems Review
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2 ...
, while grouping similar computation fragments from different threads together. ...
We also thank our anonymous reviewers for their comments on this paper. ...
doi:10.1145/1168917.1168893
fatcat:uedaf7wegnef3fvilanbobkg6q
Computation spreading
2006
SIGARCH Computer Architecture News
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2 ...
, while grouping similar computation fragments from different threads together. ...
We also thank our anonymous reviewers for their comments on this paper. ...
doi:10.1145/1168919.1168893
fatcat:cnzzcwz22nfzdjoyqdqvv4gdym
Computation spreading
2006
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems - ASPLOS-XII
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2 ...
, while grouping similar computation fragments from different threads together. ...
We also thank our anonymous reviewers for their comments on this paper. ...
doi:10.1145/1168857.1168893
dblp:conf/asplos/ChakrabortyWS06
fatcat:pxj4eycxjze7hnqkwxx4roidoe
Computation spreading
2006
SIGPLAN Notices
We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2 ...
, while grouping similar computation fragments from different threads together. ...
We also thank our anonymous reviewers for their comments on this paper. ...
doi:10.1145/1168918.1168893
fatcat:qblb7lbmxrb5rlndvdtqtedns4
A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts
2013
IEICE Transactions on Information and Systems
Based on the estimation used for cache partitioning, the thread scheduler decides thread combinations sharing one cache so as to avoid capacity shortage. ...
This paper focuses on two causes of inter-thread cache conflicts. In shared caches of CMPs, cached data fetched by one thread are frequently evicted by another thread. ...
Finally, n/m threads are assigned to each group, which represents a thread combination sharing one cache. ...
doi:10.1587/transinf.e96.d.2047
fatcat:72xazjs2yze7jp7ibiivtgfcau
The Coming Wave of Multithreaded Chip Multiprocessors
2007
International Journal of Parallel Programming
on-chip shared secondary cache allows for more fine-grain parallelism to be effectively exploited by the CMP. ...
We examine two multi-threaded CMPs built using a large number of processor cores: Sun's Niagara and Niagara 2 processors. We also explore the programming issues for CMPs with large number of threads. ...
Each thread group has its own dedicated execution pipeline, although the two thread groups do share access to the data and instruction caches and to the floating-point unit. ...
doi:10.1007/s10766-007-0033-6
fatcat:4gzhbtdumvablcjfy62osfb2g4
CPU Accounting in CMP Processors
2009
IEEE Computer Architecture Letters
Chip-MultiProcessors (CMP) introduce complexities when accounting CPU utilization to processes because the progress done by a process during an interval of time highly depends on the activity of the other ...
We propose a new hardware accounting mechanism to improve the accuracy when measuring the CPU utilization in CMPs and compare it with the previous accounting mechanisms. ...
Benchmarks in the memory group (denoted M ) are those presenting a bad L2 cache behavior (mainly art, equake, lucas, mcf and swim), while benchmarks in the ILP group (denoted I) have a low L2 cache miss ...
doi:10.1109/l-ca.2009.3
fatcat:hgzgkrfop5c3fcqq67fcchbe3i
Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors
2010
2010 39th International Conference on Parallel Processing
threads co-scheduled on the CMP are impacted more uniformly. ...
In order to enable chip-level power capping, the peak power consumption of on-chip L2 caches in a CMP often needs to be constrained by dynamically transitioning selected cache banks into low-power modes ...
Acknowledgements We thank Naveen Muralimanohar at HP Labs for providing the source code of SimpleScalar S-NUCA cache implementation and anonymous reviewers for their valuable comments. ...
doi:10.1109/icpp.2010.9
dblp:conf/icpp/WangMW10
fatcat:kygunyahe5f3jc7wxua327z7oq