Enabling software management for multicore caches with a lightweight hardware support
2009
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09
In order to turn cache partitioning methods into reality in the management of multicore processors, we propose to provide affordable and lightweight hardware support to coordinate with OS-based cache ...
The management of shared caches in multicore processors is a critical and challenging task. Many hardware and OS-based methods have been proposed. ...
This research was supported in part by the National Science Foundation under grants CNS-0834476, CCF-0514085, CNS-0834393, and CCF-0913050. ...
doi:10.1145/1654059.1654074
dblp:conf/sc/LinLDZZS09
fatcat:g5grixuigre55o2krwerkovrui
Our metric provides net processor cycles saved because of prefetching by approximating the cycles saved across the memory subsystem, from last-level cache to DRAM. ...
CAFFEINE uses CAFFEINATION when the prefetcher-caused interference is tolerable (as defined in Section 3.1) and DE-CAFFEINATION when the prefetcher-caused interference is intolerable. ...
ACKNOWLEDGMENTS The authors would like to thank Dr. Rupesh Nasre, Dr. Madhu Mutyam, and Prof. R. Govindarajan for their valuable comments. ...
doi:10.1145/2806891
fatcat:fzcf6ngcpfa4jktirp2eac5qua
Memory management in NUMA multicore systems
2011
Proceedings of the international symposium on Memory management - ISMM '11
N-MASS is fine-tuned to support memory management on NUMA-multicores and improves performance by up to 32%, and by 7% on average, over the default setup in current Linux implementations. ...
As the cores of a processor share a common cache, the issues of memory management and process mapping must be revisited. ...
The N-MASS scheme described in this paper successfully combines memory management and process scheduling to better exploit the potential of NUMA-multicore processors. ...
doi:10.1145/1993478.1993481
dblp:conf/iwmm/MajoG11
fatcat:qoftbiu4zrgj3ork2sfrquiupy
Memory management in NUMA multicore systems
2011
SIGPLAN notices
N-MASS is fine-tuned to support memory management on NUMA-multicores and improves performance by up to 32%, and by 7% on average, over the default setup in current Linux implementations. ...
As the cores of a processor share a common cache, the issues of memory management and process mapping must be revisited. ...
The N-MASS scheme described in this paper successfully combines memory management and process scheduling to better exploit the potential of NUMA-multicore processors. ...
doi:10.1145/2076022.1993481
fatcat:czau3i5xsjdmdh4mkogl5adpfy
A generic and compositional framework for multicore response time analysis
2015
Proceedings of the 23rd International Conference on Real Time and Networks Systems - RTNS '15
The MRTA framework provides a general approach to timing verification for multicore systems that is parametric in the hardware configuration and so can be used at the architectural design stage to compare ...
In this paper, we introduce a Multicore Response Time Analysis (MRTA) framework. ...
Acknowledgements This work was supported in part by the COST Action IC1202 TACLe, by the DFG as part of the Transregional Collaborative Research Centre SFB/TR 14 (AVACS), by National Funds through FCT/ ...
doi:10.1145/2834848.2834862
dblp:conf/rtns/AltmeyerDIMNR15
fatcat:4ad3vtbawjer7otj44q4dbotqe
Energy Discounted Computing on Multicore Smartphones
2016
USENIX Annual Technical Conference
In addition, we use available ARM performance counters to identify co-run resource contention on the multicore processor and throttle the best-effort task when it interferes with interactivity. ...
Experimental results on a multicore smartphone show that we can reach up to 63% energy discount in the best-effort task processing with little performance impact on the interactive applications. ...
We also thank the anonymous USENIX ATC reviewers and our shepherd Rodrigo Fonseca for comments that helped improve this paper. ...
dblp:conf/usenix/ZhuS16
fatcat:crtgvu6jtvhfhbsh36jsgbxxvy
Resource management for isolation enhanced cloud services
2009
Proceedings of the 2009 ACM workshop on Cloud computing security - CCSW '09
Experimental results demonstrate that these approaches are effective in isolating the cache-interference impact one VM may have on another. ...
We identify last level cache (LLC) sharing as one of the impediments to finer grain isolation required by a service, and advocate two resource management approaches to provide performance and security ...
This multicore trend is expected to continue in the future. Shared caches are commonly used in such multicore architectures. ...
doi:10.1145/1655008.1655019
dblp:conf/ccs/RajNSE09
fatcat:x5xgbgvtr5ai5hs633peljebka
Designing lab sessions focusing on real processors for computer architecture courses: A practical perspective
2018
Journal of Parallel and Distributed Computing
Unfortunately, simulators that model current multicore processors are getting more and more complex, which lengthens the learning phase and complicates their use in time-bounded lab sessions. ...
For example, how last level cache (LLC) misses impact processor performance. ...
of Multicore Processors, and we plan to introduce it in the Architecture and Computer Engineering in the next year. ...
doi:10.1016/j.jpdc.2018.02.026
fatcat:apa6byknfbftddikyqlq226p3a
Randomization for Safer, more Reliable and Secure, High-Performance Automotive Processors
2019
IEEE design & test
The other side of the coin is that high-performance processors include hardware features like shared multilevel caches and multiple cores that expose the system to significant security threats, challenge time predictability, and jeopardize reliable operation due to the use of advanced process technology. ...
Personal use of this material is permitted. ...
doi:10.1109/mdat.2019.2927373
fatcat:nsgnc5qup5eiza7imooexyy5ce
SEDEA: A Sensible Approach to Account DRAM Energy in Multicore Systems
2017
2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
We also provide a use case showing that SEDEA can be used to guide shared cache and memory bank partition schemes to save energy. ...
However, the use of multicore systems complicates per-task energy measurement, as the increased Thread Level Parallelism (TLP) allows several tasks to run simultaneously while sharing resources. ...
Personal use of this material is permitted. ...
doi:10.1109/sbac-pad.2017.17
dblp:conf/sbac-pad/LiuMACV17
fatcat:at676yf3cnc6jkchxmhyzmouza
Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction
2016
2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Over the last seven years, both single- and multicore performance improvements have contributed to end-user satisfaction by reducing user-critical application response latencies. ...
Our methodology allows us to identify what mobile CPU design techniques provide the most benefit to the end-user's quality of user experience. ...
This research is supported in part by NSF awards CCF-1528045, CCF-1255892 and SRC 2013-HJ-2408, along with gifts from Google, Samsung and Intel. ...
doi:10.1109/hpca.2016.7446054
dblp:conf/hpca/HalpernZR16
fatcat:dcuenxqyljb4bpxijmotq4ugr4
Time-Analysable Non-Partitioned Shared Caches for Real-Time Multicore Systems
2014
Proceedings of the The 51st Annual Design Automation Conference on Design Automation Conference - DAC '14
In a 4-core multicore processor setup our proposal improves cache partitioning by 56% in terms of guaranteed performance and 16% in terms of average performance. ...
Shared caches in multicores challenge Worst-Case Execution Time (WCET) estimation due to inter-task interferences. ...
In a 4-core multicore processor setup, our results show that EFL improves cache partitioning by 56% in terms of guaranteed performance and 16% in terms of average performance. ...
doi:10.1145/2593069.2593235
dblp:conf/dac/SlijepcevicKAQC14
fatcat:ql6d3c4vnbfo3f3jvtl46eetla
The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution
2013
ACM Transactions on Architecture and Code Optimization (TACO)
To match program execution with the most energy-efficient processor configuration, the system was equipped with a dynamic resource allocation scheme that characterizes program behaviors using novel processor ...
Compared to the most efficient homogeneous uniprocessor running sequential programs, we improved performance by 29% and reduced energy consumption by 3.6%, which is a 42% improvement in energy-delay-squared ...
This is demonstrated by the 37.5% improvement in ED²P from a heterogeneous sequential processor (Het-Seq) to a heterogeneous multicore processor that executes speculative parallelized code (Het-TLS). ...
doi:10.1145/2541228.2541233
fatcat:ek4cfgfxxzhprgdytcx6peg3ni
PROXIMA: Improving Measurement-Based Timing Analysis through Randomisation and Probabilistic Analysis
2016
2016 Euromicro Conference on Digital System Design (DSD)
The use of increasingly complex hardware and software platforms in response to the ever rising performance demands of modern real-time systems complicates the verification and validation of their timing ...
In this paper we relate the current state of practice in measurement-based timing analysis, the predominant choice for industrial developers, to the proceedings of the PROXIMA project in that very field ...
ACKNOWLEDGEMENTS The research leading to these results has received funding from the European Community's Seventh ...
doi:10.1109/dsd.2016.22
dblp:conf/dsd/CazorlaAAVVBBAW16
fatcat:qidopagxeffazixbhtmmfaypxa
Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks
2015
ACM Transactions on Architecture and Code Optimization (TACO)
Many modern high-performance processors prefetch blocks into the on-chip cache. Prefetched blocks can potentially pollute the cache by evicting more useful blocks. ...
First, we observe that over 95% of useful prefetches in a wide variety of applications are not reused after the first demand hit (in secondary caches). ...
This work is supported in part by NSF grants 0953246, 1212962, 1320531, the Intel Science and Technology Center for Cloud Computing, and the Semiconductor Research Corporation. ...
doi:10.1145/2677956
fatcat:si4li6c7zzhkfoquoohx25dpri
Showing results 1 — 15 out of 444 results