Enabling software management for multicore caches with a lightweight hardware support

Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang, P. Sadayappan
2009 Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09  
In order to turn cache partitioning methods into reality in the management of multicore processors, we propose to provide an affordable and lightweight hardware support to coordinate with OS-based cache  ...  The management of shared caches in multicore processors is a critical and challenging task. Many hardware and OS-based methods have been proposed.  ...  This research was supported in part by the National Science Foundation under grants CNS-0834476, CCF-0514085, CNS-0834393, and CCF-0913050.  ... 
doi:10.1145/1654059.1654074 dblp:conf/sc/LinLDZZS09 fatcat:g5grixuigre55o2krwerkovrui

CAFFEINE

Biswabandan Panda, Shankar Balachandran
2015 ACM Transactions on Architecture and Code Optimization (TACO)  
Our metric provides net processor cycles saved because of prefetching by approximating the cycles saved across the memory subsystem, from last-level cache to DRAM.  ...  CAFFEINE uses CAFFEINATION when the prefetcher-caused interference is tolerable (we define in Section 3.1) and it uses DE-CAFFEINATION when the prefetcher-caused interference is intolerable.  ...  ACKNOWLEDGMENTS The authors would like to thank Dr. Rupesh Nasre, Dr. Madhu Mutyam, and Prof. R. Govindarajan for their valuable comments.  ... 
doi:10.1145/2806891 fatcat:fzcf6ngcpfa4jktirp2eac5qua

Memory management in NUMA multicore systems

Zoltan Majo, Thomas R. Gross
2011 Proceedings of the international symposium on Memory management - ISMM '11  
N-MASS is fine-tuned to support memory management on NUMA-multicores and improves performance up to 32%, and 7% on average, over the default setup in current Linux implementations.  ...  As the cores of a processor share a common cache, the issues of memory management and process mapping must be revisited.  ...  The N-MASS scheme described in this paper successfully combines memory management and process scheduling to better exploit the potential of NUMA-multicore processors.  ... 
doi:10.1145/1993478.1993481 dblp:conf/iwmm/MajoG11 fatcat:qoftbiu4zrgj3ork2sfrquiupy

Memory management in NUMA multicore systems

Zoltan Majo, Thomas R. Gross
2011 SIGPLAN Notices  
N-MASS is fine-tuned to support memory management on NUMA-multicores and improves performance up to 32%, and 7% on average, over the default setup in current Linux implementations.  ...  As the cores of a processor share a common cache, the issues of memory management and process mapping must be revisited.  ...  The N-MASS scheme described in this paper successfully combines memory management and process scheduling to better exploit the potential of NUMA-multicore processors.  ... 
doi:10.1145/2076022.1993481 fatcat:czau3i5xsjdmdh4mkogl5adpfy

A generic and compositional framework for multicore response time analysis

Sebastian Altmeyer, Robert I. Davis, Leandro Indrusiak, Claire Maiza, Vincent Nelis, Jan Reineke
2015 Proceedings of the 23rd International Conference on Real Time and Networks Systems - RTNS '15  
The MRTA framework provides a general approach to timing verification for multicore systems that is parametric in the hardware configuration and so can be used at the architectural design stage to compare  ...  In this paper, we introduce a Multicore Response Time Analysis (MRTA) framework.  ...  Acknowledgements This work was supported in part by the COST Action IC1202 TACLe, by the DFG as part of the Transregional Collaborative Research Centre SFB/TR 14 (AVACS), by National Funds through FCT/  ... 
doi:10.1145/2834848.2834862 dblp:conf/rtns/AltmeyerDIMNR15 fatcat:4ad3vtbawjer7otj44q4dbotqe

Energy Discounted Computing on Multicore Smartphones

Meng Zhu, Kai Shen
2016 USENIX Annual Technical Conference  
In addition, we use available ARM performance counters to identify co-run resource contention on the multicore processor and throttle best-effort task when it interferes with interactivity.  ...  Experimental results on a multicore smartphone show that we can reach up to 63% energy discount in the best-effort task processing with little performance impact on the interactive applications.  ...  We also thank the anonymous USENIX ATC reviewers and our shepherd Rodrigo Fonseca for comments that helped improve this paper.  ... 
dblp:conf/usenix/ZhuS16 fatcat:crtgvu6jtvhfhbsh36jsgbxxvy

Resource management for isolation enhanced cloud services

Himanshu Raj, Ripal Nathuji, Abhishek Singh, Paul England
2009 Proceedings of the 2009 ACM workshop on Cloud computing security - CCSW '09  
Experimental results demonstrate that these approaches are effective in isolating cache interference impacts a VM may have on another VM.  ...  We identify last level cache (LLC) sharing as one of the impediments to finer grain isolation required by a service, and advocate two resource management approaches to provide performance and security  ...  This multicore trend is expected to continue in the future. Shared caches are commonly used in such multicore architectures.  ... 
doi:10.1145/1655008.1655019 dblp:conf/ccs/RajNSE09 fatcat:x5xgbgvtr5ai5hs633peljebka

Designing lab sessions focusing on real processors for computer architecture courses: A practical perspective

Josué Feliu, Julio Sahuquillo, Salvador Petit
2018 Journal of Parallel and Distributed Computing  
Unfortunately, simulators that model current multicore processors are getting more and more complex, which lengthens the learning phase and complicates their use in time-bounded lab sessions.  ...  For example, how last level cache (LLC) misses impact processor performance.  ...  of Multicore Processors, and we plan to introduce it in the Architecture and Computer Engineering in the next year.  ... 
doi:10.1016/j.jpdc.2018.02.026 fatcat:apa6byknfbftddikyqlq226p3a

Randomization for Safer, more Reliable and Secure, High-Performance Automotive Processors

David Trilla, Carles Hernandez, Jaume Abella, Francisco J. Cazorla
2019 IEEE Design & Test  
The other side of the coin is that high-performance processors include hardware features like shared multilevel caches and multiple cores that expose the system to significant security threats, challenge time predictability, and jeopardize reliable operation due to the use of advanced process technology.  ... 
doi:10.1109/mdat.2019.2927373 fatcat:nsgnc5qup5eiza7imooexyy5ce

SEDEA: A Sensible Approach to Account DRAM Energy in Multicore Systems

Qixiao Liu, Miquel Moreto, Jaume Abella, Francisco J. Cazorla, Mateo Valero
2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)  
We also provide a use case showing that SEDEA can be used to guide shared cache and memory bank partition schemes to save energy.  ...  However, the use of multicore system complicates per-task energy measurement as the increased Thread Level Parallelism (TLP) allows several tasks to run simultaneously sharing resources.  ... 
doi:10.1109/sbac-pad.2017.17 dblp:conf/sbac-pad/LiuMACV17 fatcat:at676yf3cnc6jkchxmhyzmouza

Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction

Matthew Halpern, Yuhao Zhu, Vijay Janapa Reddi
2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)  
Over the last seven years, both single- and multicore performance improvements have contributed to end-user satisfaction by reducing user-critical application response latencies.  ...  Our methodology allows us to identify what mobile CPU design techniques provide the most benefit to the end-user's quality of user experience.  ...  This research is supported in part by NSF awards CCF-1528045, CCF-1255892 and SRC 2013-HJ-2408, along with gifts from Google, Samsung and Intel.  ... 
doi:10.1109/hpca.2016.7446054 dblp:conf/hpca/HalpernZR16 fatcat:dcuenxqyljb4bpxijmotq4ugr4

Time-Analysable Non-Partitioned Shared Caches for Real-Time Multicore Systems

Mladen Slijepcevic, Leonidas Kosmidis, Jaume Abella, Eduardo Quiñones, Francisco J. Cazorla
2014 Proceedings of the 51st Annual Design Automation Conference - DAC '14  
In a 4-core multicore processor setup our proposal improves cache partitioning by 56% in terms of guaranteed performance and 16% in terms of average performance.  ...  Shared caches in multicores challenge Worst-Case Execution Time (WCET) estimation due to inter-task interferences.  ...  In a 4-core multicore processor setup our results show that EFL improves cache partitioning by 56% in terms of guaranteed performance and 16% in terms of average performance.  ... 
doi:10.1145/2593069.2593235 dblp:conf/dac/SlijepcevicKAQC14 fatcat:ql6d3c4vnbfo3f3jvtl46eetla

The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution

Yangchun Luo, Wei-Chung Hsu, Antonia Zhai
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
To match program execution with the most energy-efficient processor configuration, the system was equipped with a dynamic resource allocation scheme that characterizes program behaviors using novel processor  ...  Compared to the most efficient homogeneous uniprocessor running sequential programs, we improved performance by 29% and reduced energy consumption by 3.6%, which is a 42% improvement in energy-delay-squared  ...  This is demonstrated by the 37.5% improvement in ED²P from a heterogeneous sequential processor (Het-Seq) to a heterogeneous multicore processor that executes speculative parallelized code (Het-TLS).  ... 
doi:10.1145/2541228.2541233 fatcat:ek4cfgfxxzhprgdytcx6peg3ni

PROXIMA: Improving Measurement-Based Timing Analysis through Randomisation and Probabilistic Analysis

Francisco J. Cazorla, Jaume Abella, Jan Andersson, Tullio Vardanega, Francis Vatrinet, Iain Bate, Ian Broster, Mikel Azkarate-Askasua, Franck Wartel, Liliana Cucu, Fabrice Cros, Glenn Farrall (+15 others)
2016 Euromicro Conference on Digital System Design (DSD)  
The use of increasingly complex hardware and software platforms in response to the ever rising performance demands of modern real-time systems complicates the verification and validation of their timing  ...  In this paper we relate the current state of practice in measurement-based timing analysis, the predominant choice for industrial developers, to the proceedings of the PROXIMA project in that very field  ...  ACKNOWLEDGEMENTS The research leading to these results has received funding from the European Community's Seventh  ... 
doi:10.1109/dsd.2016.22 dblp:conf/dsd/CazorlaAAVVBBAW16 fatcat:qidopagxeffazixbhtmmfaypxa

Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks

Vivek Seshadri, Samihan Yedkar, Hongyi Xin, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry
2015 ACM Transactions on Architecture and Code Optimization (TACO)  
Many modern high-performance processors prefetch blocks into the on-chip cache. Prefetched blocks can potentially pollute the cache by evicting more useful blocks.  ...  First, we observe that over 95% of useful prefetches in a wide variety of applications are not reused after the first demand hit (in secondary caches).  ...  This work is supported in part by NSF grants 0953246, 1212962, 1320531, the Intel Science and Technology Center for Cloud Computing, and the Semiconductor Research Corporation.  ... 
doi:10.1145/2677956 fatcat:si4li6c7zzhkfoquoohx25dpri