Signature Buffer: Bridging Performance Gap between Registers and Caches

Lu Peng, Jih-Kwon Peir, Konrad Lai
10th International Symposium on High Performance Computer Architecture (HPCA'04)  
Data communications between producer instructions and consumer instructions through memory incur extra delays that degrade processor performance.  ...  A small Signature Buffer, addressed by the memory signature, can be established to permit stores and loads bypassing normal memory hierarchy for fast data communication.  ...  Acknowledgement This work is supported in part by an NSF grant EIA-0073473 and by research donations from Microprocessor Research Lab and China Research Center of Intel Corp.  ... 
doi:10.1109/hpca.2004.10020 dblp:conf/hpca/PengPL04 fatcat:itiyerrabfbudktbwz76nelbwe

A New Buffer Cache Design Exploiting Both Temporal and Content Localities

Jin Ren, Qing Yang
2010 2010 IEEE 30th International Conference on Distributed Computing Systems  
This paper presents a Least Popularly Used buffer cache algorithm to exploit both temporal locality and content locality of I/O requests.  ...  Fast delta compression and decompression are used to satisfy as many I/O requests as possible using the popular reference blocks together with small deltas inside the buffer cache.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.  ... 
doi:10.1109/icdcs.2010.26 dblp:conf/icdcs/RenY10 fatcat:35c53sugmvarlligkao6766fm4

Adaptive wear-leveling algorithm for PRAM main memory with a DRAM buffer

Sung Kyu Park, Min Kyu Maeng, Ki-Woong Park, Kyu Ho Park
2014 ACM Transactions on Embedded Computing Systems  
First, existing DRAM buffering schemes do not consider write count distribution. Second, swapping and shifting operations are performed statically.  ...  Finally, swapping and shifting operations are loosely coupled with a DRAM buffer.  ...  A relatively small DRAM buffer (3% of the size of the PRAM main memory) can bridge most of the latency gap between DRAM and PRAM [Qureshi et al. 2009b].  ... 
doi:10.1145/2558427 fatcat:o3yzsutbjrfibby67wfsxcr4fi

Mechanisms for store-wait-free multiprocessors

Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2007 SIGARCH Computer Architecture News  
Prior research suggests that the performance gap among consistency models can be closed through speculation-enforcing order only when dynamically necessary.  ...  To eliminate buffer-capacity-related stalls, we propose the scalable store buffer, which places private/speculative values directly into the L1 cache, thereby eliminating the non-scalable associative search  ...  This work was partially supported by grants and equipment from Intel, two Sloan research fellowships, an NSERC Discovery Grant, an IBM faculty partnership award, and NSF grant CCR-0509356.  ... 
doi:10.1145/1273440.1250696 fatcat:zi2kb743vbhpzhx5m7athfydr4

Mechanisms for store-wait-free multiprocessors

Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2007 Proceedings of the 34th annual international symposium on Computer architecture - ISCA '07  
Prior research suggests that the performance gap among consistency models can be closed through speculation-enforcing order only when dynamically necessary.  ...  To eliminate buffer-capacity-related stalls, we propose the scalable store buffer, which places private/speculative values directly into the L1 cache, thereby eliminating the non-scalable associative search  ...  This work was partially supported by grants and equipment from Intel, two Sloan research fellowships, an NSERC Discovery Grant, an IBM faculty partnership award, and NSF grant CCR-0509356.  ... 
doi:10.1145/1250662.1250696 dblp:conf/isca/WenischAFM07 fatcat:rail7xodnjerrpgq4o4wpyntgi

Reconciling performance and programmability in networking systems

Jayaram Mudigonda, Harrick M. Vin, Stephen W. Keckler
2007 Computer communication review  
The combination of these two allows us to simultaneously achieve the goals of ease-of-programming and high performance.  ...  To address this challenge, we first make a case for, and then develop a malleable processor architecture that facilitates the dynamic reconfiguration of cache capacity and number of threads to best-suit  ...  We design a novel register access predictor to efficiently bridge the bandwidth and latency gap between MRC and data cache.  ... 
doi:10.1145/1282427.1282390 fatcat:mz3vyt4ljrfa3pc32w5rrsltd4

Reconciling performance and programmability in networking systems

Jayaram Mudigonda, Harrick M. Vin, Stephen W. Keckler
2007 Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications - SIGCOMM '07  
The combination of these two allows us to simultaneously achieve the goals of ease-of-programming and high performance.  ...  To address this challenge, we first make a case for, and then develop a malleable processor architecture that facilitates the dynamic reconfiguration of cache capacity and number of threads to best-suit  ...  We design a novel register access predictor to efficiently bridge the bandwidth and latency gap between MRC and data cache.  ... 
doi:10.1145/1282380.1282390 dblp:conf/sigcomm/MudigondaVK07 fatcat:bfyy3kqk2jc2de57fx3o5syyzu

The design space of ultra-low energy asymmetric cryptography

Andrew D. Targhetta, Donald E. Owen, Paul V. Gratz
2014 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
Thus, we implement a parameterizable instruction cache and simulate various configurations.  ...  In literature, a plethora of hardware and software acceleration techniques exists for improving the performance of asymmetric cryptography.  ...  Without their invaluable guidance and efforts, none of this work would have been possible.  ... 
doi:10.1109/ispass.2014.6844461 dblp:conf/ispass/TarghettaOG14 fatcat:ghiu2f6srfe5temv2tft7ywkqu

Malware Guard Extension: Using SGX to Conceal Cache Attacks [article]

Michael Schwarz, Clémentine Maurice
2019 arXiv   pre-print
We perform a Prime+Probe cache side-channel attack on a co-located SGX enclave running an up-to-date RSA implementation that uses a constant-time multiplication primitive.  ...  However, the hypervisor does not protect tenants against the cloud provider and thus the supplied operating system and hardware. Intel SGX provides a mechanism that addresses this scenario.  ...  In particular, cache attacks exploit the timing difference between the CPU cache and the main memory.  ... 
arXiv:1702.08719v3 fatcat:hg3li6yqrfemphsdb4jngbg6cm

Co-processor-based Behavior Monitoring

Ronny Chevalier, Maugan Villatel, David Plaquin, Guillaume Hiet
2017 Proceedings of the 33rd Annual Computer Security Applications Conference on - ACSAC 2017  
We model the behavior of SMM using invariants of its control-flow and relevant CPU registers (CR3 and SMBASE). We instrument two open-source firmware implementations: EDK II and coreboot.  ...  This information helps to resolve the semantic gap issue. Our approach does not depend on a specific model of the behavior nor on a specific target.  ...  ACKNOWLEDGMENTS The authors would like to thank and acknowledge the contribution of the following people (in alphabetical order) for their helpful comments, technical discussions, feedback and proofing  ... 
doi:10.1145/3134600.3134622 dblp:conf/acsac/ChevalierVPH17 fatcat:jizvvr647rgmpfbvc7xp7x7rsu

Malware Guard Extension: abusing Intel SGX to conceal cache attacks

Michael Schwarz, Samuel Weiser, Daniel Gruss, Clémentine Maurice, Stefan Mangard
2020 Cybersecurity  
We perform a Prime+Probe cache side-channel attack on a co-located SGX enclave running an up-to-date RSA implementation that uses a constant-time multiplication primitive.  ...  However, the hypervisor does not protect tenants against the cloud provider and thus, the supplied operating system and hardware. Intel SGX provides a mechanism that addresses this scenario.  ...  In particular, cache attacks exploit the timing difference between the CPU cache and the main memory.  ... 
doi:10.1186/s42400-019-0042-y fatcat:jxhbzrlzlveqjm4h7iuwrniltm

SoK: Introspections on Trust and the Semantic Gap

Bhushan Jain, Mirza Basim Baig, Dongli Zhang, Donald E. Porter, Radu Sion
2014 2014 IEEE Symposium on Security and Privacy  
The paper then observes portions of the VMI design space which have been under-explored, as well as potential adaptations of existing techniques to bridge the semantic gap without trusting the guest OS  ...  This paper organizes previous work based on the essential design considerations when building a VMI system, and then explains how these design choices dictate the trust model and security properties of  ...  by gifts from Northrop Grumman Corporation, Parc/Xerox, Microsoft Research, and CA.  ... 
doi:10.1109/sp.2014.45 dblp:conf/sp/JainBZPS14 fatcat:zpldjgsmcfdvhgx252dewbh33y

Characterizing the Performance of Modern Architectures Through Opaque Benchmarks: Pitfalls Learned the Hard Way

Luka Stanisic, Lucas Mello Schnorr, Augustin Degomme, Franz C. Heinrich, Arnaud Legrand, Brice Videau
2017 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
Based on such performance models, most high-level simulation-based frameworks separately characterize a machine and an application, later convolving both signatures to predict the overall performance.  ...  Determining key characteristics of High Performance Computing machines that allow users to predict their performance is an old and recurrent dream.  ...  ACKNOWLEDGMENTS The authors would like to thank the SimGrid team members and collaborators who contributed to SMPI. This  ... 
doi:10.1109/ipdpsw.2017.125 dblp:conf/ipps/StanisicSDHLV17 fatcat:gtwqzmmreffutlmzqyhvt4xi7q

Hardware-Software Contracts for Secure Speculation [article]

Marco Guarnieri, Boris Köpf, Jan Reineke, Pepe Vila
2020 arXiv   pre-print
In this paper, we put forward a framework for specifying such contracts, and we demonstrate its expressiveness and flexibility.  ...  Intuitively, more defensive mechanisms are less efficient but can securely execute a larger class of programs, while more permissive mechanisms may offer more performance but require more defensive programming  ...  We would like to thank David Chisnall, Muntaquim Chowdhury, Matthew Fernandez, Cédric Fournet, Carlos Rozas, and Gururaj Saileshwar for feedback and discussions.  ... 
arXiv:2006.03841v3 fatcat:jkj3xiqbd5d4baalaqgprzzka4

ExaSAT: An exascale co-design tool for performance modeling

Didem Unat, Cy Chan, Weiqun Zhang, Samuel Williams, John Bachan, John Bell, John Shalf
2015 The international journal of high performance computing applications  
In order to determine the performance consequences of different hardware designs, analytic models are essential because they can provide fast feedback to the co-design centers and chip designers without  ...  The parameterized analytic model enables quantitative evaluation of a broad range of hardware design trade-offs and software optimizations on a variety of different performance metrics, with a primary  ...  Acknowledgements The authors would like to thank Weishen Mead and Matthew Cordery for their contribution to the Pin tool validation.  ... 
doi:10.1177/1094342014568690 fatcat:ddefc7elibh35iw434uzdkba7i