
Exploiting criticality to reduce bottlenecks in distributed uniprocessors

Behnam Robatmili, Sibi Govindan, Doug Burger, Stephen W. Keckler
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
To alleviate these bottlenecks, this paper proposes a fully distributed framework to exploit criticality in these architectures at different granularities.  ...  This general framework reduces competing bottlenecks in a synergic manner and achieves scalable performance/power efficiency for sequential programs when running across a large number of cores.  ...  This work was supported in part by National Science Foundation grant CCF-0916745 and DARPA contract F33615-03-C-4106.  ... 
doi:10.1109/hpca.2011.5749749 dblp:conf/hpca/RobatmiliGBK11 fatcat:mferyc5rybelfl64gftvd6oy5m

Performance of database workloads on shared-memory systems with out-of-order processors

Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, Luiz André Barroso
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
hints can be used for this data to further reduce execution time by 12%.  ...  We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within  ...  We would also like to thank Jef Kennedy from Oracle for reviewing this manuscript, Marco Annaratone from WRL for supporting this work, and Drew Kramer from WRL for technical support.  ... 
doi:10.1145/291069.291067 dblp:conf/asplos/RanganathanGAB98 fatcat:x5qbk25rdzg45gsfimyiwuxmy4

Evaluating the impact of simultaneous multithreading on network servers using real hardware

Yaoping Ruan, Vivek S. Pai, Erich Nahum, John M. Tracey
2005 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems - SIGMETRICS '05  
In the uniprocessor case, previous studies appear to have neglected the OS overhead in switching from a uniprocessor kernel to an SMT-enabled kernel.  ...  In the 2-way multiprocessor case, the higher number of memory references from SMT often causes the memory system to become the bottleneck, offsetting any processor utilization gains.  ...  We would like to thank our shepherd, Geoff Voelker, and our anonymous reviewers for their feedback and insight.  ... 
doi:10.1145/1064212.1064254 dblp:conf/sigmetrics/RuanPNT05 fatcat:lsclv7fzabc4nbp2wrszr5yaau

Evaluating the impact of simultaneous multithreading on network servers using real hardware

Yaoping Ruan, Vivek S. Pai, Erich Nahum, John M. Tracey
2005 Performance Evaluation Review  
In the uniprocessor case, previous studies appear to have neglected the OS overhead in switching from a uniprocessor kernel to an SMT-enabled kernel.  ...  In the 2-way multiprocessor case, the higher number of memory references from SMT often causes the memory system to become the bottleneck, offsetting any processor utilization gains.  ...  We would like to thank our shepherd, Geoff Voelker, and our anonymous reviewers for their feedback and insight.  ... 
doi:10.1145/1071690.1064254 fatcat:g64hic5lwjdvfe7fh42runcgdi

An analysis of database workload performance on simultaneous multithreaded processors

Jack L. Lo, Luiz André Barroso, Susan J. Eggers, Kourosh Gharachorloo, Henry M. Levy, Sujay S. Parekh
1998 SIGARCH Computer Architecture News  
Multithreading also leads to better interthread instruction cache sharing, reducing I-cache miss rates by up to 35%.  ...  Our data show that while DBMS workloads have large memory footprints, there is substantial data reuse in a small, cacheable "critical" working set.  ...  technical assistance in the research.  ... 
doi:10.1145/279361.279367 fatcat:dailyd4c65eebirkgciljqoua4

Supporting a Coherent Shared Address Space Across SMP Nodes: An Application-Driven Investigation [chapter]

Angelos Bilas, Liviu Iftode, Rudrajit Samanta, Jaswinder Pal Singh
1999 IMA Volumes in Mathematics and its Applications  
is reduced.  ...  This means that depending on the constants, the node-to-network bandwidth may become a bottleneck if it is not increased considerably when going from a uniprocessor to an SMP node.  ...  Since shared virtual memory systems have been developed to provide this abstraction across uniprocessors in software, it is attractive, though non-trivial, to extend them to exploit SMP nodes effectively  ... 
doi:10.1007/978-1-4612-1516-5_2 fatcat:3xbdrciarjhpxplvx434jascwi

OpenMP extension to SMP clusters

Yang-Suk Kee
2006 IEEE potentials  
to reduce the shared address space and selective data touch and migratory home to exploit data locality avoid unnecessary page migrations significantly.  ...  Easy-to-use programming paradigm for high performance computing has been a challenging issue in the parallel computing community.  ...  ParADE exploits most aforementioned solution techniques to overcome the performance bottlenecks.  ... 
doi:10.1109/mp.2006.1657761 fatcat:5th5gj37wzbqbd5febzq7hoavq

Trace processors: moving to fourth-generation microarchitectures

J.E. Smith, S. Vajapeyam
1997 Computer  
Trace processors rely on hierarchy, replication, and prediction to dramatically increase the execution speed of ordinary sequential programs.  ...  Vijaykumar, Scott Breach, and Andreas Moshovos-who have done initial research and given considerable thought to the direction of future-generation processors.  ...  This work was supported in part by NSF grant MIP-9505853 and by the US Army Intelligence Center and Fort Huachuca under contract DABT63-95-C-0127 and ARPA order D346.  ... 
doi:10.1109/2.612251 fatcat:u6ntlnrvdrei5pmxcbnv7nmghe

MemSpy: analyzing memory system bottlenecks in programs

Margaret Martonosi, Anoop Gupta, Thomas Anderson
1992 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems - SIGMETRICS '92/PERFORMANCE '92  
This paper describes MemSpy, a prototype tool that helps programmers identify and fix memory bottlenecks in both sequential and parallel programs.  ...  A key aspect of MemSpy is that it introduces the notion of data oriented, in addition to code oriented, performance tuning.  ...  This work was supported in part by the Digital Equipment Corporation Systems Research Center and DARPA contract N00039-91-C-0138.  ... 
doi:10.1145/133057.133079 dblp:conf/sigmetrics/MartonosiGA92 fatcat:h467kwcwa5brldwswzaajg6lfm

MemSpy: analyzing memory system bottlenecks in programs

Margaret Martonosi, Anoop Gupta, Thomas Anderson
1992 Performance Evaluation Review  
This paper describes MemSpy, a prototype tool that helps programmers identify and fix memory bottlenecks in both sequential and parallel programs.  ...  A key aspect of MemSpy is that it introduces the notion of data oriented, in addition to code oriented, performance tuning.  ...  This work was supported in part by the Digital Equipment Corporation Systems Research Center and DARPA contract N00039-91-C-0138.  ... 
doi:10.1145/149439.133079 fatcat:c2fb2or3pbcbhkgwl77m5jqjbi

Quantifying instruction criticality for shared memory multiprocessors

Tong Li, Alvin R. Lebeck, Daniel J. Sorin
2003 Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '03  
In this paper, we extend the uniprocessor DAG model to characterize parallel program executions on shared memory multiprocessor systems.  ...  To enable efficient offline critical path analysis, we propose a novel graph reduction technique that reduces a DAG to an equivalent but significantly smaller DAG.  ...  This work is supported in part by the US National Science Foundation (EIA-99772879, ITR-0082914, CCR-0204367, and CCR-0208920), Intel, IBM, Microsoft and the Duke University Graduate School.  ... 
doi:10.1145/777412.777434 dblp:conf/spaa/LiLS03 fatcat:gwn52oesvndkvbzu7t7wc7dbx4

Quantifying instruction criticality for shared memory multiprocessors

Tong Li, Alvin R. Lebeck, Daniel J. Sorin
2003 Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '03  
In this paper, we extend the uniprocessor DAG model to characterize parallel program executions on shared memory multiprocessor systems.  ...  To enable efficient offline critical path analysis, we propose a novel graph reduction technique that reduces a DAG to an equivalent but significantly smaller DAG.  ...  This work is supported in part by the US National Science Foundation (EIA-99772879, ITR-0082914, CCR-0204367, and CCR-0208920), Intel, IBM, Microsoft and the Duke University Graduate School.  ... 
doi:10.1145/777432.777434 fatcat:okvgvaqla5a3ll2itdte3zpkzq

Detailed cache coherence characterization for OpenMP benchmarks

Jaydeep Marathe, Anita Nagarajan, Frank Mueller
2004 Proceedings of the 18th annual international conference on Supercomputing - ICS '04  
By exploiting these unique features of ccSIM, we were able to identify and locate opportunities for program transformations, including interactions with OpenMP constructs, resulting in both significantly  ...  The novelty of ccSIM lies in its ability to relate coherence traffic - specifically coherence misses as well as their progenitor invalidations - to data structures and to their reference locations in the  ...  Since the main motivation in reducing the invalidate traffic is to decrease the number of coherence misses, it is imperative to distinguish between coherence misses and uniprocessor misses in a processor  ... 
doi:10.1145/1006209.1006250 dblp:conf/ics/MaratheNM04 fatcat:eqnhtfczzvhyfoynom4rhodcw4

VLSI architecture: past, present, and future

W.J. Dally, S. Lacy
1999 Proceedings 20th Anniversary Conference on Advanced Research in VLSI  
Today this gap is fully closed and adding devices to uniprocessors is well beyond the point of diminishing returns.  ...  Data similar to those in 4. Many important contributions were made beyond the few that I cite here. 5. They are an example of processors that are not yet to the point of diminishing returns.  ...  Acknowledgments The authors are indebted to Mark Horowitz and Kunle Olukotun for many helpful discussions.  ... 
doi:10.1109/arvlsi.1999.756051 dblp:conf/arvlsi/DallyL99 fatcat:wcpnutla2ravljkyjz64uang5m

MGS

Donald Yeung, John Kubiatowicz, Anant Agarwal
1996 Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96  
We call these systems Distributed Scalable Shared-memory Multiprocessors (DSSMPs).  ...  This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems.  ...  Acknowledgments This research is funded in part by ARPA contract #N00014-94-1-0985, in part by NSF Experimental Systems grant #MIP-9504399, and in part by a NSF Presidential Young Investigator Award.  ... 
doi:10.1145/232973.232980 dblp:conf/isca/YeungKA96 fatcat:nprbszfczrhnzosqc2fb3cktom