
Design and implementation of the Blue Gene/P snoop filter

Valentina Salapura, Matthias Blumrich, Alan Gara
2008 High-Performance Computer Architecture  
The Blue Gene/P snoop filters combine stream registers and snoop caches to capture both the locality of snoop addresses and their streaming behavior.  ...  In addition, reducing snoop lookups yields power savings. This paper describes the design of the Blue Gene/P snoop filters, and presents hardware measurements to demonstrate their effectiveness.  ...  Acknowledgements The Blue Gene/P project has been supported and partially funded by Argonne National Laboratory and Lawrence Livermore National Laboratory on behalf of the United States Department of Energy  ... 
doi:10.1109/hpca.2008.4658623 dblp:conf/hpca/SalapuraBG08 fatcat:buk2yhexjndehm3fztwgfocwum

Counting stream registers: An efficient and effective snoop filter architecture

Aanjhan Ranganathan, Ali Galip Bayrak, Theo Kluter, Philip Brisk, Edoardo Charbon, Paolo Ienne
2012 2012 International Conference on Embedded Computer Systems (SAMOS)  
Over time, this class of snoop filters loses the ability to filter memory addresses that have been loaded, and then evicted, from the caches that are filtered; they include cache wrap detection logic,  ...  We introduce a counting stream register snoop filter, which improves the performance of existing snoop filters based on stream registers.  ...  Examples of exclusive filters include the exclusive JETTY [13] and the range filter used as part of the Blue Gene/P Snoop filter [6, 16, 17].  ... 
doi:10.1109/samos.2012.6404165 dblp:conf/samos/RanganathanBKBCI12 fatcat:3qzydotj4bbzbh476rqoqvrdoa

Overview of the IBM Blue Gene/P project

2008 IBM Journal of Research and Development  
The Blue Gene/P project has been supported and partially funded by Argonne National Laboratory and the Lawrence Livermore National Laboratory on behalf of the U.S.  ...  Acknowledgments This work has benefited from the cooperation of many individuals at IBM Research (Yorktown Heights, New York), IBM Global Engineering Solutions (Rochester, Minnesota), and IBM Systems and  ...  of a 111-Tflops, eight-rack Blue Gene/P system with a storage system supporting the IBM GPFS.  ... 
doi:10.1147/rd.521.0199 fatcat:rmorpcbwrzbwbcwwifcfmmp3wm

Understanding Network Saturation Behavior on Large-Scale Blue Gene/P Systems

Pavan Balaji, Harish Naik, Narayan Desai
2009 2009 15th International Conference on Parallel and Distributed Systems  
Systems such as the IBM Blue Gene (BG) and Cray XT have started utilizing flat networks (a.k.a. scalable networks) which differ from switched fabrics in that they use a 3D torus or similar topology.  ...  Our studies scale from small systems to up to 8 racks (32768 cores) of BG/P, and show various interesting insights into the network communication characteristics of the system.  ...  Department of Energy under contract DE-AC02-06CH11357 and in part by the Department of Energy award DE-FG02-08ER25835.  ... 
doi:10.1109/icpads.2009.117 dblp:conf/icpads/BalajiND09 fatcat:fwwndoxytzha7e5xambo4dsypq

Extending and benchmarking the "Big Memory" implementation on Blue Gene/P Linux

Kazutomo Yoshii, Harish Naik, Chenjie Yu, Pete Beckman
2011 Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers - ROSS '11  
In our previous work we presented "Big Memory", an alternative, transparent memory space that successfully removes the memory performance bottleneck on Blue Gene/P Linux.  ...  The initial Big Memory worked only as a per node resource. In this work we extend it to a per core resource and describe the details of the implementation.  ...  the Blue Gene hardware.  ... 
doi:10.1145/1988796.1988806 fatcat:5xscaz3ijnfzva5deoeygbi3ti

Using Partial Tag Comparison in Low-Power Snoop-Based Chip Multiprocessors [chapter]

Ali Shafiee, Narges Shahidi, Amirali Baniasadi
2011 Lecture Notes in Computer Science  
We reduce power as S-PTC prevents sending unnecessary snoops and avoids unessential tag lookups at the endpoints. Furthermore, S-PTC improves performance as a result of early cache miss detection.  ...  Our solutions reduce snoop request bandwidth from 78.5% to 81.9% and average tag array dynamic power by about 52%.  ...  Acknowledgment This work is supported by the Natural Sciences and Engineering Research Council of Canada, Discovery Grants Program and by Iran's Institute for Research in Fundamental Sciences (IPM). 1 Assuming exploiting interconnect systems  ... 
doi:10.1007/978-3-642-24322-6_18 fatcat:hjo23ufey5fjhp33tzkqf34h2u

Efficient data streaming with on-chip accelerators: Opportunities and challenges

Rui Hou, Lixin Zhang, Michael C. Huang, Kun Wang, Hubertus Franke, Yi Ge, Xiaotao Chang
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
Thus, on-chip accelerator architectures deserve more attention from the research community. There is a wide spectrum of research opportunities for design and optimization of accelerators.  ...  these characteristics to optimize the power and performance of accelerators, and then analyzes the effectiveness of some simple optimizing extensions proposed.  ...  We would also like to thank Yu Zhang from IBM China Research Lab, and Jian Li from IBM Austin Research Lab, for the technical discussions and valuable comments on earlier stages of this work.  ... 
doi:10.1109/hpca.2011.5749739 dblp:conf/hpca/HouZHWFGC11 fatcat:xecaxvszjbbcxgvqtlfrfcb2ma

IBM Blue Gene Supercomputer [chapter]

Alan Gara, José E. Moreira, Tejas S. Karkhanis, Michael Flynn, Yoichi Muraoka, Matthias Bollhöfer, José I. Aliaga, Alberto F. Martín, Enrique S. Quintana-Ortí, Dhabaleswar K. Panda (+11 others)
2011 Encyclopedia of Parallel Computing  
For details on the Blue Gene system software, the reader is referred to [9, 10]. Finally, examples of the impact of Blue Gene on science can be found in [1, 2, 3, 4, 5, 7, 12, 13, 14].  ...  Applications of notable significance at LLNL include ddcMD, a classical molecular dynamics code that has been used to simulate systems with approximately half a billion atoms (and in the process win the  ...  Table 1: Summary of differences between Blue Gene/L and Blue Gene/P nodes.  ... 
doi:10.1007/978-0-387-09766-4_409 fatcat:j74ptxyr4ngppiatv25675z5sa

Large-Scale Numerical Simulations on High-End Computational Platforms [chapter]

Leonid Oliker, Jonathan Carter, Vincent Beckner, John Bell, Harvey Wasserman, Mark Adams, Stéphane Ethier, Erik Schnetter
2010 Chapman & Hall/CRC Computational Science  
We observe that threading alone provided median speedups of 1.9×, 2.5×, and 4.2× on Barcelona, Clovertown, and Blue Gene/P. Clearly, only Blue Gene/P showed reasonable scalability.  ...  (LBMHD) application. for Barcelona, Clovertown, and Blue Gene/P respectively.  ... 
doi:10.1201/b10509-7 fatcat:hwn33w4tlfddxc3653pmwquteu

Towards scalable, energy-efficient, bus-based on-chip networks

Aniruddha N. Udipi, Naveen Muralimanohar, Rajeev Balasubramonian
2010 HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture  
We further show that the use of OS page coloring helps maximize locality and improves the effectiveness of the Bloom filters.  ...  We show that bus-based networks with snooping protocols can significantly lower energy consumption and simplify network/protocol design and verification, with no loss in performance.  ...  IBM uses a snoop filter in its Blue Gene/P systems [41] based on the Jetty snoop filtering system [35] . A similar snoop filtering system is used by Strauss et al.  ... 
doi:10.1109/hpca.2010.5416639 dblp:conf/hpca/UdipiMB10 fatcat:a6x4ppxwbvcdthupelph4mhu64

Characterizing the Performance of "Big Memory" on Blue Gene Linux

Kazutomo Yoshii, Kamil Iskra, Harish Naik, Pete Beckman, P. Chris Broekema
2009 2009 International Conference on Parallel Processing Workshops  
We verify this result on 1024 nodes of Blue Gene/P using the NAS Parallel Benchmarks and find the performance under Linux with Big Memory to fluctuate within 0.7% of CNK.  ...  To address these difficulties, we present the design and implementation of "Big Memory", an alternative, transparent memory space for computational processes.  ...  Acknowledgments: We thank IBM's Todd Inglett, Thomas Musta, Thomas Gooding, George Almási, Sameer Kumar, Michael Blocksome, and Robert Wisniewski for their advice on programming the Blue Gene hardware.  ... 
doi:10.1109/icppw.2009.35 dblp:conf/icppw/YoshiiINBB09 fatcat:logbv2w465fwfh6niwmsdmg464

SigNet: Network-on-chip filtering for coarse vector directories

Natalie Enright Jerger
2010 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)  
• Characterizes the impact of CV directory architectures on both the latency and bandwidth of the NoC. • Proposes in-network filtering to reduce the load on the NoC to improve performance and save power  ...  Overall, we demonstrate average reductions in interconnect activity of 21% and latency improvements of 20% over a coarse vector directory while utilizing as little as 25% of the area of a full-map directory  ...  Jetty [14] uses counting Bloom filters to represent the set of all blocks cached by a processor to reduce cache snoops. Blue Gene/P employs several filters to avoid unnecessary snoops [17].  ... 
doi:10.1109/date.2010.5457028 dblp:conf/date/Jerger10 fatcat:mok2t4whurd7rlpivyxy3jazzm

Performance and Scalability Evaluation of 'Big Memory' on Blue Gene Linux

Kazutomo Yoshii, Kamil Iskra, Harish Naik, Pete Beckman, P. Chris Broekema
2010 The international journal of high performance computing applications  
We address memory performance issues observed in Blue Gene Linux, and discuss the design and implementation of "Big Memory", an alternative, transparent memory space introduced to eliminate the memory performance  ...  We evaluate the performance of Big Memory using custom memory benchmarks, NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4096 nodes.  ...  Acknowledgments We thank IBM's Todd Inglett, Thomas Musta, Thomas Gooding, George Almási, Sameer Kumar, Michael Blocksome, and Robert Wisniewski for their advice on programming the Blue Gene hardware.  ... 
doi:10.1177/1094342010369116 fatcat:4qecsjamnfedtl6gldrfpwoyym

Heterogeneous NoC Design for Efficient Broadcast-based Coherence Protocol Support

Mario Lodde, Jose Flich, Manuel E. Acacio
2012 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip  
In this paper we propose a simple control network that collects the acknowledgement messages and delivers them with a bounded and fixed latency, thus relieving the NoC from a large amount of messages.  ...  However, it generates much more traffic, thus stressing the NoC and having worse performance in terms of power consumption.  ...  ACKNOWLEDGEMENTS This work was supported by the Spanish MEC and MICINN, as well as European Commission FEDER funds, under Grant TIN2009-14475-C04.  ... 
doi:10.1109/nocs.2012.14 dblp:conf/nocs/LoddeFA12 fatcat:e4ncbjc34rfh3n6bsy73g5pnvy

Survey on System I/O Hardware Transactions and Impact on Latency, Throughput, and Other Factors [chapter]

Steen Larsen, Ben Lee
2014 Advances in Computers  
This chapter surveys the current state of high-performance I/O architecture advances and explores benefits and limitations.  ...  With the proliferation of CPU multi-cores within a system, multi-GB/s ports, and on-die integration of system functions, changes beyond the techniques surveyed may be needed for optimal I/O architecture  ...  The Blue Gene/P DMA engine services the core-to-core communications instead of serving general purpose I/O.  ... 
doi:10.1016/b978-0-12-420232-0.00002-7 fatcat:5nlke4zuuncnzjulijip5iz32i