
Adaptive Prefetching on POWER7

Víctor Jiménez, Francisco J. Cazorla, Roberto Gioiosa, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell, Bruce G. Mealey
2014 ACM Transactions on Parallel Computing  
Adaptive prefetching is also able to reduce power consumption in some cases. Finally, we also evaluate our mechanism with SPECjbb2005, improving both performance and power consumption.  ...  First we characterize, in terms of performance and power consumption, the prefetcher in that processor using microbenchmarks and SPEC CPU2006.  ...  For that benchmark, adaptive prefetching is able to both improve performance by 21% and reduce memory power consumption by 22%.  ... 
doi:10.1145/2588889 fatcat:uifqhzvrhvep7fp2e4udl7mgwq

Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7 [article]

David Prat, Cristobal Ortega, Marc Casas, Miquel Moretó, Mateo Valero
2015 arXiv   pre-print
This lacking architectural feature causes systems to operate with prefetchers in a fixed configuration, which in many cases harms performance and energy consumption.  ...  The paper shows significant performance improvements over a representative set of microbenchmarks and High Performance Computing (HPC) applications.  ...  The trade-offs between performance improvement and power consumption in terms of memory bandwidth usage are explored in Section 4.4.  ... 
arXiv:1501.02282v1 fatcat:r6rscsojmjhntgpcirugalwnfq

Thread-Aware Adaptive Prefetcher on Multicore Systems

Peng Liu, Jiyang Yu, Michael C. Huang
2016 ACM Transactions on Architecture and Code Optimization (TACO)  
We compare our approach with the feedback directed prefetching technique and find that it provides 9% performance improvement on multicore systems, while saving the memory bandwidth consumption.  ...  On a set of multithreaded parallel benchmarks, our thread-aware data prefetching mechanism improves the overall performance of 64-core system by 13% over a multimode prefetch baseline system with two-level  ...  Ponomarev and Mei Yang for their comments on early versions of this article.  ... 
doi:10.1145/2890505 fatcat:tj7cjeszpfh5rkyzaq77iofoyu

IBM POWER7 multicore server processor

B. Sinharoy, R. Kalla, W. J. Starke, H. Q. Le, R. Cargnoni, J. A. Van Norstrand, B. J. Ronchetti, J. Stuecheli, J. Leenstra, G. L. Guthrie, D. Q. Nguyen, B. Blaner (+3 others)
2011 IBM Journal of Research and Development  
A new memory interface using buffered double-data-rate-three DRAM and improvements in reliability, availability, and serviceability are discussed.  ...  The IBM POWER processor is the dominant reduced instruction set computing microprocessor in the world today, with a rich history of implementation and innovation over the last 20 years.  ...  To reduce power, processor frequency is reduced in POWER7, while higher performance is achieved through much more emphasis on microarchitecture improvements, such as aggressive out-of-order execution,  ... 
doi:10.1147/jrd.2011.2127330 fatcat:kztcasllyvgs5cuvzyf54myeyy

Power7: IBM's Next-Generation Server Processor

Ron Kalla, Balaram Sinharoy, William J. Starke, Michael Floyd
2010 IEEE Micro  
Acknowledgments This material is based on work supported by DARPA under agreement no. HR0011-07-9-0002.  ...  Power management The Power7 chip supports various adaptive power management features to allow for scaling power with workload.  ...  Each Power7 core is designed to improve performance while considerably reducing core power. In addition, the processor implements robust RAS features and can detect most soft errors.  ... 
doi:10.1109/mm.2010.38 fatcat:u2m2sorvyvfyjicef7w7uezrx4

Increasing multicore system efficiency through intelligent bandwidth shifting

Victor Jimenez, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell, Francisco Cazorla, Mateo Valero
2015 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)  
This mechanism maximizes the utilization of memory bandwidth, thereby improving system performance and/or reducing memory power consumption.  ...  Data prefetching efficiency depends on the prefetching algorithm. It also depends on the characteristics of the applications running on the system.  ...  This approach will attempt to maximize the utilization of memory bandwidth, potentially improving system performance and/or reducing power consumption (e.g., by turning off the prefetcher for applications  ... 
doi:10.1109/hpca.2015.7056020 dblp:conf/hpca/JimenezBBOCV15 fatcat:5va2swz4ivg73kropvbad4fmba

Making data prefetch smarter

Victor Jiménez, Roberto Gioiosa, Francisco J. Cazorla, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell
2012 Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12  
Our adaptive prefetch mechanism improves performance with respect to the default prefetch setting up to 2.7X and 30% for single-threaded and multiprogrammed workloads, respectively.  ...  We implement and evaluate adaptive prefetching in the context of an existing, commercial processor, namely the IBM POWER7.  ...  consumption and cache pollution.  ... 
doi:10.1145/2370816.2370837 dblp:conf/IEEEpact/JimenezGCBBO12 fatcat:zca6mvge7bftbg3msqji4bxv2e

Enabling the Next Generation of Scalable Clusters

William D. Gropp
2010 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing  
Power consumption in 10's of MW • A TGV is about 8MW • Water cooling (both to remove heat and do it more efficiently than air cooling) Exascale will need 100-1000x power efficiency; 100-1000x space  ...  Just by improving automatic vectorization, loop speedups of more than 5 have been observed on the Power 7. • But this is a long-term project Blue Waters Computing System* Reference petascale computing  ...  • Beginning with the Power 3 chip, IBM provided a hardware component called a prefetch engine to monitor cache misses, guess the data pattern ("data stream") and prefetch data in anticipation of their  ... 
doi:10.1109/ccgrid.2010.135 dblp:conf/ccgrid/Gropp10 fatcat:pq3277d6sfeovkrwz7omhgumzm

Using an Adaptive HPC Runtime System to Reconfigure the Cache Hierarchy

Ehsan Totoni, Josep Torrellas, Laxmikant V. Kale
2014 SC14: International Conference for High Performance Computing, Networking, Storage and Analysis  
Moreover, we demonstrate that, for some applications, switching to a software-controlled reconfigurable streaming buffer configuration can improve performance by up to 30% and save 75% of the cache energy  ...  Our experiments using cycle-level simulations indicate that 67% of the cache energy can be saved with only a 2.4% performance penalty on average.  ...  This power can be used to turn on more compute nodes and further improve performance for over-provisioned systems.  ... 
doi:10.1109/sc.2014.90 dblp:conf/sc/TotoniTK14 fatcat:xgr4jdw3zfcjzgrlc3jkoogwri

The BLIS Framework

Field G. Van Zee, Vernon Austel, John A. Gunnels, Lee Killough, Tyler M. Smith, Bryan Marker, Tze Meng Low, Robert A. Van De Geijn, Francisco D. Igual, Mikhail Smelyanskiy, Xianyi Zhang, Michael Kistler
2016 ACM Transactions on Mathematical Software  
The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures.  ...  We show how, with very little effort, the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the  ...  We thank TACC for granting access to the Stampede cluster, AMD and Texas Instruments for the donation of equipment used in our experiments, and Ted Barragy and Tim Mattson for their encouragement.  ... 
doi:10.1145/2755561 fatcat:yrv7amzpyvexdiimqutxtij5zm

Performance Evaluation of Scientific Applications on POWER8 [chapter]

Andrew V. Adinetz, Paul F. Baumeister, Hans Böttiger, Thorsten Hater, Thilo Maurer, Dirk Pleiter, Wolfram Schenck, Sebastiano Fabio Schifano
2015 Lecture Notes in Computer Science  
With POWER8 a new generation of POWER processors became available.  ...  For a set of applications with significantly different performance signatures we explore efficient use of this processor architecture.  ...  Introduction With power consumption limiting the performance of scalar processors there is a growing trend in high-performance computing (HPC) towards low clock frequencies but extremely parallel computing  ... 
doi:10.1007/978-3-319-17248-4_2 fatcat:upzjxnqi4vaudcog7w2pxrpqtu

Thread Row Buffers: Improving Memory Performance Isolation and Throughput in Multiprogrammed Environments

Enric Herrero, Jose Gonzalez, Ramon Canal, Dean Tullsen
2013 IEEE transactions on computers  
This, in turn, increases overall performance by 17 and 7 percent, respectively.  ...  Therefore, memory access patterns have also changed and this has reduced row buffer locality significantly, degrading performance and energy efficiency.  ...  ACKNOWLEDGMENTS This work has been supported by the Generalitat de Catalunya under grant 2009SGR1250, and the Spanish Ministry of Education and Science under grant TIN2010-18368.  ... 
doi:10.1109/tc.2012.173 fatcat:chjflbqwzfa6zkmlf3xjddjowi

Characterizing thread placement and thread priorities in the IBM POWER7 processor [article]

Stylianos-Filippos A. Manousopoulos, National Technological University Of Athens, National Technological University Of Athens, Νεκτάριος Κοζύρης
In such complex multithreaded designs, resource sharing between threads has a great impact on final performance.  ...  We have analyzed thread placement and thread priorities in the IBM POWER7 processor.  ...  Finally, Jimenez et al. also used hardware-thread priorities to perform a power and thermal characterization and reduce power consumption of the POWER6 [15] at the application, operating system and hardware  ... 
doi:10.26240/heal.ntua.3453 fatcat:6uoasf4s3respg5p2boyjuc4ei

Watts-inside: A hardware-software cooperative approach for Multicore Power Debugging

Jie Chen, Fan Yao, Guru Venkataramani
2013 2013 IEEE 31st International Conference on Computer Design (ICCD)  
realized up to 5% power savings on chip power consumption.  ...  Multicore computing presents unique challenges for performance and power optimizations due to the multiplicity of cores and the complexity of interactions between the hardware resources.  ...  that improve power consumption.  ... 
doi:10.1109/iccd.2013.6657062 dblp:conf/iccd/0020YV13 fatcat:ewq7nzbul5dlrbasbaymjqlz6e

Clearing the clouds

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12  
Processor real-estate and power are misspent on large last-level caches that do not contribute to improved scale-out workload performance.  ...  Modern aggressive out-of-order cores are excessively complex, consuming power and on-chip area without providing performance benefits to scale-out workloads. • Data working sets of scale-out workloads  ...  Systems and Control'.  ... 
doi:10.1145/2150976.2150982 dblp:conf/asplos/FerdmanAKVAJKPAF12 fatcat:z37fymq7dzgzxhnrwjudviuzwi