Adaptive Prefetching on POWER7
2014
ACM Transactions on Parallel Computing
Adaptive prefetching is also able to reduce power consumption in some cases. Finally, we also evaluate our mechanism with SPECjbb2005, improving both performance and power consumption. ...
First we characterize, in terms of performance and power consumption, the prefetcher in that processor using microbenchmarks and SPEC CPU2006. ...
For that benchmark, adaptive prefetching is able to both improve performance by 21% and reduce memory power consumption by 22%. ...
doi:10.1145/2588889
fatcat:uifqhzvrhvep7fp2e4udl7mgwq
Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7
[article]
2015
arXiv
pre-print
This lacking architectural feature causes systems to operate with prefetchers in a fixed configuration, which in many cases harms performance and energy consumption. ...
The paper shows significant performance improvements over a representative set of microbenchmarks and High Performance Computing (HPC) applications. ...
The trade-offs between performance improvement and power consumption in terms of memory bandwidth usage are explored in Section 4.4. ...
arXiv:1501.02282v1
fatcat:r6rscsojmjhntgpcirugalwnfq
Thread-Aware Adaptive Prefetcher on Multicore Systems
2016
ACM Transactions on Architecture and Code Optimization (TACO)
We compare our approach with the feedback directed prefetching technique and find that it provides 9% performance improvement on multicore systems, while saving the memory bandwidth consumption. ...
On a set of multithreaded parallel benchmarks, our thread-aware data prefetching mechanism improves the overall performance of 64-core system by 13% over a multimode prefetch baseline system with two-level ...
Ponomarev and Mei Yang for their comments on early versions of this article. ...
doi:10.1145/2890505
fatcat:tj7cjeszpfh5rkyzaq77iofoyu
IBM POWER7 multicore server processor
2011
IBM Journal of Research and Development
A new memory interface using buffered double-data-rate-three DRAM and improvements in reliability, availability, and serviceability are discussed. ...
The IBM POWER processor is the dominant reduced instruction set computing microprocessor in the world today, with a rich history of implementation and innovation over the last 20 years. ...
To reduce power, processor frequency is reduced in POWER7, while higher performance is achieved through much more emphasis on microarchitecture improvements, such as aggressive out-of-order execution, ...
doi:10.1147/jrd.2011.2127330
fatcat:kztcasllyvgs5cuvzyf54myeyy
Power7: IBM's Next-Generation Server Processor
2010
IEEE Micro
Acknowledgments: This material is based on work supported by DARPA under agreement no. HR0011-07-9-0002. ...
Power management: The Power7 chip supports various adaptive power management features to allow for scaling power with workload. ...
Each Power7 core is designed to improve performance while considerably reducing core power. In addition, the processor implements robust RAS features and can detect most soft errors. ...
doi:10.1109/mm.2010.38
fatcat:u2m2sorvyvfyjicef7w7uezrx4
Increasing multicore system efficiency through intelligent bandwidth shifting
2015
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)
This mechanism maximizes the utilization of memory bandwidth, thereby improving system performance and/or reducing memory power consumption. ...
Data prefetching efficiency depends on the prefetching algorithm. It also depends on the characteristics of the applications running on the system. ...
This approach will attempt to maximize the utilization of memory bandwidth, potentially improving system performance and/or reducing power consumption (e.g., by turning off the prefetcher for applications ...
doi:10.1109/hpca.2015.7056020
dblp:conf/hpca/JimenezBBOCV15
fatcat:5va2swz4ivg73kropvbad4fmba
Making data prefetch smarter
2012
Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12
Our adaptive prefetch mechanism improves performance with respect to the default prefetch setting up to 2.7X and 30% for single-threaded and multiprogrammed workloads, respectively. ...
We implement and evaluate adaptive prefetching in the context of an existing, commercial processor, namely the IBM POWER7. ...
consumption and cache pollution. ...
doi:10.1145/2370816.2370837
dblp:conf/IEEEpact/JimenezGCBBO12
fatcat:zca6mvge7bftbg3msqji4bxv2e
Enabling the Next Generation of Scalable Clusters
2010
2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Power consumption in 10s of MW
• A TGV is about 8 MW
• Water cooling (both to remove heat and do it more efficiently than air cooling)
Exascale will need 100-1000x power efficiency; 100-1000x space ...
Just by improving automatic vectorization, loop speedups of more than 5 have been observed on the Power 7.
• But this is a long-term project
Blue Waters Computing System* Reference petascale computing ...
• Beginning with the Power 3 chip, IBM provided a hardware component called a prefetch engine to monitor cache misses, guess the data pattern ("data stream") and prefetch data in anticipation of their ...
doi:10.1109/ccgrid.2010.135
dblp:conf/ccgrid/Gropp10
fatcat:pq3277d6sfeovkrwz7omhgumzm
Using an Adaptive HPC Runtime System to Reconfigure the Cache Hierarchy
2014
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
Moreover, we demonstrate that, for some applications, switching to a software-controlled reconfigurable streaming buffer configuration can improve performance by up to 30% and save 75% of the cache energy ...
Our experiments using cycle-level simulations indicate that 67% of the cache energy can be saved with only a 2.4% performance penalty on average. ...
This power can be used to turn on more compute nodes and further improve performance for over-provisioned systems. ...
doi:10.1109/sc.2014.90
dblp:conf/sc/TotoniTK14
fatcat:xgr4jdw3zfcjzgrlc3jkoogwri
The BLIS Framework
2016
ACM Transactions on Mathematical Software
The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures. ...
We show how, with very little effort, the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the ...
We thank TACC for granting access to the Stampede cluster, AMD and Texas Instruments for the donation of equipment used in our experiments, and Ted Barragy and Tim Mattson for their encouragement. ...
doi:10.1145/2755561
fatcat:yrv7amzpyvexdiimqutxtij5zm
Performance Evaluation of Scientific Applications on POWER8
[chapter]
2015
Lecture Notes in Computer Science
With POWER8 a new generation of POWER processors became available. ...
For a set of applications with significantly different performance signatures we explore efficient use of this processor architecture. ...
Introduction: With power consumption limiting the performance of scalar processors there is a growing trend in high-performance computing (HPC) towards low clock frequencies but extremely parallel computing ...
doi:10.1007/978-3-319-17248-4_2
fatcat:upzjxnqi4vaudcog7w2pxrpqtu
Thread Row Buffers: Improving Memory Performance Isolation and Throughput in Multiprogrammed Environments
2013
IEEE Transactions on Computers
This, in turn, increases overall performance by 17 and 7 percent, respectively. ...
Therefore, memory access patterns have also changed and this has reduced row buffer locality significantly, degrading performance and energy efficiency. ...
Acknowledgments: This work has been supported by the Generalitat de Catalunya under grant 2009SGR1250, and the Spanish Ministry of Education and Science under grant TIN2010-18368. ...
doi:10.1109/tc.2012.173
fatcat:chjflbqwzfa6zkmlf3xjddjowi
Characterizing thread placement and thread priorities in the IBM POWER7 processor
[article]
2013
In such complex multithreaded designs, resource sharing between threads has a great impact on final performance. ...
We have analyzed thread placement and thread priorities in the IBM POWER7 processor. ...
Finally, Jimenez et al. also used hardware-thread priorities to perform a power and thermal characterization and reduce power consumption of the POWER6 [15] at the application, operating system and hardware ...
doi:10.26240/heal.ntua.3453
fatcat:6uoasf4s3respg5p2boyjuc4ei
Watts-inside: A hardware-software cooperative approach for Multicore Power Debugging
2013
2013 IEEE 31st International Conference on Computer Design (ICCD)
realized up to 5% power savings on chip power consumption. ...
Multicore computing presents unique challenges for performance and power optimizations due to the multiplicity of cores and the complexity of interactions between the hardware resources. ...
that improve power consumption. ...
doi:10.1109/iccd.2013.6657062
dblp:conf/iccd/0020YV13
fatcat:ewq7nzbul5dlrbasbaymjqlz6e
Clearing the clouds
2012
Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12
Processor real-estate and power are misspent on large last-level caches that do not contribute to improved scale-out workload performance. ...
Modern aggressive out-of-order cores are excessively complex, consuming power and on-chip area without providing performance benefits to scale-out workloads.
• Data working sets of scale-out workloads ...
Systems and Control'. ...
doi:10.1145/2150976.2150982
dblp:conf/asplos/FerdmanAKVAJKPAF12
fatcat:z37fymq7dzgzxhnrwjudviuzwi