
The Impulse memory controller

Lixin Zhang, Zhen Fang, M. Parker, B.K. Mathew, L. Schaelicke, J.B. Carter, W.C. Hsieh, S.A. McKee
2001 IEEE transactions on computers  
We describe the design of the Impulse architecture and how an Impulse memory system can be used in a variety of ways to improve the performance of memory-bound applications.  ...  Our performance results demonstrate the effectiveness of these optimizations in a variety of scenarios. Using Impulse can speed up a range of applications from 20 percent to over a factor of 5.  ...  The approach has a high intrinsic computational cost, but its simplicity and scalability make it ideal for large data sets on current high-end systems.  ... 
doi:10.1109/12.966490 fatcat:aorvqc2pm5erbotcexbgbpks4i

The potential of the cell processor for scientific computing

Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick
2006 Proceedings of the 3rd conference on Computing frontiers - CF '06  
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists.  ...  In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions.  ...  Acknowledgments This work was supported by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.  ... 
doi:10.1145/1128022.1128027 dblp:conf/cf/WilliamsSOKHY06 fatcat:vmlyxmeyazgrrn5vseohoaohdi

Dynamic IPC/clock rate optimization

David H. Albonesi
1998 SIGARCH Computer Architecture News  
Current microprocessor designs set the functionality and clock rate of the chip at design time based on the configuration that achieves the best overall performance over a range of target applications.  ...  both general-purpose and scientific applications.  ...  Acknowledgements The author wishes to thank the members of EE492 (Yehea Ismail, Xun Liu, Radu Secareanu, Patrick Furchill, and Justin Vlietstra) who helped flush out some of the initial ideas and developed  ... 
doi:10.1145/279361.279397 fatcat:qw4xpdjvcvgzriuhvdoqaaljiu

Hardware monitors for dynamic page migration

Mustafa M. Tikir, Jeffrey K. Hollingsworth
2008 Journal of Parallel and Distributed Computing  
In particular, we investigate the effectiveness of using cache miss profiles, Translation Lookaside Buffer (TLB) miss profiles and the content of the on-chip TLBs using the valid bit information.  ...  In this paper, we first introduce a profile-driven online page migration scheme and investigate its impact on the performance of multithreaded applications.  ...  to improve the performance of real scientific applications.  ... 
doi:10.1016/j.jpdc.2008.05.006 fatcat:z66tkdnxi5b6fbgkv6mbj7kz6i

Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations

Leonid Oliker, Andrew Canning, Jonathan Carter, John Shalf, David Skinner, Stephane Ethier, Rupak Biswas, Jahed Djomehri, Rob Van der Wijngaart
2003 Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03  
The growing gap between sustained and peak performance for scientific applications is a well-known problem in high end computing.  ...  Results demonstrate that the SX-6 achieves high performance on a large fraction of our applications and often significantly outperforms the cache-based architectures.  ...  Acknowledgements The authors would like to gratefully thank the Arctic Region Supercomputing Center for access to the NEC SX-6, the Center for Computational Sciences at ORNL for access to the IBM p690,  ... 
doi:10.1145/1048935.1050213 dblp:conf/sc/OlikerCCSSEBDW03 fatcat:pbiviyz2sraefohdct4e3nljxm

An Analysis of HPC Benchmarks in Virtual Machine Environments [chapter]

Anand Tikotekar, Geoffroy Vallée, Thomas Naughton, Hong Ong, Christian Engelmann, Stephen L. Scott
2009 Lecture Notes in Computer Science  
In this paper, we aim to study such potential causes by investigating the behavior and identifying patterns of various overheads for HPC benchmark applications.  ...  Based on the investigation of the overhead profiles for different benchmarks, we aim to address questions such as: Are the overhead profiles for a particular type of benchmarks (such as compute-bound)  ...  Further, we would like to work on the limitations of the performance measurement tools, such as Xenoprof, so that we can enhance application profiling.  ... 
doi:10.1007/978-3-642-00955-6_8 fatcat:qhe22qon6rar7lf7sksejh7tgy

Scientific Computing Kernels on the Cell Processor

Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick
2007 International journal of parallel programming  
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists.  ...  In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions.  ...  Acknowledgments This work was supported by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.  ... 
doi:10.1007/s10766-007-0034-5 fatcat:e26uq4azkzf4hizdwfu6mt6hqy

Techniques for Shared Resource Management in Systems with Throughput Processors [article]

Rachata Ausavarungnirun
2018 arXiv   pre-print
The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications.  ...  Graphics Processing Units (GPUs) are a prime example of throughput processors that can deliver high performance for applications ranging from typical graphics applications to general-purpose data parallel  ...  He taught me many important aspects of research and shaped me into the researcher I am today.  ... 
arXiv:1803.06958v1 fatcat:3mqbwegpkvdrpk6sqwb3ooyh7e

High Performance Computing Systems for Autonomous Spaceborne Missions

Thomas Sterling, Daniel S. Katz, Larry Bergman
2001 The international journal of high performance computing applications  
Commodity off-the-shelf (COTS) Clusters may permit the direct application of commercial computing hardware in loosely coupled ensembles to benefit from the enormous investment of industry in mass-market  ...  high performance computing on spacecraft for deep space missions.  ...  The Defense Advanced Research Projects Agency (DARPA) and NSA provided additional funding for early research on PIM.  ... 
doi:10.1177/109434200101500306 fatcat:wb5djaeepzdhxes32magvfq4ai

Active memory operations

Zhen Fang, Lixin Zhang, John B. Carter, Ali Ibrahim, Michael A. Parker
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
Based on a standard cell implementation, we predict that the circuitry required to support AMOs is less than 1% of the typical chip area of a high performance microprocessor.  ...  The performance of modern microprocessors is increasingly limited by their inability to hide main memory latency.  ...  Finally, based on a standard cell implementation, we predict that the circuitry required to support AMOs is less than 1% of the typical chip area of a high performance microprocessor.  ... 
doi:10.1145/1274971.1275004 dblp:conf/ics/FangZCIP07 fatcat:ajzlsvdgorezbk6isb6nnlno24

Optimizing main-memory join on modern hardware

S. Manegold, P. Boncz, M. Kersten
2002 IEEE Transactions on Knowledge and Data Engineering  
Finally, we investigate the effect of implementation techniques that optimize CPU resource usage.  ...  Abstract: In the past decade, the exponential growth in commodity CPU's speed has far outpaced advances in memory latency.  ...  Fig. 1 shows that the speed of commercial microprocessors has increased roughly 70 percent every year, while the speed of commodity DRAM has improved by little more than 50 percent over the past decade  ... 
doi:10.1109/tkde.2002.1019210 fatcat:atvpibjsifdbzkyjetmoxv3i3m

Evaluating the impact of simultaneous multithreading on network servers using real hardware

Yaoping Ruan, Vivek S. Pai, Erich Nahum, John M. Tracey
2005 Performance Evaluation Review  
The results of our evaluation suggest that the current SMT support in the Xeon is application and workload sensitive, and may not yield significant benefits for network servers.  ...  This paper examines the performance of simultaneous multithreading (SMT) for network servers using actual hardware, multiple network server applications, and several workloads.  ...  While much of the academic focus on SMT has been on scientific or computation-intensive workloads, suitable for the High Performance Computing (HPC) community, a few simulation studies have explicitly examined  ... 
doi:10.1145/1071690.1064254 fatcat:g64hic5lwjdvfe7fh42runcgdi

Revisiting Symptom-Based Fault Tolerant Techniques against Soft Errors

Hwisoo So, Moslem Didehban, Yohan Ko, Reiley Jeyapaul, Jongho Kim, Youngbin Kim, Kyoungwoo Lee, Aviral Shrivastava
2021 Electronics  
Aggressive technology scaling and near-threshold computing have made soft error reliability one of the leading design considerations in modern embedded microprocessors.  ...  However, our detailed analysis of the fault coverage and performance overheads of such schemes reveals that the user-visible failure coverage, particularly of ReStore, is limited (29% on average).  ...  Interestingly, approximately 69% of silent output corruptions generate at least one symptom until the end of the application, whereas 31% of silent  ... 
doi:10.3390/electronics10233028 fatcat:6xmgwau25bgxpjm2jdhjop623y

Evaluating the impact of simultaneous multithreading on network servers using real hardware

Yaoping Ruan, Vivek S. Pai, Erich Nahum, John M. Tracey
2005 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems - SIGMETRICS '05  
The results of our evaluation suggest that the current SMT support in the Xeon is application and workload sensitive, and may not yield significant benefits for network servers.  ...  This paper examines the performance of simultaneous multithreading (SMT) for network servers using actual hardware, multiple network server applications, and several workloads.  ...  While much of the academic focus on SMT has been on scientific or computation-intensive workloads, suitable for the High Performance Computing (HPC) community, a few simulation studies have explicitly examined  ... 
doi:10.1145/1064212.1064254 dblp:conf/sigmetrics/RuanPNT05 fatcat:lsclv7fzabc4nbp2wrszr5yaau

Feedback-directed page placement for ccNUMA via hardware-generated memory traces

Jaydeep Marathe, Vivek Thakkar, Frank Mueller
2010 Journal of Parallel and Distributed Computing  
This work develops a novel hardware-assisted page placement paradigm based on automated tracing of the memory references made by application threads.  ...  Non-uniform memory architectures with cache coherence (ccNUMA) are becoming increasingly common, not just for large-scale high performance platforms but also in the context of multi-cores architectures  ...  is supported by the Office of Science of the U.S.  ... 
doi:10.1016/j.jpdc.2010.08.015 fatcat:ab7yxk5xrzgclabwjupar4bgju