Second Generation Quad-Core Intel Xeon Processors Bring 45 nm Technology and a New Level of Performance to HPC Applications [chapter]

Paweł Gepner, David L. Fraser, Michał F. Kowalik
2008 Lecture Notes in Computer Science  
for many of HPC installations.  ...  The results presented clearly show that the new Intel Xeon processor 5400 family provides significant performance advantage on typical HPC workloads and would therefore be seen to be an appropriate choice  ...  point operations, that allow for performance optimized code generation.  ... 
doi:10.1007/978-3-540-69384-0_47 fatcat:sdflnvoggraoxdshx6pkaoiz44

Microarchitectural Characterization on a Mobile Workload

Woohyong Lee, Jiyoung Lee, Bo Kyung Park, R. Young Chul Kim
2021 Applied Sciences  
After the study, we could understand the bottleneck of workloads, especially in the cache sub-system.  ...  This study also identifies mobile system on chip (SoC) microarchitecture impacts, such as the cache subsystem, instruction-level parallelism, and branch performance.  ...  Cache Performance Data in traditional "caching" data systems reside in secondary storage and are read into the main memory only when operated on. This limits system performance.  ... 
doi:10.3390/app11031225 fatcat:dyenhuyk4rhmzkv7ap7lm6m73u

Improving execution unit occupancy on SMT-based processors through hardware-aware thread scheduling

Achille Peternier, Danilo Ansaloni, Daniele Bonetta, Cesare Pautasso, Walter Binder
2014 Future generations computer systems  
WorkOver, presented in this article, improves thread scheduling by increasing the performance of floating point-intensive workloads on Linux-based operating systems.  ...  We target the AMD Bulldozer and IBM POWER7 processors as case studies for specific hardware-oriented performance optimizations that increase the variety of instructions sent to each core to maximize the  ...  The Operating System (OS) kernel and scheduler try to optimize the performance of applications depending on the available hardware resources.  ... 
doi:10.1016/j.future.2013.06.015 fatcat:3pv2cxqtcfdrho3sbvhkuxgnzm

An analysis of operating system behavior on a simultaneous multithreaded architecture

Joshua A. Redstone, Susan J. Eggers, Henry M. Levy
2000 Proceedings of the ninth international conference on Architectural support for programming languages and operating systems - ASPLOS-IX  
For an OS-intensive workload, we ran the multithreaded Apache Web server on an 8-context SMT.  ...  To carry out this study, we (1) modified the Digital Unix 4.0d operating system to run on an SMT CPU, and (2) integrated our SMT Alpha instruction set simulator into the SimOS simulator to provide an execution  ...  We also show how an operating-system intensive Web server workload benefits from simultaneous multithreading.  ... 
doi:10.1145/378993.379245 fatcat:lxxt65x3szh3thzef5r6umjvs4

Clearing the clouds

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12  
General Terms: Design, Measurement, Performance  ...  Instruction- and memory-level parallelism in scale-out workloads is low.  ...  Processor real-estate and power are misspent on large last-level caches that do not contribute to improved scale-out workload performance.  ...  work was partially supported by EuroCloud, Project No 247779 of the European Commission 7th RTD Framework Programme - Specific Cooperation Theme 3 'Information and Communication Technologies: Embedded Systems  ... 
doi:10.1145/2150976.2150982 dblp:conf/asplos/FerdmanAKVAJKPAF12 fatcat:z37fymq7dzgzxhnrwjudviuzwi

An analysis of operating system behavior on a simultaneous multithreaded architecture

Joshua A. Redstone, Susan J. Eggers, Henry M. Levy
2000 SIGPLAN notices  
For an OS-intensive workload, we ran the multithreaded Apache Web server on an 8-context SMT.  ...  To carry out this study, we (1) modified the Digital Unix 4.0d operating system to run on an SMT CPU, and (2) integrated our SMT Alpha instruction set simulator into the SimOS simulator to provide an execution  ...  We also show how an operating-system intensive Web server workload benefits from simultaneous multithreading.  ... 
doi:10.1145/356989.357012 fatcat:rlpqirj5ujcexegllofqledpge

An analysis of operating system behavior on a simultaneous multithreaded architecture

Joshua A. Redstone, Susan J. Eggers, Henry M. Levy
2000 SIGARCH Computer Architecture News  
For an OS-intensive workload, we ran the multithreaded Apache Web server on an 8-context SMT.  ...  To carry out this study, we (1) modified the Digital Unix 4.0d operating system to run on an SMT CPU, and (2) integrated our SMT Alpha instruction set simulator into the SimOS simulator to provide an execution  ...  We also show how an operating-system intensive Web server workload benefits from simultaneous multithreading.  ... 
doi:10.1145/378995.379245 fatcat:7m4rqlvgyrbvjee2b5kewta434

An analysis of operating system behavior on a simultaneous multithreaded architecture

Joshua A. Redstone, Susan J. Eggers, Henry M. Levy
2000 ACM SIGOPS Operating Systems Review  
For an OS-intensive workload, we ran the multithreaded Apache Web server on an 8-context SMT.  ...  To carry out this study, we (1) modified the Digital Unix 4.0d operating system to run on an SMT CPU, and (2) integrated our SMT Alpha instruction set simulator into the SimOS simulator to provide an execution  ...  We also show how an operating-system intensive Web server workload benefits from simultaneous multithreading.  ... 
doi:10.1145/384264.379245 fatcat:faz5urwyovgxpgul2eokacnhxa

Clearing the clouds

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 SIGARCH Computer Architecture News  
General Terms: Design, Measurement, Performance  ...  Instruction- and memory-level parallelism in scale-out workloads is low.  ...  Processor real-estate and power are misspent on large last-level caches that do not contribute to improved scale-out workload performance.  ...  work was partially supported by EuroCloud, Project No 247779 of the European Commission 7th RTD Framework Programme - Specific Cooperation Theme 3 'Information and Communication Technologies: Embedded Systems  ... 
doi:10.1145/2189750.2150982 fatcat:26l7woyutjhodbffqiidze5i2e

Performance of database workloads on shared-memory systems with out-of-order processors

Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, Luiz André Barroso
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
However, most current system designs have been optimized to perform well on scientific and engineering workloads.  ...  We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within  ...  We would also like to thank Jef Kennedy from Oracle for reviewing this manuscript, Marco Annaratone from WRL for supporting this work, and Drew Kramer from WRL for technical support.  ... 
doi:10.1145/291069.291067 dblp:conf/asplos/RanganathanGAB98 fatcat:x5qbk25rdzg45gsfimyiwuxmy4

Memory Centric Characterization and Analysis of SPEC CPU2017 Suite

Sarabjeet Singh, Manu Awasthi
2019 Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering - ICPE '19  
We also perform instruction execution and distribution analysis of the suite and find that the average instruction count for SPEC CPU2017 workloads is an order of magnitude higher than SPEC CPU2006 ones  ...  Our experiments reveal that the SPEC CPU2017 workloads are surprisingly memory intensive, with approximately 50% of all dynamic instructions being memory intensive ones.  ...  ACKNOWLEDGMENTS The authors thank the anonymous reviewers and shepherd for their useful comments and feedback.  ... 
doi:10.1145/3297663.3310311 dblp:conf/wosp/SinghA19 fatcat:25pfc6svsfewbasvtdebizkslm

A Case for Specialized Processors for Scale-Out Workloads

Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi
2014 IEEE Micro  
This work was partially supported by EuroCloud, project no. 247779 of the European Commission 7th RTD Framework Programme-Information and Communication Technologies: Computing Systems.  ...  We thank the PARSA lab for continual support and feedback, in particular Pejman Lotfi-Kamran and Javier Picorel for their assistance with the SPEC-web09 and SAT Solver benchmarks.  ...  Processor architectures optimized for desktop and parallel applications are not optimized for scale-out workloads that spend most of their time waiting for cache misses, resulting in a clear microarchitectural  ... 
doi:10.1109/mm.2014.41 fatcat:gowz5x2fjvbobhcm2p4qsy2nlu

iBench: Quantifying interference for datacenter applications

Christina Delimitrou, Christos Kozyrakis
2013 2013 IEEE International Symposium on Workload Characterization (IISWC)  
We first validate the effect that iBench workloads have on performance against a wide spectrum of DC applications.  ...  Understanding, reducing and managing interference can significantly impact the manner in which these large-scale systems operate.  ...  ACKNOWLEDGEMENTS We sincerely thank Daniel Sanchez and the anonymous reviewers for their useful feedback on earlier versions of this manuscript.  ... 
doi:10.1109/iiswc.2013.6704667 dblp:conf/iiswc/DelimitrouK13 fatcat:yfxysir4vjd4vg3qhx6w2dz56m

Characteristics of workloads used in high performance and technical computing

Razvan Cheveresan, Matt Ramsay, Chris Feucht, Ilya Sharapov
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
Since prefetching plays an important role in the performance of computational workloads, we explore the prefetching potential and for parallel workloads we study the sharing properties of memory accesses  ...  For the selected workloads we provide a wide range of characterizations based on instruction tracing and hardware counter measurements.  ...  A good understanding of workload properties sheds light on resource utilizations in the system and can guide performance optimization both at the software and system configuration level.  ... 
doi:10.1145/1274971.1274984 dblp:conf/ics/CheveresanRFS07 fatcat:ptpam3kzxzcebp6jm3m3cahlaa

Hardware-aware Thread Scheduling: The Case of Asymmetric Multicore Processors

Achille Peternier, Danilo Ansaloni, Daniele Bonetta, Cesare Pautasso, Walter Binder
2012 2012 IEEE 18th International Conference on Parallel and Distributed Systems  
BulldOver, presented in this paper, improves thread scheduling by exploiting this hardware characteristic to increase performance of floating point-intensive workloads on Linux-based operating systems.  ...  In this paper we address this problem by targeting the AMD Bulldozer processor as case study for specific hardware-oriented performance optimizations.  ...  The Operating System (OS) kernel and scheduler try to optimize the performance of applications depending on the available hardware resources.  ... 
doi:10.1109/icpads.2012.62 dblp:conf/icpads/PeternierABPB12 fatcat:7pfi4b4kovf3nowis5sh4zja7a