
What Multilevel Parallel Programs Do When You Are Not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP [chapter]

Gabriele Jost, Jesús Labarta, Judit Gimenez
2005 Lecture Notes in Computer Science  
A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP.  ...  This model was originally designed for distributed memory architectures but is also suitable for shared memory systems.  ...  program under contract TIC2001-0995-C02-01, and by the European Center for Parallelism of Barcelona (CEPBA).  ... 
doi:10.1007/978-3-540-31832-3_4 fatcat:ilwchraw6nb67hnyrbrdjitojm

Comparative Performance Analysis of Intel (R) Xeon Phi (TM), GPU, and CPU: A Case Study from Microscopy Image Analysis

George Teodoro, Tahsin Kurc, Jun Kong, Lee Cooper, Joel Saltz
2014 2014 IEEE 28th International Parallel and Distributed Processing Symposium  
We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application.  ...  Our results show that the performance on a MIC of operations that perform regular data access is comparable or sometimes better than that on a GPU.  ...  NHLBI, by R01LM011119-01 and R01LM009239 from the NLM, and RC4MD005964 from the NIH, and CNPq (including projects 151346/2013-5 and 313931/2013-5).  ... 
doi:10.1109/ipdps.2014.111 pmid:25419088 pmcid:PMC4240026 fatcat:nyqwqlx5mjdjdhq365w6uig3ti

Evaluating Novel Memory System Alternatives for Speculative Multithreaded Computer Systems [chapter]

A. J. KleinOsowski, David J. Lilja
2004 High Performance Memory Systems  
Our results and cost-for-performance analysis show that, on average, the novel hybrid level-1 data cache (which merges a distributed level-1 data cache with the speculative memory buffer) has a 13 percent  ...  This work models and evaluates a new cache structure for scalable multithreaded computer systems.  ...  Thanks, as well, to Chris Johnson [10] and Chuck Li [13] for their work on the VHDL and Verilog implementations of the memory structures examined in this work.  ... 
doi:10.1007/978-1-4419-8987-1_16 fatcat:z4uk3labgbdbrhoxk3t3h75ngq

Modeling cache performance for embedded systems

Ogechukwu Kingsley Ugwueze, Chijindu C. V., Udeze C. C., Ahaneku A. M., Eneh N. J., Obinna M. Ezeja, Edward C. Anoliefo
2021 Bulletin of Electrical Engineering and Informatics  
This paper presents a cache performance model for embedded systems.  ...  The mean errors for bitcount, basicmath, and FFT benchmarks are 0.0263%, 2.4476%, and 1.9000% respectively. Therefore, the mean error for the three benchmarks is equal to 1.4579%.  ...  The aim of the present study is to present a model of cache performance for embedded systems, and the specific objectives include: to develop a mathematical cache hit rate estimation model; to characterize  ... 
doi:10.11591/eei.v10i5.2459 fatcat:ppleei7ezrhqzbbxft34ppo2va

Accelerating Full-System Simulation through Characterizing and Predicting Operating System Performance

Seongbeom Kim, Fang Liu, Yan Solihin, Ravi Iyer, Li Zhao, William Cohen
2007 2007 IEEE International Symposium on Performance Analysis of Systems & Software  
This leads to an estimated simulation speedup of 4.9×, with an average performance prediction error of only 3.2%, and a worst case error of 4.2%.  ...  This paper seeks to address how to accelerate full-system simulation through studying, characterizing, and predicting the performance behavior of OS services.  ... 
doi:10.1109/ispass.2007.363731 dblp:conf/ispass/KimLSIZC07 fatcat:nfa6q73rhrdmngqy5gl7xxrev4

STT-MRAM for real-time embedded systems

Kazi Asifuzzaman, Mikel Fernandez, Petar Radojković, Jaume Abella, Francisco J. Cazorla
2019 Proceedings of the International Symposium on Memory Systems - MEMSYS '19  
In this study, we investigate the feasibility of using STT-MRAM in real-time embedded systems by analyzing average system performance impact and WCET implications.  ...  Analyzing whether this deadline is met requires Worst Case Execution Time (WCET) Analysis, which is a fundamental part of evaluating any real-time system.  ...  No study, to our knowledge, has yet evaluated STT-MRAM particularly for real-time embedded systems with performance and WCET analysis. Meza et al.  ... 
doi:10.1145/3357526.3357531 dblp:conf/memsys/AsifuzzamanFRAC19 fatcat:xvbys24pyzbopj47ibqgjh2jnu

Memory Hierarchy Characterization of NoSQL Applications through Full-System Simulation

Adrian Colaso, Pablo Prieto, Jose Angel Herrero, Pablo Abad, Lucia G. Menezo, Valentin Puente, Jose Angel Gregorio
2018 IEEE Transactions on Parallel and Distributed Systems  
We compare how these data-serving applications behave with respect to other well-known benchmarks, such as SPEC CPU2006, PARSEC and NAS Parallel Benchmark.  ...  The methodology employed for evaluation relies on state-of-the-art full-system simulation tools, such as gem5.  ...  In both cases a per-level MPKI analysis is performed, comparing results from both processor models.  ... 
doi:10.1109/tpds.2017.2787150 fatcat:gkmpr4vfvfc6rh6qqzjs6quhje

System-level modeling and reliability analysis of microprocessor systems

Chang-Chih Chen, Linda Milor
2013 5th IEEE International Workshop on Advances in Sensors and Interfaces IWASI  
for the data cache while running a set of standard benchmarks.  ...  The distribution of the transition rate for the data cache while running a set of standard benchmarks.  ... 
doi:10.1109/iwasi.2013.6576097 dblp:conf/iwasi/ChenM13 fatcat:dafiu7whyfclvakbdnd3xy37gq

An optimal memory allocation scheme for scratch-pad-based embedded systems

Oren Avissar, Rajeev Barua, Dave Stewart
2002 ACM Transactions on Embedded Computing Systems  
Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization  ...  Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance.  ...  By comparing these bars we see that for every benchmark there is a distinct performance improvement when distributing the stack.  ... 
doi:10.1145/581888.581891 fatcat:e5jjmx4qqfepjnfoz533tcg2ha

Operating system benchmarking in the wake of lmbench

Aaron B. Brown, Margo I. Seltzer
1997 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems - SIGMETRICS '97  
Our analysis shows that off-chip memory system design continues to influence operating system performance in a significant way and that key design decisions (such as suboptimal choices of DRAM and cache  ...  technology, and memory-bus and cache coherency protocols) can essentially nullify the performance benefits of the aggressive execution core and sophisticated on-chip memory system of a modern processor  ...  An hbench:OS Case Study: The Performance of NetBSD on the Intel x86 Platform With both our new benchmark suite and a methodology for using it in hand, we returned to our original task of studying the architectural  ... 
doi:10.1145/258612.258690 dblp:conf/sigmetrics/BrownS97 fatcat:yt5uohe44zhftgwk6a6iib5pxi

Operating system benchmarking in the wake of lmbench

Aaron B. Brown, Margo I. Seltzer
1997 Performance Evaluation Review  
Our analysis shows that off-chip memory system design continues to influence operating system performance in a significant way and that key design decisions (such as suboptimal choices of DRAM and cache  ...  technology, and memory-bus and cache coherency protocols) can essentially nullify the performance benefits of the aggressive execution core and sophisticated on-chip memory system of a modern processor  ...  An hbench:OS Case Study: The Performance of NetBSD on the Intel x86 Platform With both our new benchmark suite and a methodology for using it in hand, we returned to our original task of studying the architectural  ... 
doi:10.1145/258623.258690 fatcat:vgsbgckfarcs7gsqjqtalp7awy

Contention-Aware Scheduling on Multicore Systems

Sergey Blagodurov, Sergey Zhuravlev, Alexandra Fedorova
2010 ACM Transactions on Computer Systems  
As a result of this analysis we discovered a classification scheme that addresses not only contention for cache space, but contention for other shared resources, such as the memory controller, memory bus  ...  for individual applications and in optimizing system energy consumption.  ...  Worst-case performance improvement is obtained by comparing the worst-case performance (across all the runs) under DI and DIO with the worst-case performance under DEFAULT.  ... 
doi:10.1145/1880018.1880019 fatcat:eo3ush725jh2tme2zr7i7jpixq

On the effectiveness of cache partitioning in hard real-time systems

Sebastian Altmeyer, Roeland Douma, Will Lunniss, Robert I. Davis
2016 Real-Time Systems  
We evaluate the performance of cache partitioning compared to state-of-the-art pre-emption cost analysis based on benchmark code and on a large number of synthetic tasksets with both fixed priority and  ...  partition, inter-task cache eviction is avoided, and timing verification is reduced to the standard worst-case execution time analysis used in non-pre-emptive systems.  ... 
doi:10.1007/s11241-015-9246-8 fatcat:j2byhbxhp5bwro2okxtcu7phnm

OUTSTANDING PAPER: Evaluation of Cache Partitioning for Hard Real-Time Systems

Sebastian Altmeyer, Roeland Douma, Will Lunniss, Robert I. Davis
2014 2014 26th Euromicro Conference on Real-Time Systems  
We then evaluate the performance of cache partitioning compared to state-of-the-art pre-emption cost analysis based on benchmark code and on a large number of synthetic tasksets.  ...  , inter-task cache eviction is avoided, and timing verification is reduced to the standard worst-case execution time (WCET) analysis used in non-pre-emptive systems.  ...  Acknowledgements This work was partially funded by the UK EPSRC through the MCC project (EP/K011626/1), the Engineering Doctorate Centre in Large-Scale Complex IT Systems (EP/F501374/1) and the COST Action  ... 
doi:10.1109/ecrts.2014.11 dblp:conf/ecrts/AltmeyerDLD14 fatcat:5brq6hmtuffqvo4ykvlgrlndke

System implications of LLC MSHRs in scalable memory systems

Mario Donato Marino, Kuan-Ching Li
2017 Microprocessors and Microsystems  
cache line, it is fundamental to evaluate the impact of these elements in scalable memory systems.  ...  for random ones.  ...  Sensitivity Analysis We perform a sensitivity analysis to assess the impact of the key aspects: number of memory MSHRs and MCs/ranks; number of cores; and high-speed transmission delays.  ... 
doi:10.1016/j.micpro.2016.12.007 fatcat:jen7qxqfx5fctgqlbhjaysd62a