Performance analysis and its impact on design

P. Bose, T.M. Conte
1998 Computer  
Most current processor design teams use trace sampling for large cache simulation as well as for processor simulator runs.  ...  Given the timing specifications for single instructions and pairs (with and without dependencies, and for infinite and finite caches or translation look-aside buffers), it is possible to compute the best  ... 
doi:10.1109/2.675632 fatcat:koqgxmbwwnehhpr6kdbcnf4v7u

An analysis of the effects of miss clustering on the cost of a cache miss

Thomas R. Puzak, A. Hartstein, P. G. Emma, V. Srinivasan, Jim Mitchell
2007 Proceedings of the 4th international conference on Computing frontiers - CF '07  
The cost of a miss is displayed (graphed) as a histogram, which represents a precise readout showing a detailed visualization of the cost of each cache miss throughout all levels of the memory hierarchy  ...  In this paper we describe a new technique, called pipeline spectroscopy, and use it to measure the cost of each cache miss.  ...  We use trace tapes produced for the IBM zSeries processor family.  ... 
doi:10.1145/1242531.1242536 dblp:conf/cf/PuzakHESM07 fatcat:gqu2lqovj5arjocydykiiiayve

Impact of sharing-based thread placement on multithreaded architectures

R. Thekkath, S. J. Eggers
1994 SIGARCH Computer Architecture News  
Rather than decreasing, compulsory and invalidation misses remained nearly constant across all placement algorithms, for all processor configurations, even with an infinite cache.  ...  Although this improves processor utilization, it can increase cache interference and degrade overall performance.  ...  Karlin, Hank Levy and Dan Nussbaum for useful comments and discussions.  ... 
doi:10.1145/192007.192027 fatcat:3p7oizzw2ffjtekdgisjebjyyy

Environment for PowerPC microarchitecture exploration

M. Moudgill, J.-D. Wellman, J.H. Moreno
1999 IEEE Micro  
Complex design trade-offs require accurate and timely performance modeling, which in turn requires flexible, efficient environments for exploring microarchitecture processor performance.  ...  Workload-driven simulation models are essential for microprocessor design space exploration.  ...  The RS/6000 and AS/400 development organizations provided invaluable help on microarchitecture features and implementations, and hardware-collected execution traces for the experiments.  ... 
doi:10.1109/40.768496 fatcat:ujmi2wb46vebrphxc3ul5gmoie

An evaluation of directory schemes for cache coherence

A. Agarwal, R. Simoni, J. Hennessy, M. Horowitz
1988 SIGARCH Computer Architecture News  
Slight modifications to directory schemes can make them competitive in performance with snoopy cache schemes for small multiprocessors.  ...  Directory schemes for cache coherence are potentially attractive in large multiprocessor systems that are beyond the scaling limits of the snoopy cache schemes.  ...  These snoopy cache schemes also interfere with the processor-cache connection.  ... 
doi:10.1145/633625.52432 fatcat:5grzygbxgbdbjlbnzxhrhzxyli

A case study in top-down performance estimation for a large-scale parallel application

Ilya Sharapov, Robert Kroeger, Guy Delamarter, Razvan Cheveresan, Matthew Ramsay
2006 Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06  
For GTC, we identify the important phases of the iteration and perform low-level analysis that includes instruction tracing and component simulations of processor and memory systems.  ...  Low-level analysis is complemented with scalability estimates based on modeling MPI, OpenMP and I/O activity in the code.  ...  For example, the processor microarchitecture can be represented by cycle-accurate models or software components with impact on application performance can be represented with component models.  ... 
doi:10.1145/1122971.1122985 dblp:conf/ppopp/SharapovKDCR06 fatcat:2vuaiivpendazfoo5tymauu2ay

An evaluation of directory schemes for cache coherence

Anant Agarwal, Richard Simoni, John Hennessy, Mark Horowitz
1998 25 years of the international symposia on Computer architecture (selected papers) - ISCA '98  
Slight modifications to directory schemes can make them competitive in performance with snoopy cache schemes for small multiprocessors.  ...  Directory schemes for cache coherence are potentially attractive in large multiprocessor systems that are beyond the scaling limits of the snoopy cache schemes.  ...  Acknowledgements Our thanks to Roberto Bisiani and the Speech Group at CMU for letting us use their VAX 8350 to obtain traces.  ... 
doi:10.1145/285930.285995 dblp:conf/isca/AgarwalSHH98 fatcat:ewzcctxncze55ekdolxkigkmei

A performance methodology for commercial servers

S. R. Kunkel, R. J. Eickemeyer, M. H. Lipasti, T. J. Mullins, B. O'Krafka, H. Rosenberg, S. P. VanderWiel, P. L. Vitale, L. D. Whitley
2000 IBM Journal of Research and Development  
Optimization of performance among commercial applications is not simply an exercise in using traces to maximize the processor MIPS.  ...  Creation of input data for performance models on the basis of measured workload information.  ...  traces, Men-Chow Chiang for establishing mean value analysis as our preferred technique for high-level modeling, Brad Nelson for his many hours of collecting I/O traces, and Harold Kossman for his many  ... 
doi:10.1147/rd.446.0851 fatcat:ebdeieh2ifcihmhr6jx7ytzjwq

Performance of database workloads on shared-memory systems with out-of-order processors

Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, Luiz André Barroso
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
This paper examines the behavior of database workloads on shared-memory multiprocessors with aggressive out-of-order processors, and considers simple optimizations that can provide further performance  ...  for OLTP and DSS, respectively.  ...  Acknowledgements This paper benefited from discussions with Norm Jouppi, Jack Lo, and Dan Scales, and from comments by the anonymous reviewers.  ... 
doi:10.1145/291069.291067 dblp:conf/asplos/RanganathanGAB98 fatcat:x5qbk25rdzg45gsfimyiwuxmy4

Footprints in the cache

Dominique Thiebaut, Harold S. Stone
1987 ACM Transactions on Computer Systems  
This paper develops an analytical model for cache-reload transients and compares the model to observations based on several address traces.  ...  A simulation based on program-address traces shows excellent agreement between the model and the observations.  ...  model.  ... 
doi:10.1145/29868.32979 fatcat:ss7vpdmp2vgfznjchim7ln5gau

TPTS: A Novel Framework for Very Fast Manycore Processor Architecture Simulation

Sangyeun Cho, Socrates Demetriades, Shayne Evans, Lei Jin, Hyunjin Lee, Kiyeon Lee, Michael Moeng
2008 2008 37th International Conference on Parallel Processing  
We design and implement tsim, an event-driven manycore processor simulator that models detailed memory hierarchy, interconnect, and coherence protocol models based on the proposed TPTS framework.  ...  This paper proposes and evaluates a fast manycore processor simulation framework called Two-Phase Trace-driven Simulation (TPTS), which splits detailed timing simulation into a trace generation phase and  ...  Authors thank the anonymous reviewers for their constructive comments.  ... 
doi:10.1109/icpp.2008.7 dblp:conf/icpp/ChoDEJLLM08 fatcat:nv3ezljbwfgqjcyr44yb3fsi5a

Traffic characteristics of a distributed memory system

Jonathan M Smith, David J Farber
1991 Computer networks and ISDN systems  
Second, we examine references in units of "blocks", first using a one-block cache model and then with an infinite cache.  ...  In this paper, we study memory reference strings gathered with a tracing program we devised. We study several models.  ...  ''Enough'' cache (infinite) To analyze the effects of successful caching in the face of increasing main memory sizes for workstations, we chose the extremal case of ''enough'' main memory for caching all  ... 
doi:10.1016/0169-7552(91)90006-x fatcat:j4icdjevjnfs5nivy5oki73j4m

In-N-Out: Reproducing Out-of-Order Superscalar Processor Behavior from Reduced In-Order Traces

Kiyeon Lee, Sangyeun Cho
2011 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems  
The prepared in-order trace then drives a novel simulation algorithm that models an out-of-order processor.  ...  During trace generation, we use a functional cache simulator to capture interesting processor events such as uncore accesses in the program order.  ...  Machine model We use two different machine models in experiments, "baseline" and "combined." The baseline model assumes infinite MSHRs and no data prefetching.  ... 
doi:10.1109/mascots.2011.16 dblp:conf/mascots/LeeC11 fatcat:qeycvfrkuvh77d4gy6yx4bhora

Modeling communication in parallel algorithms

Jaswinder Pal Singh, Edward Rothberg, Anoop Gupta
1994 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures - SPAA '94  
In fact, this may be one reason why more realistic models than the PRAM have not found favor with algorithm designers: Not that the models do not capture architectural communication costs adequately, but  ...  There are several algorithms and problems for which these properties can be determined analytically (e.g. dense linear algebra, regular grid problems, FFTs).  ...  , with 0 set to 1. Barnes-Hut comm. with infinite caches.  ... 
doi:10.1145/181014.181329 dblp:conf/spaa/SinghRG94 fatcat:z7qvqpxs75b5ndwcbi5n2zspnu

Architectural performance analysis of FPGA synthesized LEON processors

Corentin Damman, Gregory Edison, Fabrice Guet, Eric Noulard, Luca Santinelli, Jerome Hugues
2016 Proceedings of the 27th International Symposium on Rapid System Prototyping Shortening the Path from Specification to Prototype - RSP '16  
Benchmarking exposes key parameters to execution time variability allowing for accurate probabilistic modeling of system dynamics.  ...  Current processors have gone through multiple internal optimizations to speed up the average execution time, e.g. pipelines, branch prediction.  ...  The authors thank Thales Avionics for their support through the ARISE Chair.  ... 
doi:10.1145/2990299.2990306 dblp:conf/rsp/DammanEGNSH16 fatcat:qqhzehyzx5cwfo64c47c4vy4ne