Memory hierarchy performance measurement of commercial dual-core desktop processors

Lu Peng, Jih-Kwon Peir, Tribuvan K. Prakash, Carl Staelin, Yen-Kuang Chen, David Koppelman
2008 Journal of systems architecture  
Three dual-core processors that we studied have shown benefits of some of these factors, but not all of them.  ...  In this paper, performance measurement on an Intel Core 2 Duo, an Intel Pentium D and an AMD Athlon 64 X2 processor are reported.  ...  Acknowledgements The comments from the second reviewer help a great deal to improve the content of this paper, especially leading to a bug found on the original STREAM benchmark. This work  ... 
doi:10.1016/j.sysarc.2008.02.004 fatcat:eyf3dydlvzhupfrow6j6j7nw3e

Quantifying the energy cost of data movement for emerging smart phone workloads on mobile platforms

Dhinakaran Pandiyan, Carole-Jean Wu
2014 2014 IEEE International Symposium on Workload Characterization (IISWC)  
To aid this study, we design micro-benchmarks that generate desired data movement patterns between different levels of the memory hierarchy and measure the instantaneous power consumed by the device when  ...  We perform a detailed investigation to quantify the impact of data movement on overall energy consumption of a popular, commercially-available smart phone device.  ...  The opinions, findings and conclusions or recommendations expressed in this manuscript are those of the authors and do not necessarily reflect the views of the Science Foundation of Arizona.  ... 
doi:10.1109/iiswc.2014.6983056 dblp:conf/iiswc/PandiyanW14 fatcat:iyrmxt7pxfar3l2hjgoenl5zsi

A Desktop Computer with a Reconfigurable Pentium®

Shih-Lien L. Lu, Peter Yiannacouras, Taeweon Suh, Rolf Kassa, Michael Konow
2008 ACM Transactions on Reconfigurable Technology and Systems  
Core 4 and Windows XP; however we have inserted a Xilinx Virtex-4 in place of the processor that should sit in the motherboard and have used the Virtex-4 to host a complete version of the Pentium® microprocessor  ...  Specifically, we perform preliminary experimentation/prototyping with an original Socket 7 based desktop processor system with typical hardware peripherals running modern operating systems such as Fedora  ...  We emulate a version of a commercial x86 desktop processor on an FPGA and run real operating systems on stock hardware [Lu et al. 2007].  ... 
doi:10.1145/1331897.1331901 fatcat:xzumvw3ipzezzj4agejzknzqzy

An FPGA-based Pentium® in a complete desktop system

Shih-Lien L. Lu, Peter Yiannacouras, Rolf Kassa, Michael Konow, Taeweon Suh
2007 Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays - FPGA '07  
The increasing complexity of the processor  ...  In this work we emulate a version of a commercial x86 desktop processor on an FPGA to run real operating systems on stock hardware.  ...  commercial debut, (ii) we perform preliminary architectural enhancements which demonstrate the emulator's ability to measure the effect of microarchitectural changes on the complete system using the SPEC2000  ... 
doi:10.1145/1216919.1216927 dblp:conf/fpga/LuYKKS07 fatcat:eclbiayqtjdnlfzhi6qbradkxq

Clearing the clouds

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 SIGARCH Computer Architecture News  
General Terms Design, Measurement, Performance • Instruction- and memory-level parallelism in scale-out workloads is low.  ...  We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture  ...  This work was partially supported by EuroCloud, Project No 247779 of the European Commission 7th RTD Framework Programme -Specific Cooperation Theme 3 'Information and Communication Technologies: Embedded  ... 
doi:10.1145/2189750.2150982 fatcat:26l7woyutjhodbffqiidze5i2e

Clearing the clouds

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12  
General Terms Design, Measurement, Performance • Instruction- and memory-level parallelism in scale-out workloads is low.  ...  We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture  ...  This work was partially supported by EuroCloud, Project No 247779 of the European Commission 7th RTD Framework Programme -Specific Cooperation Theme 3 'Information and Communication Technologies: Embedded  ... 
doi:10.1145/2150976.2150982 dblp:conf/asplos/FerdmanAKVAJKPAF12 fatcat:z37fymq7dzgzxhnrwjudviuzwi

A Case for Specialized Processors for Scale-Out Workloads

Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi
2014 IEEE Micro  
Acknowledgments We thank the reviewers and readers for their feedback and suggestions on all earlier versions of this work.  ...  This work was partially supported by EuroCloud, project no. 247779 of the European Commission 7th RTD Framework Programme-Information and Communication Technologies: Computing Systems.  ...  To measure the frequency of read-write sharing, we execute the workloads on cores split across two physical processors in separate sockets.  ... 
doi:10.1109/mm.2014.41 fatcat:gowz5x2fjvbobhcm2p4qsy2nlu

Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems

Daniel Hackenberg, Daniel Molka, Wolfgang E. Nagel
2009 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42  
Using our benchmarks we present fundamental memory performance data and architectural properties of both processors.  ...  The potential of our approach is demonstrated with an in-depth comparison of ccNUMA multiprocessor systems with AMD (Shanghai) and Intel (Nehalem-EP) quad-core x86-64 processors that both feature integrated  ...  RELATED WORK Performance measurements are common practice to analyze implementation details of the memory hierarchy.  ... 
doi:10.1145/1669112.1669165 dblp:conf/micro/HackenbergMN09 fatcat:n6csedwzpvdxhg5pyvxvhgvbze

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 ACM Transactions on Computer Systems  
We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core microarchitecture  ...  We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor microarchitecture is inefficient for running these workloads.  ...  APPENDIX A Comment: We multiply the number of LLC misses per cycle with the number of bytes fetched (64 bytes) and the frequency of the processor in Hertz.  ... 
doi:10.1145/2382553.2382557 fatcat:huy2nlmwibftnbrk32z77noowq

Understanding PARSEC performance on contemporary CMPs

Major Bhadauria, Vincent M. Weaver, Sally A. McKee
2009 2009 IEEE International Symposium on Workload Characterization (IISWC)  
We use hardware performance counters, taking a systems-level approach and varying common architectural parameters: number of out-of-order cores, memory hierarchy configurations, number of multiple simultaneous  ...  threads, number of memory channels, and processor frequencies.  ...  We also thank Chris Fensch from the University of Cambridge for his patches to enable execution on the SPARC platform.  ... 
doi:10.1109/iiswc.2009.5306793 dblp:conf/iiswc/BhadauriaWM09 fatcat:qx5lefg44ncnxc7ur34f3yvn4q

Computational Characteristics of Production Seismic Migration and its Performance on Novel Processor Architectures

Jairo Panetta, Paulo R. P. de Souza Filho, Carlos A. da Cunha Filho, Fernando M. Roxo da Motta, Silvio Sinedino Pinheiro, Ivan Pedr, Andre L. Romanelli Rosa, Luiz R. Monnerat, Leandro T. Carneiro, Carlos H.B. de Albrecht
2007 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)  
Production load comprises thousands of jobs per year, consuming the installed park of a few thousand x86 CPU cores, with top production runs continuously using up to 1000 dedicated processors during 20  ...  Its port to quad-core x86 and Sony PlayStation 3 achieved very high price/performance and performance/watt gains over single core x86 machines. Port to the PS3 is described in detail.  ...  The authors gratefully acknowledge the continuous support of AMD, Intel and IBM, including the early availability of prototype boards for performance measures.  ... 
doi:10.1109/sbac-pad.2007.13 fatcat:wqv5evdtwravleckt4skqcondu

A new direction for computer architecture research

C.E. Kozyrakis, D.A. Patterson
1998 Computer  
At about the same time, Intel and Hewlett-Packard presented the basic characteristics of their next-generation IA-64 architecture, which is expected to dominate the high-performance processor market within  ...  computing domains that have shaped processor architecture for the past decade: • The uniprocessor desktop running technical and scientific applications, and • the multiprocessor server used for transaction  ...  The processor consists of 128 tiles, each of which has a processing core, small first-level caches backed by a larger amount of dynamic memory (128 Kbytes) used as main memory, and a reconfigurable functional  ... 
doi:10.1109/2.730733 fatcat:ykv5f53p5rfdfo4a72a4i25g2q

Power Consumption of GPUs from a Software Perspective [chapter]

Sylvain Collange, David Defour, Arnaud Tisserand
2009 Lecture Notes in Computer Science  
In this article we investigate, using measurements, how and where modern GPUs are using energy during various computations in a CUDA environment.  ...  We measure the power used with various combinations of units (MAD  ...  We measure how the memory hierarchy impacts the power consumption of GPUs.  ... 
doi:10.1007/978-3-642-01970-8_92 fatcat:5w6xmbxxevhj5ercioqjq3fu4i

Back to Thin-Core Massively Parallel Processors

Ami Marowka
2011 Computer  
Chipmakers believe they can resolve the fundamental technical impediments to building a 1,000-core processor and are studying the efficiency of novel advanced parallel architectures for many-core processors  ...  Examination of the innovations of the past three decades that brought chips to the point at which many-core processors are possible  ...  They must be aware of details such as the number of cores, the main memory layout, and the cache memory hierarchy. An efficient match increases performance and achieves the desired scalability.  ... 
doi:10.1109/mc.2011.133 fatcat:5zuo7tuq3feq5lxwypw5kartey

Exploring locking & partitioning for predictable shared caches on multi-cores

Vivy Suhendra, Tulika Mitra
2008 Proceedings of the 45th annual conference on Design automation - DAC '08  
Our study reveals certain design principles that strongly dictate the performance of a predictable memory hierarchy.  ...  Multi-core architectures consisting of multiple processing cores on a chip have become increasingly prevalent.  ...  This feature is available in several commercial processors (PowerPC 440 core, ARM 920T, Freescale Semiconductor's e300 core, etc).  ... 
doi:10.1145/1391469.1391545 dblp:conf/dac/SuhendraM08 fatcat:kie4bdbtabhlvaoq32oidjcrx4