Maximizing CMP throughput with mediocre cores

J.D. Davis, J. Laudon, K. Olukotun
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
We use area models based on SPARC processors incorporating these architectural features.  ...  In this paper we compare the performance of area equivalent small, medium, and large-scale multithreaded chip multiprocessors (CMTs) using throughput-oriented applications.  ...  Acknowledgements We would like to thank Cong Fu, Venkatesh Iyengar, and the entire Niagara Architecture Group for their assistance with the performance modeling.  ... 
doi:10.1109/pact.2005.42 dblp:conf/IEEEpact/DavisLO05 fatcat:h5rgzutzjndzhdtyndkrewxxhm

Low Power Coarse-Grained Reconfigurable Instruction Set Processor [chapter]

Francisco Barat, Murali Jayapala, Tom Vander Aa, Rudy Lauwereins, Geert Deconinck, Henk Corporaal
2003 Lecture Notes in Computer Science  
Preliminary results show that the presented coarse-grained processor can achieve on average 2.5x the performance of a RISC processor at an 18% overhead in energy consumption.  ...  In this paper, we present a novel coarse-grained reconfigurable processor and study its power consumption.  ...  Future work will study in more detail the power consumption of the processor. Also, we will study the effects of spatial computation on the performance and power consumption of the processor.  ... 
doi:10.1007/978-3-540-45234-8_23 fatcat:4usoc63ulra2df3jx6n6yunlxy

History-aware, resource-based dynamic scheduling for heterogeneous multi-core processors

A.Z. Jooya, A. Baniasadi, M. Analoui
2011 IET Computers & Digital Techniques  
We show that HARD can be configured to achieve both performance and power improvements. We compare HARD to a complexity-based static scheduler and show that HARD outperforms this alternative.  ...  HARD relies on recording application resource utilization and throughput to adaptively change cores for applications during runtime.  ...  Such policies often focus on concurrent execution on the threads of an application on distinct processors. In this study applications are independent. We intro dynamic (or s CMPs.  ... 
doi:10.1049/iet-cdt.2009.0045 fatcat:h5yd2rnaone47gcuph6jwxhcmu

Performance/Watt

James Laudon
2005 SIGARCH Computer Architecture News  
As a case study, we compare Sun's TLP-oriented Niagara processor against the ILP-oriented dual-core Pentium Extreme Edition from Intel, showing that the Niagara processor has a significant performance/  ...  As a result, we are now at the point where the performance/Watt of subsequent generations of traditional ILP-focused processors on server workloads has been flat [4] or even decreasing.  ...  A complex processor chip is necessarily the effort of many people.  ... 
doi:10.1145/1105734.1105737 fatcat:ududf2eobbgbfamqkdxwptcrke

Exploring Multi-core Design Space: Heracles vs. Rocket Chip Generator

Eduardo André Neves
2018 Journal of Computers  
Its modularity allows quick development by varying the types of processor, memory, network interconnect and cache.  ...  The Rocket Chip Generator is one of these tools.  ...  In the study of multi-core systems, the capacity to model cache is important due to the need of cache coherence among different caches to simplify programming.  ... 
doi:10.17706/jcp.13.5.555-563 fatcat:i4km2rnotfcpbgd7eqci6o6d6a

Exploring the design space for a shared-cache multiprocessor

B. A. Nayfeh, K. Olukotun
1994 SIGARCH Computer Architecture News  
We study the performance of a cluster-based multiprocessor architecture in which processors within a cluster are tightly coupled via a shared cluster cache for various processor-cache configurations.  ...  with a smaller cache provides higher performance and better cost/ performance than a single processor with a larger cache and 2) this four cluster configuration can be scaled linearly in performance by  ...  Finally, we would like to thank Mark Horowitz, Anoop Gupta and the reviewers for their insightful comments on earlier drafts of this paper.  ... 
doi:10.1145/192007.192026 fatcat:ntywckvlzfczzmndlob5wd57si

Measuring the Efficiency of Cache Memory on Java Processors for Embedded Systems

Antonio Carlos S. Beck, Mateus B. Rutzig, Luigi Carro
2007 Journal of Integrated Circuits and Systems  
One of these advantages concerns memory utilization, impacting in less accesses and cache misses.  ...  In this work we analyze this impact in performance and energy consumption, comparing a Java processor with a RISC one based on a MIPS architecture with similar characteristics.  ...  Section 4 presents the simulation environment and the results regarding performance of the cache memory on the systems with various configurations.  ... 
doi:10.29292/jics.v2i1.230 fatcat:no2koxlbb5fs7dbaox5mgss67i

Multicore-Aware Code Co-Positioning to Reduce WCET on Dual-Core Processors with Shared Instruction Caches

Yiqiang Ding, Wei Zhang
2012 Journal of Computing Science and Engineering  
Furthermore, how to improve the WCET of applications that run on multicore processors is both significant and challenging as the WCET can be largely affected by the possible inter-core interferences in  ...  Our experiments indicate that the proposed multicore-aware code positioning approaches, not only improve the worst-case performance of the real-time threads but also make good tradeoffs between efficiency  ...  Due to the great impact of the L2 cache hit rate on the performance of multi-core processors, private L2 caches may have worse performance than a shared L2 cache with the same total size as each core with  ... 
doi:10.5626/jcse.2012.6.1.12 fatcat:cgkbu7oqovhi7itfbvzsegpql4

Modeling and Simulative Performance Analysis of SMP and Clustered Computer Architectures

Mark W. Burns, Alan D. George, Brad A. Wallace
2000 Simulation (San Diego, Calif.)  
Because the performance of a parallel algorithm on a specific architecture is dependent upon its communication-to-computation ratio, an analysis of communication latencies for bus transactions, cache coherence  ...  To demonstrate a typical use of the models, the performance of ten systems with one to eight processors and the Scalable Coherent Interface interconnection network is evaluated using a parallel matrix-multiplication  ...  , the impact of clustered processor system configurations on scalability, or the effect of local and global memories on parallel algorithm performance.  ... 
doi:10.1177/003754970007400203 fatcat:t3aac62snrchdb4yojb5fnuvja

Transparent Reconfigurable Acceleration for Heterogeneous Embedded Applications

Antonio Carlos S. Beck, Mateus B. Rutzig, Georgi Gaydadjiev, Luigi Carro
2008 2008 Design, Automation and Test in Europe  
Executing the MIBench suite, we show performance improvements of up to 2.5 times, while reducing 1.7 times the required energy, using trivial hardware resources.  ...  The proposed mechanism is responsible for transforming sequences of instructions at runtime to be executed on a dynamic coarse-grain reconfigurable array, supporting speculative execution.  ...  Results In our study we are using an improved VHDL version of the Minimips processor [26], which is based on the R3000 version.  ... 
doi:10.1109/date.2008.4484843 dblp:conf/date/BeckRGC08 fatcat:jopozrcp65agdl66afdlahu4xu

Transparent reconfigurable acceleration for heterogeneous embedded applications

Antonio Carlos S. Beck, Mateus B. Rutzig, Georgi Gaydadjiev, Luigi Carro
2008 Proceedings of the conference on Design, automation and test in Europe - DATE '08  
Executing the MIBench suite, we show performance improvements of up to 2.5 times, while reducing 1.7 times the required energy, using trivial hardware resources.  ...  The proposed mechanism is responsible for transforming sequences of instructions at runtime to be executed on a dynamic coarse-grain reconfigurable array, supporting speculative execution.  ...  Results In our study we are using an improved VHDL version of the Minimips processor [26], which is based on the R3000 version.  ... 
doi:10.1145/1403375.1403669 fatcat:2s2kyexcsvdihnmmzsjllvqxxe

Task Scheduling for Reliable Cache Architectures of Multiprocessor Systems

Makoto Sugihara, Tohru Ishihara, Kazuaki Murakami
2007 2007 Design, Automation & Test in Europe Conference & Exhibition  
This paper presents a task scheduling method for reliable cache architectures (RCAs) of multiprocessor systems.  ...  The RCAs dynamically switch their operation modes for reducing the usage of vulnerable SRAMs under real-time constraints.  ...  Figure 5 shows vulnerability and performance of a computer system on various cache configurations for executing a task.  ... 
doi:10.1109/date.2007.364511 fatcat:btlviccrlbccfmtk3xe5ojrma4

Instruction fetching

Richard Uhlig, David Nagle, Trevor Mudge, Stuart Sechrest, Joel Emer
1995 SIGARCH Computer Architecture News  
We study the impact of cache organization, transfer bandwidth, prefetching, and pipelined memory systems on machines that rely on the use of relatively small primary caches to facilitate increased clock  ...  Even so, under IBS, a stubborn lower bound on the instruction-fetch CPI remains as an obstacle to improving overall processor performance.  ...  A study of the effects of code bloat on instruction-cache performance must extend beyond SPEC to include a new set of workloads that better represents these effects.  ... 
doi:10.1145/225830.224445 fatcat:rhxzy45ambdtpf3cusayczjgmu

Emerging Non-volatile Memory Technologies Exploration Flow for Processor Architecture

Sophiane Senni, Lionel Torres, Gilles Sassatelli, Abdoulaye Gamatie, Bruno Mussard
2015 2015 IEEE Computer Society Annual Symposium on VLSI  
Most die area of today's systems-on-chips is occupied by memories. Hence, a significant proportion of total power is spent on memory systems.  ...  This paper describes an evaluation flow to explore next generation of the memory hierarchy of processor-based systems using new non-volatile memory technologies.  ...  The difference between the latency parameter in SRAM and MRAM will of course depend on the frequency used by the processor. In this study, the frequency used for the processor was 1GHz.  ... 
doi:10.1109/isvlsi.2015.126 dblp:conf/isvlsi/SenniTSGM15 fatcat:33unyux6nzdobl4hvcayebcpdm

The Coming Wave of Multithreaded Chip Multiprocessors

James Laudon, Lawrence Spracklen
2007 International journal of parallel programming  
on-chip shared secondary cache allows for more fine-grain parallelism to be effectively exploited by the CMP.  ...  To address these limits, the computer industry has embraced chip multiprocessing (CMP), predominately in the form of multiple high-performance superscalar processors on the same die.  ...  The study looked at commercial-grade configurations of the various benchmarks.  ... 
doi:10.1007/s10766-007-0033-6 fatcat:4gzhbtdumvablcjfy62osfb2g4