
A performance evaluation of cache injection in bus-based shared memory multiprocessors

Aleksandar Milenkovic, Veljko Milutinovic
2002 Microprocessors and microsystems  
Software-controlled cache prefetching and data forwarding are two widely used techniques for tolerating high memory latency in scalable cache-coherent shared memory multiprocessors.  ...  However, some previous studies have shown that these techniques are not so effective in bus-based shared memory multiprocessors.  ...  Another direction is to implement some kind of cache injection in scalable cache-coherent shared memory multiprocessors [8].  ... 
doi:10.1016/s0141-9331(01)00146-6 fatcat:zajmkk23r5amvkv434oigd2m4y

Multiprocessor on chip: beating the simulation wall through multiobjective design space exploration with direct execution

R.B. Mouhoub, O. Hammami
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
Design space exploration of multiprocessors on chip requires both automatic performance analysis techniques and efficient performance evaluation of multiprocessor configurations.  ...  This paper proposes a new performance evaluation methodology for multiprocessors on chip which conducts a multiobjective design space exploration through emulation.  ...  Paul et al. [18] propose MESH, a high-level modeling and simulation technique for single-chip programmable heterogeneous multiprocessors based on a layered approach.  ... 
doi:10.1109/ipdps.2006.1639623 dblp:conf/ipps/MouhoubH06 fatcat:oeagswclb5fgpksaucl3nvz7wa

Integrating cache coherence protocols for heterogeneous multiprocessor systems, Part 1

T. Suh, H.-H.S. Lee, D.M. Blough
2004 IEEE Micro  
The simulation results show up to 51 percent performance improvement for low miss penalties with simple techniques, compared to a pure software solution.  ...  ) on a bus as explained in a later section on region-based cache coherence.  ... 
doi:10.1109/mm.2004.33 fatcat:ltfbpcs4dzd3noxh4443pv7tn4

Design space exploration using arithmetic-level hardware--software cosimulation for configurable multiprocessor platforms

Jingzhao Ou, Viktor K. Prasanna
2006 ACM Transactions on Embedded Computing Systems  
the performance of the applications running on the multiprocessor platform.  ...  For illustrative purposes, we provide an implementation of our approach based on MATLAB/Simulink.  ...  The authors would also like to thank Phil James-Roxby for offering us the original VHDL multiprocessor design of the JPEG2000 application.  ... 
doi:10.1145/1151074.1151080 fatcat:4h5l2jh5crdwnpktv2m7vepki4

Disk caching with an optical ring

Enrique V. Carrera, Ricardo Bianchini
2000 Applied Optics  
This latter evaluation shows that our optical ring improves performance for a traditional multiprocessor by roughly the same amount as it does for an optically interconnected multiprocessor.  ...  To evaluate the extent to which these benefits affect performance, we use detailed execution-driven simulations of several out-of-core parallel applications that run on an eight-node scalable multiprocessor  ...  The paper is based on research supported by the National Science Foundation under grant CCR-9510173 and by the Brazilian Conselho Nacional de Desenvolvimento Científico e Tecnológico.  ... 
doi:10.1364/ao.39.006663 pmid:18354681 fatcat:5k2pwihkpndvdalmzzavbafl2a

Prefetching and multithreading performance in bus-based multiprocessors with Petri Nets [chapter]

Edward D. Moreno, Sergio T. Kofuji, Marcelo H. Cintra
1997 Lecture Notes in Computer Science  
We also intend to develop and analyse such techniques, using simple but useful analytical models that predict the performance benefits achievable on bus-based multiprocessors.  ...  In this paper, we shall consider two architectural techniques that address the latency problem: Prefetching and Multithreading.  ...  Figure 4. Bus Contention in a Bus-Based Multiprocessor with FSP. According to Figure 3, prefetching a large amount of data (i.e., more than 4 consecutive blocks) does not improve the system's performance  ... 
doi:10.1007/bfb0002846 fatcat:kuc4gfer55brvatbvec6kgc7cq

Achieving high performance in bus-based shared-memory multiprocessors

A. Milenkovic
2000 IEEE Concurrency  
SMP performance: Private caches are essential for reducing bus congestion and for coping with memory reference latency in bus-based SMPs.  ...  In bus-based SMPs, cache misses and bus traffic pose key obstacles to high performance. To overcome these problems, several techniques have been proposed.  ...  They found that read snarfing improves performance by 30% to 67% for 32 processors with 64-byte cache blocks for three applications. For 8-byte cache blocks, performance improves up to 10%.  ... 
doi:10.1109/4434.865891 fatcat:e42udkwrhjdrfbi2rsjfhna5tm

Reducing coherence overhead in shared-bus multiprocessors [chapter]

Sangyeun Cho, Gyungho Lee
1996 Lecture Notes in Computer Science  
To reduce the overhead of cache coherence enforcement in shared-bus multiprocessors, we propose a self-invalidation technique as an extension to write-invalidate protocols.  ...  We evaluate our self-invalidation scheme by simulating SPLASH-2 benchmark programs that exhibit various reference patterns, under a realistic shared-bus multiprocessor model.  ...  Reducing the coherence overhead, which in turn reduces the bandwidth requirement on the bus, is very important for future shared-bus multiprocessors.  ... 
doi:10.1007/bfb0024741 fatcat:4cvao3eepnaexfmhdt4ahs7agi

System-level performance analysis of multiprocessor system-on-chips by combining analytical model and execution time variation

Sungchan Kim, Soonhoi Ha
2014 Microprocessors and microsystems  
Our approach consists of two techniques: (1) analytical model of on-chip crossbar-based communication architectures and (2) enumeration of task-level execution time variations for a target application.  ...  As the impact of the communication architecture on performance grows in a Multiprocessor System-on-Chip (MPSoC) design, the need for performance analysis in the early stage in order to consider various  ...  The proposed technique consists of two key parts: first, building an analytical model of the target system's dynamic behavior, and second, systematically exploring, based on the model, the wider performance  ... 
doi:10.1016/j.micpro.2014.02.003 fatcat:wsonxvepd5gfthhxp5ehlebrma

Cooperative multithreading on embedded multiprocessor architectures enables energy-scalable design

P. Schaumont, Bo-Cheng Charles Lai, Wei Qin, I. Verbauwhede
2005 Proceedings. 42nd Design Automation Conference, 2005.  
We propose an embedded multiprocessor architecture and its associated thread-based programming model.  ...  We port a fingerprint minutiae detection application onto this architecture, and show the resulting performance on single-, dual-, and quad-processor configurations.  ...  Figure 4 clearly points out the relevance of voltage-scaled multiprocessor architectures. Two configurations in particular improve on the single-processor nominal case (1_H).  ... 
doi:10.1109/dac.2005.193767 fatcat:ysdbhce73zadnpjqv5vyrbxo34

Simulation study of memory performance of SMP multiprocessors running a TPC-W workload

P. Foglia, R. Giorgi, C.A. Prete
2004 IEE Proceedings - Computers and digital Techniques  
As in previous studies on shared-bus multiprocessors, it was found that the memory performance is highly influenced by cache parameters.  ...  The hardware configurations are: a single SMP running tiers two and three, and two SMPs each one running a single tier.  ...  In both cases the systems can be based on multiprocessor architecture [9]. We consider servers based on shared-bus shared-memory multiprocessor systems.  ... 
doi:10.1049/ip-cdt:20040349 fatcat:ancei4gzrvdgpe3lnuxnshskye

Shared Memory Multiprocessors [chapter]

2004 Parallel Computing on Heterogeneous Networks  
In bus-based and some small-scale interconnect-based shared memory systems, cache coherence is implemented using a technique known as bus snooping: each processor on the bus monitors bus transactions issued  ...  The two major types of interconnects are bus-based and network-based interconnects.  ... 
doi:10.1002/0471654167.ch3 fatcat:dvaj7kmetfgr7bkmdrmvzljwda

Implementation of data cache block (DCB) in shared processor using field-programmable gate array (FPGA)

R Karthick, P Meenalochini
2020 Journal of the National Science Foundation of Sri Lanka  
A two-tier implementation which is based on chip clock scheduling is proposed.  ...  This research deals with a novel dynamic reconfigurable multiprocessor technique combined with a System-On-Chip (SOC) and provides continuous transition activities in a digital environment.  ...  used to minimize overall system cost and improved performance.  ... 
doi:10.4038/jnsfsr.v48i4.10340 fatcat:qtgmocfgabafrne6atz7hjvrfe

Comparison of the Performance of Two Service Disciplines for a Shared Bus Multiprocessor with Private Caches [article]

Angel Vassilev Nikolov, Lerato Lerato
2010 arXiv   pre-print
In this paper, we compare two analytical models for evaluation of cache coherence overhead of a shared bus multiprocessor with private caches.  ...  The models are based on a closed queuing network with different service disciplines. We find that the priority discipline can be used as a lower-level bound.  ...  Analytical models based on queuing theory provide a simple but approximate approach for estimating the performance of multiprocessors in the early design cycles.  ... 
arXiv:1004.3560v1 fatcat:t6hyfvpf2jf5haw42jl5iv5t7m

Multiprocessor System-on-Chip (MPSoC) Technology

W. Wolf, A.A. Jerraya, G. Martin
2008 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
The multiprocessor system-on-chip (MPSoC) uses multiple CPUs along with other hardware subsystems to implement a system. A wide range of MPSoC architectures have been developed over the past decade.  ...  Dutta for the helpful discussions of their MPSoCs.  ...  However, two of the units on the bus are programmable accelerators, one for audio and another for video, which make use of the MMDSP+, which is a small DSP.  ... 
doi:10.1109/tcad.2008.923415 fatcat:p37pvh5iezfdjd4acepney4zmy