
Embedded computer architectures in the MPSoC age

Wayne Wolf
2005 Proceedings of the 2005 workshop on Computer architecture education held in conjunction with the 32nd International Symposium on Computer Architecture - WCAE '05  
Although several open-source multiprocessor simulators are available, most of them are designed for symmetric multiprocessors and cannot be easily modified to handle heterogeneous multiprocessors.  ...  For example, many reference video encoders come with full-search motion estimation, even though that algorithm is not used in practice.  ... 
doi:10.1145/1275604.1275607 dblp:conf/wcae/Wolf05 fatcat:ihdtjoimcbgbhpm2p4cek4tgsa

Parallel Exact and Approximate Arrow-Type Inverses on Symmetric Multiprocessor Systems [chapter]

George A. Gravvanis, Konstantinos M. Giannoutakis
2006 Lecture Notes in Computer Science  
In this paper we present new parallel inverse arrow-type matrix algorithms based on the concept of sparse factorization procedures, for computing explicitly exact and approximate inverses, on symmetric  ...  multiprocessor systems.  ...  For the parallelization of the EIATM and the BAIATM algorithms on symmetric multiprocessor systems, the ATALUFA algorithm was used as a "frontend" computational procedure.  ... 
doi:10.1007/11758501_69 fatcat:uujujh7winftjnz2hbwcdqjssa

An Efficient Algorithm for Load Balancing in Multiprocessor Systems

Saleh A.
2018 International Journal of Advanced Computer Science and Applications  
To use multiprocessor systems efficiently, several load balancing algorithms have been adopted widely.  ...  This paper proposes an efficient load balance algorithm which addresses common overheads that may decrease the efficiency of a multiprocessor system.  ...  Crossbar switches have a good potential for high bandwidth and system efficiency. 2) Operating System Characteristics During the course of design of the scheduling algorithm, it became apparent that  ... 
doi:10.14569/ijacsa.2018.090324 fatcat:s5cksetah5hufeeiwmdnfrlw6q

Keynote Speakers

2008 2008 14th IEEE International Conference on Parallel and Distributed Systems  
Parallel multiprocessor systems should be designed so as to facilitate the design and implementation of efficient parallel algorithms that optimally exploit the capabilities of the system.  ...  The two major issues in the formulation and design of parallel multiprocessor systems are algorithm design and architecture design.  ...  In 1987, he worked as a Consultant for Caplin Cybernetics Corporation (London, England), where he helped in the design of a number of image processing algorithms that were targeted at a particular parallel  ... 
doi:10.1109/icpads.2008.5 fatcat:lzjaadyhljb2pkhaorehoy7epu

An FPT algorithm with a modularized structure for computing two-dimensional discrete Fourier transforms

Ja-Ling Wu, Yuh-Ming Huang
1991 IEEE Transactions on Signal Processing  
The regularity of the new algorithm makes it of great practical value. Based on the ideas in [4], in this correspondence, we modularized the FPT algorithm for computing 2-D DFT's.  ...  Koltracht, "Efficient algorithm for Toeplitz-plus-Hankel matrices," Integral Equations Oper. Theory, vol. 12, no. 1, Y. H. Hu and S.-Y. Kung, "Toeplitz eigensystem solver," IEEE Trans.  ... 
doi:10.1109/78.134460 fatcat:tamcrlvbenhg3kqozb7bxv47vm

Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors [chapter]

Guojing Cong, David A. Bader
2007 Lecture Notes in Computer Science  
We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors with a case study of parallel tree and connectivity  ...  Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge.  ...  Modern symmetric multiprocessors (SMPs) and chip multiprocessors (CMPs) are becoming very powerful and commonplace. Most of the high performance computers are clusters of SMPs and/or CMPs.  ... 
doi:10.1007/978-3-540-74742-0_15 fatcat:qmtryl6qmnfavk45kbxls5bzca

Experimental Evaluation of the Performance of Processing Stealing Technique: A Scalable Load Balancing Technique for a Dynamic Multiprocessor System

O. O. Olakanmi, O. A. Fakolujo
2013 International Journal of Computer Applications  
The experimental results showed that the load balancing algorithm is efficient and scalable for balancing at least 100,000 instructions tasks and PE-S generated ratios are averagely better than any other  ...  Each cluster of the multiprocessor system is a node in symmetric multiprocessor architecture and the number of Processing Element (PE) in each cluster is dynamically determined at runtime.  ...  This was further strengthened in [5] where design and preliminary evaluation of an integrated load distribution-load balancing algorithm which was targeted to be both efficient and scalable for dynamically  ... 
doi:10.5120/14426-2568 fatcat:njatusmmcndatfuvjkgvj4nhqy

Mapping DSP applications onto self-timed multiprocessors

S.S. Bhattacharyya, N. Bambha, M. Khandelia, V. Kianzad
2001 Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256)  
Self-timed scheduling is an attractive implementation style for multiprocessor DSP systems due to its ability to exploit predictability in application behavior, its avoidance of over-constrained synchronization  ...  This paper examines a number of intermediate representations for compiling dataflow programs onto self-timed DSP platforms, and discusses efficient techniques that operate on these representations to streamline  ...  Nevertheless, it has proven to be a useful estimate of performance during design space exploration for multiprocessor DSP.  ... 
doi:10.1109/acssc.2001.986965 fatcat:vlwdmst5ebbqrhc2vzkyscspou

Cooperative multithreading on embedded multiprocessor architectures enables energy-scalable design

P. Schaumont, Bo-Cheng Charles Lai, Wei Qin, I. Verbauwhede
2005 Proceedings. 42nd Design Automation Conference, 2005.  
We propose an embedded multiprocessor architecture and its associated thread-based programming model.  ...  Using a cycle-true simulation model of this architecture, we are able to estimate energy savings for a threaded C program.  ...  This paper shows how a multiprocessor can be used as an energy-efficient replacement for a single processor.  ... 
doi:10.1109/dac.2005.193767 fatcat:ysdbhce73zadnpjqv5vyrbxo34

Prefix Computations on Symmetric Multiprocessors

David R. Helman, Joseph JáJá
2001 Journal of Parallel and Distributed Computing  
Moreover, whereas Reid-Miller and Blelloch targeted their algorithm for implementation on a vector multiprocessor architecture, we develop our algorithm for implementation on the symmetric multiprocessor  ...  These symmetric multiprocessors dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems.  ...  Moreover, whereas Reid-Miller and Blelloch targeted their algorithm for implementation on a vector multiprocessor architecture, we develop our algorithm for implementation on the symmetric multiprocessor  ... 
doi:10.1006/jpdc.2000.1678 fatcat:zguouofg4fctdkt37posnx3lka

Linear Crossed Cube (LCQ): A New Interconnection Network Topology for Massively Parallel System

Zaki A. Khan, Jamshed Siddiqui, Abdus Samad
2015 International Journal of Computer Network and Information Security  
Scalability and complexity are crucial performance parameters in the design of interconnection networks for multiprocessor systems.  ...  The comparative simulation study shows that the proposed network can be considered as low-cost multiprocessor architecture for parallel system when appropriate scheduling algorithm is implemented onto  ...  For future works, we intend to design more efficient scheduling scheme suitable for the proposed LCQ network. Fig. 1. For different depths, networks having 1, 3, 6, 10, 15, 21, ... processors are  ... 
doi:10.5815/ijcnis.2015.03.03 fatcat:pdanyyal45gpvmvclmzwpe3iwm

CPR: Composable performance regression for scalable multiprocessor models

Benjamin C. Lee, Jamison Collins, Hong Wang, David Brooks
2008 2008 41st IEEE/ACM International Symposium on Microarchitecture  
Trained with a production quality simulator, CPR is accurate with median errors of 6.63, 4.83 percent for dual-, quad-core multiprocessors.  ...  Multiprocessor simulators, however, must account for synchronization events that increase the cost of every cycle simulated and shared resource contention that increases the total number of cycles simulated  ...  Uniprocessor inferential models leverage best known practices in statistical inference for highly efficient simulation and analysis.  ... 
doi:10.1109/micro.2008.4771797 dblp:conf/micro/LeeCWB08 fatcat:di4kfo7yb5emhnzasj73f4bsoi

Hierarchical Scheduling for Symmetric Multiprocessors

A. Chandra, P. Shenoy
2008 IEEE Transactions on Parallel and Distributed Systems  
We then present hierarchical multiprocessor scheduling (H-SMP): a hierarchical CPU scheduling algorithm designed for a symmetric multiprocessor (SMP) platform.  ...  In this paper, we present a novel hierarchical scheduling algorithm designed specifically for multiprocessor environments that overcomes the limitations of existing algorithms in several ways.  ...  We then presented H-SMP: a hierarchical CPU scheduling algorithm designed for a symmetric multiprocessor (SMP) platform.  ... 
doi:10.1109/tpds.2007.70755 fatcat:n22ck5tm3zfexoy5k44urttujy

Multiprocessing: An Annotated Bibliography

1980 Computer  
Describes the efficient implementation of certain digital signal processing algorithms on a multiprocessor architecture specifically designed for such applications.  ...  dedicated to signal processing tasks and discusses techniques for generating efficient code for certain algorithms used in digital signal processing. 5.  ... 
doi:10.1109/mc.1980.1653627 fatcat:fw6llehhwnepvp3ve6shtw4zgu

A Customized Lattice Reduction Multiprocessor for MIMO Detection [article]

Shahriar Shahabuddin, Janne Janhunen, Amanullah Ghazi, Zaheer Khan and Markku Juntti
2015 arXiv pre-print
We propose some modification of the popular LR algorithm, Lenstra-Lenstra-Lovasz (LLL) for high throughput. The TTA cores are programmed with high level language.  ...  In this paper, we propose a customized homogeneous multiprocessor for LR. The processor cores are based on transport triggered architecture (TTA).  ...  However, the ML algorithm is too complex for practical realtime implementations. Linear detection is popular for practical implementations.  ... 
arXiv:1501.04860v1 fatcat:6ohgnxzszze6rkcrsu4ttmzrnm