DOSA: Design Optimizer for Scientific Applications

David A. Bader, Viktor K. Prasanna
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
for speed (or power) at design-time and use a run-time optimizer.  ...  In this paper we briefly introduce our new framework, called "Design Optimizer for Scientific Applications" (DOSA), which allows the programmer or compiler writer to explore alternative designs and optimize  ...  These new architectural features include hardware accelerators (e.g., reconfigurable logic such as FPGAs, SIMD/vector processing units such as in the IBM Cell Broadband Engine processor, and graphics  ... 
doi:10.1109/ipdps.2007.370494 dblp:conf/ipps/BaderP07 fatcat:vuso7sdnfrhytfnxc4njwzqm6m

DOSA: design optimizer for scientific applications

David A. Bader, Viktor K. Prasanna
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
for speed (or power) at design-time and use a run-time optimizer.  ...  In this paper we briefly introduce our new framework, called "Design Optimizer for Scientific Applications" (DOSA), which allows the programmer or compiler writer to explore alternative designs and optimize  ...  These new architectural features include hardware accelerators (e.g., reconfigurable logic such as FPGAs, SIMD/vector processing units such as in the IBM Cell Broadband Engine processor, and graphics  ... 
doi:10.1109/ipdps.2008.4536426 dblp:conf/ipps/BaderP08 fatcat:sayugcp2gncefhenbt5k5f73mm

Optimizing matrix multiplication for a short-vector SIMD architecture – CELL processor

Jakub Kurzak, Wesley Alvaro, Jack Dongarra
2009 Parallel Computing  
The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single precision, floating point performance, aside from special purpose accelerators like Graphics  ...  Any improvement to the _GEMM routine immediately benefits the entire algorithm, which makes the optimization of the _GEMM routine yet more important for the CELL processor.  ...  Introduction The CELL Broadband Engine Architecture (CBEA) has been developed jointly by the alliance of Sony, Toshiba and IBM (STI).  ... 
doi:10.1016/j.parco.2008.12.010 fatcat:wd6lh2pn2nbztlqxqfyrknnmvu

Signal processing on platforms with multiple cores: Part 1 - Overview and methodologies [From the Guest Editors]

Yen-Kuang Chen, Chaitali Chakrabarti, Shuvra Bhattacharyya, Bruno Bougard
2009 IEEE Signal Processing Magazine  
The number of cores is even higher for the Sony PlayStation 3, which is equipped with an eight-core IBM CELL Broadband Engine processor, Nvidia GeForce 9800 GX2, which has 256 stream processors, and SUN  ...  that demonstrate useful techniques for developing efficient implementations.  ... 
doi:10.1109/msp.2009.934556 fatcat:ept7w36whzetvmi7mpp7iaue7q

Multicore Architectures With Dynamically Reconfigurable Array Processors for Wireless Broadband Technologies

Wei Han, Ying Yi, Mark Muir, Ioannis Nousias, Tughrul Arslan, Ahmet T. Erdogan
2009 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
A simulation platform is proposed in order to explore and implement various multicore solutions combining different memory architectures and task-partitioning schemes.  ...  and ten DR processor cores for both the WiMAX transmitter and receiver sections, respectively.  ...  Many multicore-based processors including DSPs and microcontrollers have emerged, such as Cell Broadband Engine Architecture [9] and Ambric [10] .  ... 
doi:10.1109/tcad.2009.2032361 fatcat:mc6eflr6onef5oufokurvqknya

On the random access performance of Cell Broadband Engine with graph analysis application [article]

Mingyu Chen, David A. Bader
2011 arXiv   pre-print
The Cell Broadband Engine (BE) Processor has a unique memory access architecture besides its powerful computing engines. Many computing-intensive applications have been ported to Cell/BE successfully.  ...  The dynamic load balancing and software pipeline for optimizing SSCA#2 are introduced.  ...  The Cell Broadband Engine (Cell/BE) [13] is a unique architectural multi-core design by Sony, Toshiba, and IBM (STI).  ... 
arXiv:1105.5881v2 fatcat:33th7b6ijjha7evnzmcyy2asyy

An MPI Performance Monitoring Interface for Cell Based Compute Nodes

Hikmet Dursun, Kevin J. Barker, Darren J. Kerbyson, Scott Pakin, Richard Seymour, Rajiv K. Kalia, Aiichiro Nakano, Priya Vashishta
2009 Parallel Processing Letters  
We analyze the performance of our approach on a PlayStation3 console based on Cell Broadband Engine (the CBE) as well as an IBM BladeCenter QS22 based on PowerXCell 8i.  ...  Our analyses of inter-SPE communication (across the entire cluster) and function call patterns provide valuable information that can be used to optimize application performance.  ...  Numerical tests were implemented using the AMD Opteron/IBM PowerXCell 8i cluster at the Los Alamos National Laboratory and PlayStation3 cluster at the Collaboratory for Advanced Computing and Simulations  ... 
doi:10.1142/s0129626409000407 fatcat:6jltougjtjaqnpnhfen54fev3m

Accelerating the Execution of Matrix Languages on the Cell Broadband Engine Architecture [article]

Raymes Khoury, Bernd Burgstaller, Bernhard Scholz
2009 arXiv   pre-print
Current implementations of matrix languages do not fully utilise high-performance, special-purpose chip architectures such as the IBM PowerXCell processor (Cell), which is currently used in the fastest  ...  Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors.  ...  In this work, we implement a computation engine for the Cell Broadband Engine architecture (see section 7).  ... 
arXiv:0910.2324v2 fatcat:yx54eawsizcthdf5l73k7b7rzi

Application profiling on Cell-based clusters

Hikmet Dursun, Kevin J. Barker, Darren J. Kerbyson, Scott Pakin
2009 2009 IEEE International Symposium on Parallel & Distributed Processing  
Our analyses of inter-SPE communication (across the entire cluster) and function call patterns provide valuable information that can be used to optimize application performance.  ...  Specifically, we examine Cell-centric MPI programs on hybrid clusters containing multiple Opteron and Cell processors per node such as those used in the petascale Roadrunner system.  ...  Seymour, and P. Vashishta for providing us with their parallel MD and LB simulation codes.  ... 
doi:10.1109/ipdps.2009.5161092 dblp:conf/ipps/DursunBKP09 fatcat:onrqrevlandmbgogizzl3anl3i

Cell Broadband Engine processor: Design and implementation

M. W. Riley, J. D. Warnock, D. F. Wendel
2007 IBM Journal of Research and Development  
The Cell Broadband Engine (Cell/B.E.) processor was developed by Sony, Toshiba, and IBM engineers to deliver a high-speed, high-performance, multicore processor that brings supercomputer performance via  ...  This application also required very high speed compute and real-time response processes. Introduction: Overview of the Cell Broadband Engine processor  ...  Acknowledgments The design and implementation of the Cell/B.E. processor was a monumental effort.  ... 
doi:10.1147/rd.515.0545 fatcat:4nn2zxfsmjdcziyhturtazw5vq

Integrating profile-driven parallelism detection and machine-learning-based mapping

Zheng Wang, Georgios Tournavitis, Björn Franke, Michael F. P. O'Boyle
2014 ACM Transactions on Architecture and Code Optimization (TACO)  
On average, our methodology achieves 96% of the performance of the hand-tuned OpenMP NAS and SPEC parallel benchmarks on the Intel Xeon platform and gains a significant speedup for the IBM Cell platform  ...  We have evaluated our parallelization strategy on the NAS and SPEC CPU2000 benchmarks and two different multi-core platforms (dual quad-core Intel Xeon SMP and dual-socket QS20 Cell blade).  ...  For example, parallel execution of this loop is not profitable for the Cell Broadband Engine (BE) platform due to high communication costs between processing elements.  ... 
doi:10.1145/2579561 fatcat:x5b7hvxjgrgjnmyk3pozdrtzye

Multi-core acceleration of chemical kinetics for simulation and prediction

John C. Linford, John Michalakes, Manish Vachharajani, Adrian Sandu
2009 Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09  
This work implements a computationally expensive chemical kinetics kernel from a large-scale community atmospheric model on three multi-core platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine  ...  When used as a template mechanism in the Kinetic PreProcessor, the multi-core implementation enables the automatic optimization and porting of many chemical mechanisms on a variety of multi-core platforms  ...  The authors acknowledge Georgia Institute of Technology, its Sony-Toshiba-IBM Center of Competence, and the National Science Foundation, for the use of Cell Broadband Engine resources that have contributed  ... 
doi:10.1145/1654059.1654067 dblp:conf/sc/LinfordMVS09 fatcat:ym4dm6gxljdkdoxkixsl33a43e

An SPU reference model for simulation, random test generation and verification

Yukio Watanabe, Balazs Sallay, Brad Michael, Daniel Brokenshire, Gavin Meil, Hazim Shafi, Daisuke Hiraoka
2006 Proceedings of the 2006 conference on Asia South Pacific design automation - ASP-DAC '06  
Outline: Overview of the Cell Broadband Engine and the SPU; SPU reference model; Applications using the SPU reference model. If there is a DMA  ...  Broadband Engine and the SPU; SPU reference model; Applications using the SPU reference model: Simulator, Random test case generator, Verification environment. Three kinds of simulators were implemented  ... 
doi:10.1145/1118299.1118495 fatcat:aiouxer5yzecrgai5oirecll5a

Evaluating application mapping scenarios on the Cell/B.E.

Ana Lucia Varbanescu, Henk Sips, Kenneth A. Ross, Qiang Liu, Apostol (Paul) Natsev, John R. Smith, Lurng-Kuo Liu
2009 Concurrency and Computation  
Specifically, we focus on analyzing the impact of combining data- and task-parallelism for a multimedia analysis application running on the Cell Broadband Engine (Cell/B.E.).  ...  Although low-level optimizations only target code running on individual cores, high-level optimizations (e.g. data- and task-parallelism) target the overall application performance.  ...  ACKNOWLEDGEMENTS We would like to thank Michael Perrone, Gordon Braudaway, Karen Magerlein, and Bruce D'Amora for their valuable support and ideas during the development and experiments of this application  ... 
doi:10.1002/cpe.1335 fatcat:m5ostrbvrva2fjbjj2asqrhviq

Achieving high memory performance from heterogeneous architectures with the SARC programming model

Roger Ferrer, Vicenç Beltran, Marc González, Xavier Martorell, Eduard Ayguadé
2009 Proceedings of the 10th MEDEA workshop on MEmory performance DEaling with Applications, systems and architecture - MEDEA '09  
Results indicate that the programming model is able to achieve up to 85% of the peak memory bandwidth on the Cell/B.E. processor.  ...  , achieving high performance.  ...  This work has been supported by the Ministry of Education of Spain under contract TIN2007-60625, and the European Commission in the context of the SARC integrated project #27648 (FP6).  ... 
doi:10.1145/1621960.1621963 fatcat:z2v6z6kgj5gmzckc72fci6uwge