The Vector Floating-Point Unit in a Synergistic Processor Element of a CELL Processor

S.M. Mueller, C. Jacobi, Hwa-Joon Oh, K.D. Tran, S.R. Cottier, B.W. Michael, H. Nishikawa, Y. Totsuka, T. Namatame, N. Yano, T. Machida, S.H. Dhong
17th IEEE Symposium on Computer Arithmetic (ARITH'05)  
The floating-point unit in the Synergistic Processor Element of the 1st generation multi-core CELL Processor is described.  ...  The latency includes the global forwarding of the result.  ...  Introduction The Synergistic Processor Element (SPE) of a CELL Processor [4] is the first implementation of a new processor architecture designed to accelerate media and data streaming workloads.  ... 
doi:10.1109/arith.2005.45 dblp:conf/arith/MullerJOTCMNTNYMD05 fatcat:u7vle64a5zgzrnbgsplvbbeolm

Cell Multiprocessor Communication Network: Built for Speed

M. Kistler, M. Perrone, F. Petrini
2006 IEEE Micro  
Larrabee: a many-core x86 architecture for visual computing",  ...  processor unit utilize all elements in a register -> SIMD Simplified representation of a current Cell processor Image Source: [3] Element Interconnect Bus •  ...  all cores Vector Processing Unit in Larrabee • 16-wide VPU executing integer, single-and double precision floating point operations • VPU supports gather-scatter operations -The 16 elements are  ... 
doi:10.1109/mm.2006.49 fatcat:t4fyqh4i65a7xgabx5orysepyi

Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the Synergistic Processing Element of the CELL Processor [chapter]

Wesley Alvaro, Jakub Kurzak, Jack Dongarra
2008 Lecture Notes in Computer Science  
The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single precision, floating point performance.  ...  The crucial component is the matrix multiplication kernel crafted for the short vector Single Instruction Multiple Data architecture of the Synergistic Processing Element of the CELL processor.  ...  It can be observed that the Synergistic Processing Element of the CELL processor closely matches this description.  ... 
doi:10.1007/978-3-540-69384-0_98 fatcat:3a55zr7pbrggfpimqutyeanrse

Introduction to the Cell multiprocessor

J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy
2005 IBM Journal of Research and Development  
This paper provides an introductory overview of the Cell multiprocessor. Cell represents a revolutionary extension of conventional microprocessor architecture and organization.  ...  The paper discusses the history of the project, the program objectives and challenges, the design concept, the architecture and programming models, and the implementation.  ...  Acknowledgments The Cell processor is the result of a deep collaboration by engineers from IBM, the Sony Group, and Toshiba Corporation.  ... 
doi:10.1147/rd.494.0589 fatcat:7nj6ionujfcl7oxmv5vvpfjyhm

Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine

Fabrizio Petrini, Gordon Fossum, Juan Fernandez, Ana Lucia Varbanescu, Mike Kistler, Michael Perrone
2007 IEEE International Parallel and Distributed Processing Symposium  
In our exploration to achieve the optimum level of performance for Sweep3D, we have enjoyed many pleasant surprises, such as a very high floating point performance, reaching 64% of the theoretical peak  ...  This level of performance can be reached by exploiting several dimensions of parallelism, such as thread-level parallelism using several Synergistic Processing Elements, data streaming parallelism, vector  ...  The number of elements in any of the wmV* vectors is (16/sizeof(w[m])), which is 2 in this case because w[m] is a double precision floating point value. 4. spu_madd does a double-precision 2-way SIMD  ... 
doi:10.1109/ipdps.2007.370252 dblp:conf/ipps/PetriniFFVKP07 fatcat:5bl6agy62fct7p77h5zgnqhuyq

Toward exploitation of cell multi-processor array in time-consuming applications by using CNN model

Zoltan Nagy, Laszlo Kek, Zoltan Kincses, Andras Kiss, Peter Szolgay
2008 11th International Workshop on Cellular Neural Networks and Their Applications  
IBM has recently introduced the Cell Broadband Engine (Cell BE) Architecture, which contains 8 identical vector processors in an array structure.  ...  In the paper the implementation of the 3-D Princeton Ocean Model on the Cell BE is discussed. The area/speed/power tradeoffs of our solution and different hardware implementations are also compared.  ...  ACKNOWLEDGMENT The authors would like to thank Professor Tamás Roska for many helpful discussions and suggestions.  ... 
doi:10.1109/cnna.2008.4588670 fatcat:5bmj4l3wyfa2lj2wne2olcvb4i

Extended precision with a rounding mode toward zero environment. Application to the Cell processor

Hong Diep Nguyen, Stef Graillat, Jean Luc Lamotte
2009 International Journal of Reliability and Safety  
In this paper, we develop EFT operations in truncation rounding mode optimised for the Cell processor.  ...  In the field of scientific computing, the exactness of the calculation is of prime importance. That leads to efforts made to increase the precision of the floating point algorithms.  ...  Acknowledgements The authors are very grateful to the CINES (Centre Informatique National de l'Enseignement Supérieur, Montpellier, France) for providing us access to their Cell blades.  ... 
doi:10.1504/ijrs.2009.026839 fatcat:dwlvrfwvqrclxllyzzahfmncpq

FAMOUS, faster: using parallel computing techniques to accelerate the FAMOUS/HadCM3 climate model with a focus on the radiative transfer algorithm

P. Hanappe, A. Beurivé, F. Laguzet, L. Steels, N. Bellouin, O. Boucher, Y. H. Yamazaki, T. Aina, M. Allen
2011 Geoscientific Model Development  
The modified algorithm runs more than 50 times faster on the CELL's Synergistic Processing Element than on its main PowerPC processing element.  ...  Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors.  ...  Sony CSL would like to thank the UK Met Office for providing us with a Vendor Benchmarking License, and the Sony Computer Entertainment R&D teams for their support.  ... 
doi:10.5194/gmd-4-835-2011 fatcat:kxke3exme5df7o4jroojsrtjsi

Cell/B.E. blades: Building blocks for scalable, real-time, interactive, and digital media servers

A. K. Nanda, J. R. Moulic, R. E. Hanson, G. Goldrian, M. N. Day, B. D. D'Amora, S. Kesavarapu
2007 IBM Journal of Research and Development  
The Cell Broadband Engine (Cell/B.E.) processor, developed jointly by Sony, Toshiba, and IBM primarily for next-generation gaming consoles, packs a level of floating-point, vector, and integer streaming  ...  In this paper we present the design of the Cell/B.E. blades and discuss several early application prototypes and results.  ...  Acknowledgments The authors gratefully acknowledge the contributions of the STI Design Center team in Austin, Texas, the Cell/B.E. blade design team in Germany, the team at Episode, Inc., the Alias Systems  ... 
doi:10.1147/rd.515.0573 fatcat:3wk543hwtnfyzfmdzjym3iw5ge

Optimizing matrix multiplication for a short-vector SIMD architecture – CELL processor

Jakub Kurzak, Wesley Alvaro, Jack Dongarra
2009 Parallel Computing  
The crucial component is the matrix multiplication kernel crafted for the short vector Single Instruction Multiple Data architecture of the Synergistic Processing Element of the CELL processor.  ...  The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single precision, floating point performance, aside from special purpose accelerators like Graphics  ...  The CELL processor is a multi-core architecture consisting of a standard processor, the Power Processing Element (PPE), and eight short-vector Single Instruction Multiple Data (SIMD) processors, referred  ... 
doi:10.1016/j.parco.2008.12.010 fatcat:wd6lh2pn2nbztlqxqfyrknnmvu

Synergistic Processing in Cell's Multicore Architecture

M. Gschwind, H.P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, T. Yamazaki
2006 IEEE Micro  
We also thank Valentina Salapura for her help and numerous suggestions in the preparation of this article.  ...  Acknowledgments We thank Jim Kahle, Ted Maeurer, Jaime Moreno, and Alexandre Eichenberger for their many comments and suggestions in the preparation of this work.  ...  eight synergistic processor elements (SPEs) in a unified system architecture.  ... 
doi:10.1109/mm.2006.41 fatcat:tt5nh6bppzdnxh6rhwfdcq7gle

The PlayStation 3 for High-Performance Scientific Computing

Jakub Kurzak, Alfredo Buttari, Piotr Luszczek, Jack Dongarra
2008 Computing in science & engineering (Print)  
point units, vector, and SIMD processing elements.  ...  The CELL in a Nutshell The main control unit of the CELL processor is the POWER Processing Element (PPE).  ... 
doi:10.1109/mcse.2008.85 fatcat:5mwizvz5nzbvfdvqcntxtlxkom

Toward exploitation of cell multi-processor array in time-consuming applications by using CNN model

Zoltán Nagy, László Kék, Zoltán Kincses, András Kiss, Péter Szolgay
2008 International journal of circuit theory and applications  
IBM has recently introduced the Cell Broadband Engine (Cell BE) Architecture, which contains 8 identical vector processors in an array structure.  ...  In this paper the implementation of CNN simulation kernel on the Cell BE is described.  ...  ACKNOWLEDGMENT The authors would like to thank Professor Tamás Roska for many helpful discussions and suggestions.  ... 
doi:10.1002/cta.508 fatcat:o6hbmupef5dqlgvaz6nkguexc4

The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor

Michael Gschwind
2007 International journal of parallel programming  
We describe how the heterogeneous cores allow to achieve this performance by parallelizing and offloading computation intensive application code onto the Synergistic Processor Element (SPE) cores using  ...  in many designs fails to exploit all the levels of available parallelism in many workloads for CMP systems.  ...  The Synergistic Processor Elements (SPEs) deliver the majority of a Cell BE system's compute performance.  ... 
doi:10.1007/s10766-007-0035-4 fatcat:zcndew73k5bevgjnuc3h3lbtya

Non-Preconditioned Conjugate Gradient on Cell and FPGA Based Hybrid Supercomputer Nodes

David DuBois, Andrew DuBois, Thomas Boorman, Carolyn Connor
2009 17th IEEE Symposium on Field Programmable Custom Computing Machines  
These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD.  ...  We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance.  ...  For most current generation processors, these load/store units are often the bottleneck in SMVM leaving the floating-point units underutilized.  ... 
doi:10.1109/fccm.2009.26 dblp:conf/fccm/DuBoisDBD09 fatcat:mrjgtbj76rhcnmmjjpbithvtxu