Vector Parallelism in JavaScript: Language and Compiler Support for SIMD

Ivan Jibaja, Peter Jensen, Ningxin Hu, Mohammad R. Haghighat, John McCutchan, Dan Gohman, Stephen M. Blackburn, Kathryn S. McKinley
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
The performance and energy usage of these applications benefit from hardware parallelism, including SIMD (Single Instruction, Multiple Data) vector parallel instructions.  ...  The design principles seek portability, SIMD performance portability on various SIMD architectures, and compiler simplicity to ease adoption.  ...  Each SIMD type has four to sixteen lanes, which correspond to degrees of SIMD parallelism. Each element of a SIMD vector is a lane. Indices are required to access the lanes of vectors.  ... 
doi:10.1109/pact.2015.33 dblp:conf/IEEEpact/JibajaJHHMGBM15 fatcat:yzgca7enjngwdlwzfo47v5k5zq

Automatic vectorization using dynamic compilation and tree pattern matching technique in Jikes RVM

Sara El-Shobaky, Ahmed El-Mahdy, Ahmed El-Nahas
2009 Proceedings of the 4th workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems - ICOOOLPS '09  
Moreover virtual machines, such as JVMs, are currently widely used for increasing the portability of programs across different platforms; performing SIMDization on these virtual machines would further  ...  Modern processors incorporate SIMD instructions to improve the performance of multimedia applications. Vectorizing compilers are therefore sought to efficiently generate SIMD instructions.  ...  SIMD instructions are exploited by packing four byte instructions into one SIMD instruction.  ... 
doi:10.1145/1565824.1565833 dblp:conf/ecoop/ElshobakyEE09 fatcat:uuumvuemknbmhkftv4m5acr33e

Finding the Next Computational Model: Experience with the UCSC Kestrel

Richard Hughey, Andrea Di Blas
2007 Journal of Signal Processing Systems  
Flexibility and performance continued to increase with new machines from research projects and industry.  ...  Biological sequence analysis has long been a standard problem for application-specific processing and all other forms of high-performance computing.  ...  Acknowledgements The authors thank the many contributors to the hardware, software, and algorithms of the The authors also thank the National Science Foundation, Affymax, and the University of California  ... 
doi:10.1007/s11265-007-0130-1 fatcat:wdghbbthsnae7hzoecump77omy

Retargetable code optimization with SIMD instructions

Manuel Hohenauer, Christoph Schumacher, Rainer Leupers, Gerd Ascheid, Heinrich Meyr, Hans van Someren
2006 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis - CODES+ISSS '06  
This paper focuses on target machines with SIMD instruction support which is widespread in embedded processors for multimedia applications.  ...  One frequent concern about retargetable compilers, though, is their lack of machine-specific code optimization techniques in order to achieve highest code quality.  ...  In addition, the last four lines in table 2 give results for the four more complex DSP routines.  ... 
doi:10.1145/1176254.1176291 dblp:conf/codes/HohenauerSLAMS06 fatcat:g7nfyugb6nckxcxhzbizxaw3sq

Evaluating GPUs for network packet signature matching

Randy Smith, Neelam Goyal, Justin Ormont, Karthikeyan Sankaralingam, Cristian Estan
2009 2009 IEEE International Symposium on Performance Analysis of Systems and Software  
We first present a detailed architectural and microarchitectural analysis, showing that signature matching is well suited for SIMD processing because of regular control flow and parallelism available at  ...  However, the recent transition to complex regular-expression based signatures coupled with ever-increasing network speeds has rapidly increased the performance requirements of signature matching.  ...  Application analysis The four main components of the signature matching module are (1) a state machine for the set of patterns to be detected, (2) auxiliary data maintained for each packet as it is processed  ... 
doi:10.1109/ispass.2009.4919649 dblp:conf/ispass/SmithGOSE09 fatcat:nzojf7u5wfdf7olljanxotzsr4

A new SIMD iterative connected component labeling algorithm

Lionel Lacassagne, Laurent Cabaret, Daniel Etiemble, Farouk Hebache, Andrea Petreto
2016 Proceedings of the 3rd Workshop on Programming Models for SIMD/Vector Processing - WPMVP '16  
The performance of this algorithm is compared to those of State-of-the-Art two-pass direct algorithms.  ...  We show that thanks to the parallelism of the SIMD multi-core processors and an activity matrix that avoids useless memory access, such algorithms have performance that comes closer and closer to direct  ...  Acknowledgements The authors would like to thank, Francois Hannebicq from Intel France, for the access to State-of-the-Art machines and Zakhar A.  ... 
doi:10.1145/2870650.2870652 dblp:conf/ppopp/LacassagneCEHP16 fatcat:vuifiecq6regzlwn64ue6swcdu

FFT algorithms for SIMD parallel processing systems

Leah H. Jamieson, Philip T. Mueller, Howard Jay Siegel
1986 Journal of Parallel and Distributed Computing  
The ability of various interconnection networks presented in the literature to perform the needed transfers is examined.  ...  Parallel structurings of algorithms for efficient computation for a variety of machine size/problem size combinations are presented and analyzed.  ...  SIMD MACHINE MODEL The SIMD machine model assumed for the algorithms includes a set of PEs, a control unit, and an interconnection network [26].  ... 
doi:10.1016/0743-7315(86)90027-4 fatcat:qd33uckxa5ab3nunklekvu5iwu

Vectorization technology to improve interpreter performance

Erven Rohou, Kevin Williams, David Yuste
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
The main performance penalty in interpreters arises from instruction dispatch. Each bytecode requires a minimum number of machine instructions to be executed.  ...  However, the space of search for superinstructions is limited due to time constraints. Aggressive analysis of the code and powerful transformations are out of their reach.  ...  Detailed Performance Analysis Our main objective is performance, i.e. the total runtime of applications.  ... 
doi:10.1145/2400682.2400685 fatcat:wzp5leogynemfhtyuwrx2w74lm

DFT Performance Prediction in FFTW [chapter]

Liang Gu, Xiaoming Li
2010 Lecture Notes in Computer Science  
Our technique adapts to different architectures and automatically predicts the performance of DFT algorithms and codelets (including SIMD codelets).  ...  Our experiments show that this technique renders DFT implementations that achieve more than 95% of the performance with the original FFTW and uses less than 5% of the search overhead on four test platforms  ...  FFTW uses two implementation schemes, SIMD with vector length of two and SIMD with vector length of four.  ... 
doi:10.1007/978-3-642-13374-9_10 fatcat:tu5qvd3sqzcoxphl2e5y5jbvka

The Rewrite Rule Machine node architecture and its performance [chapter]

Patrick Lincoln, José Meseguer, Livio Ricciulli
1994 Lecture Notes in Computer Science  
The Rewrite Rule Machine (RRM) is a massively parallel MIMD/SIMD computer designed with the explicit purpose of supporting very high-level parallel programming with rewrite rules.  ...  The RRM's node architecture consists of a SIMD processor, a SIMD controller, local memory, and network and I/O interfaces.  ...  Acknowledgments We are saddened by the untimely loss of our colleague and friend Dr. Sany Leinwand.  ... 
doi:10.1007/3-540-58430-7_45 fatcat:btpgwvpicngxfbpzks7dxpyhvm

Database Scan Variants on Modern CPUs: A Performance Study [chapter]

David Broneske, Sebastian Breß, Gunter Saake
2015 Lecture Notes in Computer Science  
In this paper, we extend prior studies by an in-depth performance analysis of different variants of the scan operator.  ...  However, it is still not clear how the combination of code optimizations (e.g., loop unrolling and vectorization) will affect the performance of database algorithms on different processors.  ...  In our in-depth performance analysis, we analyze the impact of four common code optimizations (loop unrolling, branch-free code, vectorization, and parallelization) and all of their combinations.  ... 
doi:10.1007/978-3-319-13960-9_8 fatcat:rrhvpdrdvvej5iqzokqtds45yi

Suitability of GCM Physics for Execution on SIMD Parallel Computers

Leon Rotstayn, Rhys Francis, David Abramson, Martin Dix
1993 Journal of the Meteorological Society of Japan  
Overall, we found a performance penalty of only 15% to 20% for SIMD compared to MIMD execution.  ...  A critical consideration in moving a model to a SIMD architecture is the efficiency of the model's physical parameterizations on this type of machine.  ...  This work was conducted as part of the Division of Information Technology High Performance Computation Program.  ... 
doi:10.2151/jmsj1965.71.2_297 fatcat:avufcypterec3oyosbb4gk4gii

Page 1029 of IEEE Transactions on Computers Vol. 52, Issue 8 [page]

2003 IEEE Transactions on Computers  
The major findings of the bottleneck analysis are: Approximately 75 to 85 percent of instructions in the dynamic instruction stream of media workloads are not performing true/core computations.  ...  We perform a comprehensive detection of bottlenecks in SIMD-style extensions.  ... 

The UCSC Kestrel Application-Unspecific Processor

Richard Hughey, Andrea Blas
2006 IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06)  
The UCSC Kestrel parallel processor is part of an evolution from application-specific to specialized to application-unspecific processing.  ...  Experience with Kestrel indicates that programmable systolic processing, and its natural combination with the Single Instruction-Multiple Data (SIMD) parallel architecture, will be an effective design  ...  One way of comparing sequence analysis performance is in terms of performance per transistor [4].  ... 
doi:10.1109/asap.2006.66 dblp:conf/asap/HugheyB06 fatcat:zl7dyipt6zaktbxr4tkrs3wuxq

Digital Media Indexing on the Cell Processor

Lurng-Kuo Liu, Qiang Liu, Apostol Natsev, Kenneth A. Ross, John R. Smith, Ana Lucia Varbanescu
2007 Multimedia and Expo, 2007 IEEE International Conference on  
There are two aspects of the target application that require significant computing power: image analysis for feature extraction, and Support Vector Machine (SVM) based pattern classification for concept  ...  We discuss how the synergistic processing units of a CBE can be used to gain dramatic performance improvements.  ...  It uses multi-modal machine learning techniques to bridge the semantic gap for multimedia content analysis and retrieval [1, 2, 3, 4].  ... 
doi:10.1109/icme.2007.4285038 dblp:conf/icmcs/LiuLNRSV07 fatcat:gtui4r26gjey5d4noiidv53rmi