
On the efficiency of reductions in μ-SIMD media extensions

J. Corbal, R. Espasa, M. Valero
Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques  
On the other hand, this paper demonstrates that longer SIMD media extensions such as MOM can take great advantage of accumulators by exploiting the associative parallelism implicit in reductions.  ...  To overcome the problem of reductions in μ-SIMD ISAs, designers tend to include more and more complex instructions able to deal with the most common forms of reductions in multimedia.  ...  Since the first generation of media extensions, with registers of limited width (32-64 bits) and only integer arithmetic (MMX [1] ), new extensions have introduced wider μ-SIMD registers (128 bits in  ... 
doi:10.1109/pact.2001.953290 dblp:conf/IEEEpact/CorbalEV01 fatcat:et3dm4egs5azbibzbkoau5karm
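The associative parallelism in reductions can be illustrated with a small sketch. It is written in Python only for readability, and the function name and lane count are hypothetical, not taken from the paper. A sum is split across several independent partial accumulators, one per SIMD lane, and the lanes are combined only at the end. This reassociation is exact for integers, but for floating point it may change rounding.

```python
def reduce_with_accumulators(values, lanes=4):
    """Sum `values` using `lanes` independent partial accumulators.

    Each accumulator stands in for one SIMD lane (or one packed accumulator
    register); only the final combine step is inherently serial.
    """
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    partial = [0] * lanes
    # Element i goes to lane i % lanes, as with a vector load of `lanes` elements.
    for i, v in enumerate(values):
        partial[i % lanes] += v
    # Horizontal reduction of the lanes (log-depth in hardware, linear here).
    total = 0
    for p in partial:
        total += p
    return total, partial
```

For example, `reduce_with_accumulators(list(range(10)), 4)` keeps four running sums (12, 15, 8, 10) and combines them into 45.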

Bottlenecks in multimedia processing with SIMD style extensions and architectural enhancements

D. Talla, L.K. John, D. Burger
2003 IEEE Transactions on Computers  
Multimedia SIMD extensions such as MMX and AltiVec speed up media processing; however, our characterization shows that the attributes of current general-purpose processors enhanced with SIMD extensions  ...  in the underutilization of SIMD execution units (only 1-12% of the peak SIMD execution units' throughput is achieved).  ...  Acknowledgments: We thank members of the Laboratory for Computer Architecture for their comments and suggestions that improved several drafts of this paper.  ... 
doi:10.1109/tc.2003.1223637 fatcat:ks5jycooazbl5ii4vntzejlv4i

Compiler-Assisted Compaction/Restoration of SIMD Instructions

Juan M. Cebrian, Thibaud Balem, Adrian Barredo, Marc Casas, Miquel Moreto Planas, Alberto Ros, Alexandra Jimborean
2021 IEEE Transactions on Parallel and Distributed Systems  
., SIMD or GPUs) are ubiquitous in high performance systems.  ...  Since the trend is that vector register size increases, the energy efficiency of exascale computing systems will become sub-optimal.  ...  , RTI2018-098156-B-C53), the ECHO and RoMoL ERC projects (819134, 321253), the European HiPEAC Network and the Mont-Blanc 2020 project (EU-FP7-610402 and EU-H2020-779877), and the Spanish Ministry of Economy  ... 
doi:10.1109/tpds.2021.3091015 fatcat:4xczuwqt5jaqre6reagcfg6s5i

Quantized color instruction set for media-on-demand applications

Jongmyon Kim, D.S. Wills
2003 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)  
In addition, QCPX results in a higher system utilization in excess of 95% due to a significant reduction of conditional instructions.  ...  This paper presents Quantized Color Pack eXtension (QCPX) ISA to accelerate performance of pixel-oriented media processing applications.  ...  CONCLUSIONS This paper examines the impact of the QCPX instruction set for several commonly used media applications on a SIMD pixel processor architecture.  ... 
doi:10.1109/icme.2003.1220874 dblp:conf/icmcs/KimW03 fatcat:5qrjtpultjdencr3cqlv4b5pzu

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks [article]

Amir Yazdanbakhsh, Hajar Falahati, Philip J. Wolfe, Kambiz Samadi, Nam Sung Kim, Hadi Esmaeilzadeh
2018 arXiv pre-print
GANs are on the frontier as further extension of deep learning into many domains (e.g., medicine, robotics, content synthesis) requires massive sets of labeled data that is generally either unavailable  ...  The reordering breaks the full SIMD execution model, which is prominent in convolution accelerators.  ...  This work was in part supported by NSF awards CNS#1703812, ECCS#1609823, CCF#1553192, Air Force Office of Scientific Research (AFOSR) Young Investigator Program (YIP) award #FA9550-17-1-0274, NSF-1705047  ... 
arXiv:1806.01107v1 fatcat:6q743mpn65c63i3y7wkdqelzfq

A DSP-Enhanced 32-Bit Embedded Microprocessor [chapter]

Hyun-Gyu Kim, Hyeong-Cheol Oh
2005 Lecture Notes in Computer Science  
In this paper, we propose a DSP-enhanced embedded microprocessor based on the 32-bit EISC architecture.  ...  Our simulations and experiments show that the proposed DSP-enhanced processor reduces the average execution time of the DSP kernels considered in this work by 47.8% and the DSP applications by 29.3%.  ...  Acknowledgements The authors wish to acknowledge the CAD tool support of IDEC (IC Design Education Center), Korea and the financial support of Advanced Digital Chips Inc., Korea.  ... 
doi:10.1007/11596356_5 fatcat:247apidt2nd53n4vuwocgnijfa

Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Asadollah Shahbahrami, Ben Juurlink, Demid Borodin, Stamatis Vassiliadis
2006 International journal of parallel programming  
Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.  ...  In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register.  ...  In order to employ SIMD instructions in 2D algorithms, the matrix needs to be transposed frequently. On current SIMD extensions, however, transposition takes a significant amount of time.  ... 
doi:10.1007/s10766-006-0015-0 fatcat:j2eavkv3engmbcjtiqtcqiztoi
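The effect of extended subwords (four extra bits per byte) can be sketched by treating a Python integer as a media register divided into 12-bit lanes. The lane count and all names are hypothetical. With plain 8-bit lanes, accumulating pixel values would first require unpacking them to wider lanes; with 12-bit lanes, up to 16 rows of 8-bit data (16 × 255 = 4080 < 4096) can be summed with ordinary addition, and no carry spills into a neighbouring lane.

```python
LANE_BITS = 12          # 8-bit pixel + 4 extra bits per subword
LANES = 8               # hypothetical register: 8 lanes x 12 bits = 96 bits
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack up to LANES unsigned 8-bit values into 12-bit lanes of one integer."""
    assert len(values) <= LANES and all(0 <= v <= 255 for v in values)
    reg = 0
    for i, v in enumerate(values):
        reg |= v << (i * LANE_BITS)
    return reg


def unpack(reg, n=LANES):
    """Extract n 12-bit lanes from a packed integer."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(n)]


def packed_sum(rows):
    """Accumulate rows of bytes lane-wise using a single integer addition per row.

    Each lane can hold up to 4095, so at most 16 rows are accepted to
    guarantee that no carry crosses a lane boundary.
    """
    assert len(rows) <= 16
    acc = 0
    for row in rows:
        acc += pack(row)
    return unpack(acc, len(rows[0]) if rows else LANES)
```

Sixteen rows of all-255 pixels accumulate to 4080 in every lane without any unpacking step.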

Accelerating Color Space Conversion Using Extended Subwords and the Matrix Register File

Asadollah Shahbahrami, Ben Juurlink, Stamatis Vassiliadis
2006 Eighth IEEE International Symposium on Multimedia (ISM'06)  
When implemented using SIMD instructions, however, the performance improvement is often limited for two reasons.  ...  These techniques avoid rearrangement instructions and increase the number of subwords that are processed in parallel. Experimental results have been obtained by extending the SimpleScalar toolset.  ...  Color space conversion, however, has certain characteristics which make it difficult to implement efficiently using existing SIMD extensions such as MMX [8] and SSE [9] .  ... 
doi:10.1109/ism.2006.16 dblp:conf/ism/ShahbahramiJV06 fatcat:aq7dkprrhjbvbjmzqfvexaehby
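For reference, the per-pixel arithmetic that a SIMD color space conversion must vectorize can be sketched in fixed point. The snippet does not state which conversion or coefficients the paper uses. The JFIF/BT.601 full-range RGB-to-YCbCr coefficients below are an assumption, chosen because they are common, and the names are hypothetical. The sketch makes one difficulty visible: the intermediate products are far wider than 8 bits, so a byte-oriented SIMD implementation has to convert to wider subwords before multiplying.

```python
SHIFT = 16
ROUND = 1 << (SHIFT - 1)      # rounding constant for the final right shift


def fx(c):
    """Convert a real coefficient to a fixed-point integer scaled by 2**16."""
    return round(c * (1 << SHIFT))


# Assumed JFIF / BT.601 full-range coefficients (not taken from the paper).
YR, YG, YB = fx(0.299), fx(0.587), fx(0.114)
CBR, CBG, CBB = fx(0.168736), fx(0.331264), fx(0.5)
CRR, CRG, CRB = fx(0.5), fx(0.418688), fx(0.081312)


def clamp8(v):
    """Saturate to the unsigned 8-bit range, as a saturating pack would."""
    return 0 if v < 0 else 255 if v > 255 else v


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr using integer arithmetic only."""
    y = (YR * r + YG * g + YB * b + ROUND) >> SHIFT
    cb = 128 + ((-CBR * r - CBG * g + CBB * b + ROUND) >> SHIFT)
    cr = 128 + ((CRR * r - CRG * g - CRB * b + ROUND) >> SHIFT)
    return clamp8(y), clamp8(cb), clamp8(cr)
```

Pure red, for instance, maps to (76, 85, 255). The Cr channel needs the final saturation because its rounded value would otherwise be 256.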

Embedded System Hardware [chapter]

Peter Marwedel
2011 Embedded System Design  
CPU when not in use, while monitoring interrupts; SLEEP: shutdown of on-chip activity. [Figure: power states of the StrongARM SA1100: RUN (400 mW), IDLE (50 mW), SLEEP (160 µW); transition times 10 µs, 10 µs, 90 µs and 160 ms] Power fault  ...  Multimedia instructions (short vector extensions, streaming extensions, SIMD instructions): multimedia instructions exploit the fact that many registers, adders, etc. are quite wide (32/64 bit), whereas most multimedia  ... 
doi:10.1007/978-94-007-0257-8_3 fatcat:i7rwxc373zhqhkisnkox5ctgeu
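Narrow multimedia data can be processed inside a wide register in software as well as in hardware ("SIMD within a register"). The following minimal sketch adds four packed unsigned bytes held in one 32-bit word, with wraparound per lane. The function name is hypothetical, and real multimedia instructions such as MMX's packed adds do this in a single operation, often with optional saturation.

```python
H = 0x80808080   # high bit of each byte lane
L = 0x7F7F7F7F   # low seven bits of each byte lane


def swar_add_u8x4(a, b):
    """Add four packed unsigned bytes lane-wise, modulo 256 per lane.

    The low 7 bits of each lane are added with the high bits cleared, so a
    carry can reach bit 7 of a lane but never cross into the next lane.
    Bit 7 of each lane is then fixed up with XOR, i.e. addition modulo 2
    with the carry-out of the lane discarded.
    """
    return ((a & L) + (b & L)) ^ ((a ^ b) & H)
```

For example, lanes (0x01, 0xFF, 0x7F, 0x80) plus (0x01, 0x01, 0x80, 0x80) give (0x02, 0x00, 0xFF, 0x00): the 0xFF + 0x01 lane wraps to zero without disturbing its neighbours.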

Implementation of stereophonic acoustic echo canceller on nVIDIA GeForce graphics processing unit

Akihiro Hirano, Kenji Nakayama
2009 2009 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)  
This paper presents an implementation of a stereophonic acoustic echo canceller on nVIDIA GeForce graphics processor and CUDA software development environment.  ...  For efficiency, fast shared memory has been used as much as possible. A tree adder is introduced to reduce the cost of summing up thread outputs.  ...  Intel IA-32 architectures [4] have MMX (Multi Media eXtension) and also SSE (Streaming Single instruction multiple data Extension, Streaming SIMD Extension).  ... 
doi:10.1109/ispacs.2009.5383842 fatcat:xwhsnwovfzhrbnbnstpql7oq5i
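A tree adder sums n thread outputs in about log2(n) steps instead of n - 1 sequential additions. The sketch below simulates sequentially what parallel threads would do. It is not the authors' kernel, and its name is hypothetical. In a CUDA kernel, each iteration of the inner loop would be performed by a separate thread on shared memory, with a barrier such as `__syncthreads()` between steps. The function returns the sum and the number of steps taken.

```python
def tree_sum(values):
    """Pairwise (tree) reduction: at each step, element i adds element i + stride."""
    buf = list(values)
    n = len(buf)
    if n == 0:
        return 0, 0
    # Pad with zeros to a power of two so each step halves the active range.
    size = 1
    while size < n:
        size *= 2
    buf.extend([0] * (size - n))
    stride = size // 2
    steps = 0
    while stride > 0:
        for i in range(stride):       # independent additions: one per thread
            buf[i] += buf[i + stride]
        stride //= 2                  # a barrier would separate these steps
        steps += 1
    return buf[0], steps
```

Summing eight outputs, for example, takes three steps (4 + 2 + 1 additions) rather than seven dependent additions.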

Exploiting motion estimation resilience to approximated metrics on SIMD-capable general processors: From Atom to Nehalem

Steven Pigeon, Stephane Coulombe
2010 2010 25th Biennial Symposium on Communications  
In this paper, we extend previous work by further exploring efficient implementation of approximate fast metrics for motion estimation.  ...  Now, they are directed toward the shrewd exploitation of the machine's advanced architectural features such as multimedia extensions, especially for the computation of the error metric which is known to  ...  In modern processors, this means that full advantage must be taken of any machine-specific ISA extension, and, in particular, multimedia and single instruction, multiple data (SIMD) extensions.  ... 
doi:10.1109/bsc.2010.5473009 fatcat:btr6w4getjfl3kbq3aqvssvpba
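The error metric referred to above is typically the sum of absolute differences (SAD) between a block and a candidate reference block. x86 SSE provides `PSADBW` for computing SAD over packed unsigned bytes. The sketch below shows the exact metric and, for contrast, one possible cheaper approximation that evaluates only every `step`-th row. This row-subsampling variant is a hypothetical illustration, not the approximate metric studied in the paper.

```python
def sad(block, ref):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, ref)
               for a, b in zip(row_a, row_b))


def sad_subsampled(block, ref, step=2):
    """Hypothetical approximate metric: SAD over every `step`-th row only."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block[::step], ref[::step])
               for a, b in zip(row_a, row_b))
```

A motion search evaluates such a metric for every candidate displacement and keeps the smallest one, which is why its cost dominates and why SIMD and approximation both pay off.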

On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications

F. Sanchez, M. Alvarez, E. Salami, A. Ramirez, M. Valero
2005 IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.  
In this paper we perform a scalability analysis of SIMD extensions for multimedia applications.  ...  SIMD extensions are the most common technique used in current processors for multimedia computing. In order to obtain more performance for emerging applications SIMD extensions need to be scaled.  ...  ACKNOWLEDGMENT This work has been supported by the Ministry of Science and Technology of Spain, the European Union (FEDER funds) under contract TIC2001-0995-C02-01 and TIC2004-07739-C02-01, by the HiPEAC  ... 
doi:10.1109/ispass.2005.1430571 dblp:conf/ispass/SanchezASRV05 fatcat:fnr3t7uibbfgbd2rn4tleat3s4

High-throughput fuzzy clustering on heterogeneous architectures

Juan M. Cebrian, Baldomero Imbernón, Jesús Soto, José M. García, José M. Cecilia
2020 Future Generation Computer Systems  
The efficient analysis of this data deluge is becoming mandatory in order to transform it into meaningful information.  ...  The Internet of Things (IoT) is pushing the next economic revolution in which the main players are data and immediacy.  ...  Acknowledgments This work was partially supported by the Fundación Séneca del Centro de Coordinación de la Investigación de la Región de Murcia under Project 20813/PI/18, and by Spanish Ministry of Science  ... 
doi:10.1016/j.future.2020.01.022 fatcat:e24gdvrrwvbrnaq5gyophl3wga

Computationally efficient implementation of sparse-tap FIR adaptive filters with tap-position control on Intel IA-32 processors

Akihiro Hirano, Kenji Nakayama
2009 2008 International Symposium on Intelligent Signal Processing and Communications Systems  
This paper presents a computationally efficient implementation of sparse-tap FIR adaptive filters with tap-position control on Intel IA-32 processors with single-instruction multiple-data (SIMD) capability  ...  Dynamic register allocation and the use of memory-to-register operations help maximize the loop-unrolling level. Up to 66 percent speedup is achieved.  ...  Intel IA-32 architectures [3] have MMX (Multi Media eXtension) and also SSE (Streaming Single instruction multiple data Extension, Streaming SIMD Extension).  ... 
doi:10.1109/ispacs.2009.4806758 fatcat:ke6jsrhlcfesdfykdaek4o2f44
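A sparse-tap FIR filter evaluates only a few active taps at delays d_k out of a long impulse response: y[n] = Σ_k w_k · x[n − d_k]. The following minimal sketch shows just this filtering step. The adaptive weight update and the tap-position control from the paper are not shown, and the function name and the (delay, weight) representation are hypothetical.

```python
def sparse_fir(x, taps):
    """Filter x with a sparse-tap FIR filter.

    taps: list of (delay, weight) pairs; only these delays contribute, i.e.
    y[n] = sum(w * x[n - d] for d, w in taps), with x[m] = 0 for m < 0.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for d, w in taps:             # inner loop over active taps only
            if n - d >= 0:
                acc += w * x[n - d]
        y.append(acc)
    return y
```

Feeding an impulse through taps at delays 0 and 3 reproduces those weights at those positions. The cost per output therefore scales with the number of active taps rather than with the full filter length.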

The TigerSHARC DSP architecture

J. Fridman, Z. Greenfield
2000 IEEE Micro  
(Two-dimensional extensions of these algorithms, such as 2D filtering and convolution used in imaging, can also be solved using extensions to the techniques presented here.)  ...  And, as a result, non-SIMD execution is required to achieve high efficiency.  ...  Acknowledgments The material presented in this article represents the work of a very large group of people at Analog Devices, including the software tools, product engineering, Israel design teams, and  ... 
doi:10.1109/40.820055 fatcat:2dgje6lpqjhu3hqeu2befusmvm
Showing results 1-15 out of 432 results