Data transformations enabling loop vectorization on multithreaded data parallel architectures

Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrigo Dominguez, David Kaeli
2010 Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10  
This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures.  ...  Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in  ...  However, due to the underlying GPU memory architecture, applications can experience long stalls if they possess irregular memory access patterns.  ... 
doi:10.1145/1693453.1693510 dblp:conf/ppopp/JangMSDK10 fatcat:6ct54ys65rbrfaw7kz6t3lnhqi

Data transformations enabling loop vectorization on multithreaded data parallel architectures

Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrigo Dominguez, David Kaeli
2010 SIGPLAN notices  
This paper describes data transformations that allow us to vectorize loops targeting massively multithreaded data parallel architectures.  ...  Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memory access patterns in  ...  However, due to the underlying GPU memory architecture, applications can experience long stalls if they possess irregular memory access patterns.  ... 
doi:10.1145/1837853.1693510 fatcat:xmy2wjph3rgytkeuveprvix2ba

Multithreaded vector architectures

R. Espasa, M. Valero
Proceedings Third International Symposium on High-Performance Computer Architecture  
a multithreaded vector architecture.  ...  The purpose of this paper is to show that multithreading techniques can be applied to a vector processor to greatly increase processor throughput and maximize resource utilization.  ...  The Multithreaded Vector Architecture The multithreaded vector architecture we propose is modeled after a Convex C3400 architecture.  ... 
doi:10.1109/hpca.1997.569677 dblp:conf/hpca/EspasaV97 fatcat:7yrgay4xvfgm5dj3vxgf7ffcfa

Simultaneous multithreaded vector architecture: merging ILP and DLP for high performance

R. Espasa, M. Valero
Proceedings Fourth International Conference on High-Performance Computing  
http://www.ac.upc.es/hpc The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single simultaneous vector multithreaded  ...  architecture to execute regular vectorizable code at a performance level that cannot be achieved using either paradigm on its own.  ...  Figure 1: The Simultaneous Multithreaded vector architecture.  ... 
doi:10.1109/hipc.1997.634514 dblp:conf/hipc/EspasaV97 fatcat:hv4xaa5xpbga7fknram3tjspsy

Analyzing Performance and Power of Multicore Architecture Using Multithreaded Iterative Solver

M.
2010 Journal of Computer Science  
Conclusion/Recommendations: In this study, we explored the performance and power characteristics of scientific algorithms on multicore architectures using a multithreaded version of sparse iterative linear  ...  Results: As a result, multicore computing architectures had been proposed and several products are already available.  ...  Then, we describe our simulation environments and multithreaded iterative solver. Experimental results of multithreaded iterative solver on multicore architectures are following.  ... 
doi:10.3844/jcssp.2010.406.412 fatcat:c2nvpaoszjhlhhypqf5bunqhim

A Low-Power Multithreaded Processor for Baseband Communication Systems [chapter]

Michael Schulte, John Glossner, Suman Mamidi, Mayan Moudgill, Stamatis Vassiliadis
2004 Lecture Notes in Computer Science  
Using a super-computer class vectorizing compiler, the processor achieves real-time performance on a 2Mbps WCDMA transmission system.  ...  The processor uses token triggered threading, SIMD vector processing, and powerful compound instructions to provide real-time baseband processing capabilities with very low power consumption.  ...  In this paper, we describe a compound instruction set architecture and an ultra low power multithreaded microarchitecture, in which multithreading is utilized to reduce power consumption and simplifying  ... 
doi:10.1007/978-3-540-27776-7_41 fatcat:tqe3lzqjorcepmx7eij2fhya2a

Design and Implementation of the Multimedia Operation Mechanism for Responsive Multithreaded Processor

Tsutomu Itou, Nobuyuki Yamasaki
2005 Journal of Robotics and Mechatronics  
RMT Processor architecture is based on eight-way prioritized simultaneous multithreading, which executes each thread in order of priority.  ...  Responsive Multithreaded (RMT) Processor is designed for distributed real-time systems. This paper focuses on the multimedia processing architecture of RMT Processor.  ...  Simultaneous Multithreading The architecture of RMT PU is based on Simultaneous Multithreading (SMT) mechanism [2, 3]. Multiple threads are executed in parallel in the SMT architecture.  ... 
doi:10.20965/jrm.2005.p0456 fatcat:q2iu3nrmzjcxnfvdlhkxtjudvq

A Low-Power Multithreaded Processor for Software Defined Radio

Michael Schulte, John Glossner, Sanjay Jinturkar, Mayan Moudgill, Suman Mamidi, Stamatis Vassiliadis
2006 Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology  
In this paper, we present the design of the Sandblaster Processor, a low-power multithreaded digital signal processor for software defined radio.  ...  We describe the processor's architecture and microarchitecture, along with various techniques for achieving high performance and low power dissipation.  ...  processors can dissipate less power than single-threaded architectures, and presents various techniques for further reducing power in multithreaded systems. [29] presents new techniques for multithreading  ... 
doi:10.1007/s11265-006-7267-1 fatcat:nfqhlyks6bhmnfyg2opfpsm4xy

Collaborative Multithreading: An Open Scalable Processor Architecture for Embedded Multimedia Applications

Wei-chun Ku, Shu-hsuan Chou, Jui-chin Chu, Chih-heng Kang, Tien-fu Chen, Jiun-in Guo
2006 2006 IEEE International Conference on Multimedia and Expo  
The architecture concurrently executes a main thread and several accelerative threads, coordinated by the main thread.  ...  Our results show that the proposed architecture provides area and performance advantages for embedded multimedia applications.  ...  architecture Multithread architecture Coarse grained Fine grained SMT Vector MT UniCore VisoMT Thread Switch Costly stall Every cycle intermix Every cycle Every cycle Thread parallelism No No Yes Yes  ... 
doi:10.1109/icme.2006.262505 dblp:conf/icmcs/KuCCKCG06 fatcat:bsuqbiln5rcxzpl5pto7bfo5im

LU factorization using multithreaded system

Mohammad Osama Badawy, Yasser Y. Hanafy, Ramy Eltarras
2012 2012 22nd International Conference on Computer Theory and Applications (ICCTA)  
We use the multithreaded approach over multicore architecture to achieve the estimations made.  ...  The model was implemented over three different architectures to evaluate the cost of executing the Multithreaded proposed model: two multicore architectures, one of two cores over one chip, and the other  ... 
doi:10.1109/iccta.2012.6523540 fatcat:wyaf6irrmbb4jh6q3h3y25s6qy

Exploiting instruction- and data-level parallelism

R. Espasa, M. Valero
1997 IEEE Micro  
Simultaneous multithreaded vector architectures combine the best of data-level and instruction-level parallelism and perform better than either approach could separately.  ...  Figure 3: Simultaneous multithreaded vector architecture. Figure 5: Performance of several architecture paradigms for five possible configurations.  ...  Curve DLP + OOO represents a vector architecture augmented with out-of-order execution and register renaming. 7 Curve DLP + MTH represents a vector architecture augmented with multithreading (but no  ... 
doi:10.1109/40.621210 fatcat:5oanmvkc3vfe7lq3w4jcdbkmjy

Increasing data reuse of sparse algebra codes on simultaneous multithreading architectures

J. C. Pichel, D. B. Heras, J. C. Cabaleiro, F. F. Rivera
2009 Concurrency and Computation  
In this paper the problem of the locality of sparse algebra codes on simultaneous multithreading architectures is studied.  ...  The technique has been tested, first, using a simulator of a simultaneous multithreading architecture, and subsequently, on a real architecture as Intel's Hyper-Threading.  ...  INTRODUCTION Simultaneous Multithreading (SMT) is one of the most successful implementations of the multithreaded architectures.  ... 
doi:10.1002/cpe.1404 fatcat:2sk2h74y4jabzggxihhewjr3x4

Sandbridge Software Tools [chapter]

John Glossner, Sean Dorward, Sanjay Jinturkar, Mayan Moudgill, Erdem Hokenek, Michael Schulte, Stamatis Vassiliadis
2005 Lecture Notes in Computer Science  
We describe the generation of the simulation environment for the Sandbridge Sandblaster multithreaded processor.  ...  The processor model is described using the Sandblaster architecture Description Language (SaDL), which is implemented as Python objects.  ...  The Delft-Java architecture, designed in 1996, introduced the concept of dynamic translation of Java code into a multithreaded RISC-based machine with Vector SIMD DSP operations [10] [11].  ... 
doi:10.1007/11512622_29 fatcat:l3sd7vacdfgu3e5fm6g27x2pei

Architecture Support for Reconfigurable Multithreaded Processors in Programmable Communication Systems

Suman Mamidi, Michael J. Schulte, Daniel Iancu, John Glossner
2007 2007 IEEE International Conf. on Application-specific Systems, Architectures and Processors (ASAP)  
This paper discusses architectural support to facilitate management and sharing of PHAs on a multithreaded system.  ...  real-time constraints.  ...  Explicitly multithreaded processors and reconfigurable  ...  These observations motivate augmenting multithreaded proces-  ...  It presents the architectural support necessary closely coupled to multithreaded processors.  ... 
doi:10.1109/asap.2007.4430000 dblp:conf/asap/MamidiSIG07 fatcat:54jcvymgabecvlndtf3yza5uim

Memory model effects on application performance for a lightweight multithreaded architecture

Sheng Li, Shannon Kuntz, Peter Kogge, Jay Brockman
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
Outline Lightweight Multithreaded Architecture and the memory model Simulation methodology Results Conclusions Compare PGAS vs.  ...  Multiplication of an N*N dense matrix and an N vector. Threads are distributed uniformly across the LPCs.  ... 
doi:10.1109/ipdps.2008.4536356 dblp:conf/ipps/LiKKB08 fatcat:byimpdqz6rbxvg2yoizqylp7l4