90,592 Hits in 3.3 sec

Pipelining Wavefront Computations: Experiences and Performance [chapter]

E Christopher Lewis, Lawrence Snyder
2000 Lecture Notes in Computer Science  
This work is the first assessment of the efficacy of these approaches in solving wavefront computations, and in the process, we reveal surprising characteristics of commercial compilers.  ...  We address this question through a quantitative and qualitative study of three approaches to expressing pipelining: programmer implemented via message passing, compiler discovered via automatic parallelization  ...  This research was supported in part by a grant of HPC time from the Arctic Region Supercomputing Center.  ... 
doi:10.1007/3-540-45591-4_35 fatcat:ie3izmvqfrbrfdjwjkliq6ewgy

High-Level Synthesis: Productivity, Performance, and Software Constraints

Yun Liang, Kyle Rupnow, Yinan Li, Dongbo Min, Minh N. Do, Deming Chen
2012 Journal of Electrical and Computer Engineering  
In this paper, we present an unbiased study of the performance, usability and productivity of HLS using AutoPilot (a state-of-the-art HLS tool).  ...  Then, to evaluate the suitability of HLS on real-world applications, we perform a case study of stereo matching, an active area of computer vision research that uses techniques also common for image denoising  ...  Acknowledgments This paper is supported by the Advanced Digital Sciences Center (ADSC) under a grant from the Agency for Science, Technology, and Research of Singapore.  ... 
doi:10.1155/2012/649057 fatcat:lvu2kniyyvaa7prpklymhslf5m

Solving Dynamic Programming Problem by Pipeline Implementation on GPU

Susumu Matsumae, Makoto Miyazaki
2018 International Journal of Advanced Computer Science and Applications  
In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on GPU.  ...  In our approach, we solve the MCM problem on GPU in a pipeline fashion, i.e., we use GPU cores for supporting pipeline-stages so that many elements of the solution table are partially computed in parallel  ...  In this study, we consider adopting a pipeline technique and implementing the DP program on GPU in a pipeline fashion.  ... 
doi:10.14569/ijacsa.2018.091272 fatcat:up6tbwo6xjdutcaqz4mpjfp55m

Page 142 of Journal of Research and Practice in Information Technology Vol. 24, Issue 4 [page]

1992 Journal of Research and Practice in Information Technology  
A block diagram of a pipelined sub-array such as the Cytocomputer is illustrated in Figure 3 as a representative example.  ...  The general user’s model of the array is a synchronous linear (ring) pipeline.  ... 

Self-loop Pipelining and Reconfigurable Dataflow Arrays [chapter]

João M. P. Cardoso
2004 Lecture Notes in Computer Science  
In particular, we briefly present a novel technique for pipelining loops. Experiments with the technique confirm important improvements over the use of conventional loop pipelining.  ...  We introduce some data-driven reconfigurable arrays and summarize techniques to map imperative software programs to those architectures, some of them being focus of current research work.  ...  The author gratefully acknowledges the donation by PACT XPP Technologies, Inc, of the XPP development suite (XDS) software.  ... 
doi:10.1007/978-3-540-27776-7_25 fatcat:astqsk6q3fgr3jems6g7pqkh6q

Design of throughput-optimized arrays from recurrence abstractions

Arpith C. Jacob, Jeremy D. Buhler, Roger D. Chamberlain
2010 ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors  
We achieve a further 2× speedup by processor pipelining, with only a 37% increase in resources.  ...  Our approach is to exploit additional parallelism by pipelining multiple inputs on an array and multiple iteration vectors in a processing element.  ...  Chamberlain is a principal in BECS Technology, Inc.  ... 
doi:10.1109/asap.2010.5540753 dblp:conf/asap/JacobBC10 fatcat:blvol54tsrhuxkhg73k2iqazbe

Real-time sonar beamforming on high-performance distributed computers

Alan D George, Jeff Markwell, Ryan Fogarty
2000 Parallel Computing  
Rapid advancements in acoustical beamforming techniques for array signal processing are producing algorithms with increased levels of computational complexity.  ...  Concomitantly, autonomous arrays capable of performing most or all of the processing in situ have become a focus for mission-critical applications.  ...  Acknowledgements The support provided by the Office of Naval Research on grant N00014-98-1-0188 is acknowledged and appreciated.  ... 
doi:10.1016/s0167-8191(00)00037-5 fatcat:mqgutembl5el5k3qdgjaglmz3q

Language Support for Pipelining Wavefront Computations [chapter]

Bradford L. Chamberlain, E. Christopher Lewis, Lawrence Snyder
2000 Lecture Notes in Computer Science  
A language-based approach is simple for the programmer yet unambiguously parallel. In this paper we introduce simple array language extensions that directly support wavefront computations.  ...  Wavefront computations, characterized by a data dependent flow of computation across a data space, are receiving increasing attention as an important class of parallel computations.  ...  We thank Sung-Eung Choi and Samuel Guyer for their comments on drafts of this paper. This research was supported by a grant of HPC time from the Arctic Region Supercomputing Center.  ... 
doi:10.1007/3-540-44905-1_20 fatcat:dsdwitryrzdc7pyb4jmrwxg6zy

Virtualizing Hardware with Multi-context Reconfigurable Arrays [chapter]

Rolf Enzler, Christian Plessl, Marco Platzner
2003 Lecture Notes in Computer Science  
A co-simulation framework enables cycle-accurate simulation of the complete architecture. As a case study we map an FIR filter to our virtualized hardware model and evaluate different designs.  ...  As a hardware implementation we present a hybrid multi-context architecture that attaches a coarse-grained reconfigurable array to a host CPU.  ...  While there exists already a substantial body of work on coarse-grained arrays, macro-pipelining of stream computations and multi-context devices, a system-level evaluation of the performance and the various  ... 
doi:10.1007/978-3-540-45234-8_16 fatcat:tx2yzbzoofenfbqyvjlkttgdga

Compiler-generated communication for pipelined FPGA applications

Heidi E. Ziegler, Mary W. Hall, Pedro C. Diniz
2003 Proceedings of the 40th conference on Design automation - DAC '03  
Our algorithm finds a solution in which transmitting a row of an array between pipeline stages per communication instance leads to a speedup of 1.76 over an implementation that communicates the entire  ...  program execution time by trading communication overhead with the amount of computation overlap in different stages.  ...  FPGA-based computing machines offer a unique opportunity for the realization of custom pipelining structures, matching the definition of the pipeline to the application requirements in terms of pipeline  ... 
doi:10.1145/775832.775986 dblp:conf/dac/ZieglerHD03 fatcat:ohvgbxmbwnccdhaaugfgxwzesm

Dynamic loop pipelining in data-driven architectures

João M. P. Cardoso
2005 Proceedings of the 2nd conference on Computing frontiers - CF '05  
The results confirm improvements over the use of conventional loop pipelining techniques. Better performance and fewer resources are achieved in a number of cases.  ...  Data-driven array architectures seem to be important alternatives for coarse-grained reconfigurable computing platforms.  ...  The use of our approach in the presence of loop-carried array dependences requires further studies.  ... 
doi:10.1145/1062261.1062283 dblp:conf/cf/Cardoso05 fatcat:pvaf4c5y4bhqvdpt2sm2dfz22q

Compiler-generated communication for pipelined FPGA applications

Heidi E. Ziegler, Mary W. Hall, Pedro C. Diniz
2003 Proceedings of the 40th conference on Design automation - DAC '03  
Our algorithm finds a solution in which transmitting a row of an array between pipeline stages per communication instance leads to a speedup of 1.76 over an implementation that communicates the entire  ...  program execution time by trading communication overhead with the amount of computation overlap in different stages.  ...  FPGA-based computing machines offer a unique opportunity for the realization of custom pipelining structures, matching the definition of the pipeline to the application requirements in terms of pipeline  ... 
doi:10.1145/775983.775986 fatcat:gbjnrqjjdjeela4yrgwxfnruvy

MrPhi: An Optimized MapReduce Framework on Intel Xeon Phi Coprocessors

Mian Lu, Yun Liang, Huynh Phung Huynh, Zhongliang Ong, Bingsheng He, Rick Siow Mong Goh
2015 IEEE Transactions on Parallel and Distributed Systems  
In this work, we develop MrPhi, an optimized MapReduce framework on a heterogeneous computing platform, particularly equipped with multiple Intel Xeon Phi coprocessors.  ...  We propose a vectorization friendly technique and SIMD hash computation algorithms to utilize the SIMD vectors. Then we pipeline the map and reduce phases to improve the resource utilization.  ...  This work was partially supported by the National Natural Science Foundation of China (No. 61300005). Bingsheng He is partly supported by a MoE AcRF Tier 2 grant (MOE2012-T2-2-067) in Singapore.  ... 
doi:10.1109/tpds.2014.2365784 fatcat:o2tktcwqxrdk3e4qkjg7zgs7vu

VLSI Array processors

S. Kung
1985 IEEE ASSP Magazine  
The exploitation of the pipeline technique is often very natural in regular and locally-connected networks; therefore, a major part of concurrency in array processing will be derived from pipelining.  ...  Parallel array algorithm design is a new area of research study that has profited from the theory of signals and systems and has been influenced by linear algebraic numerical methods.  ... 
doi:10.1109/massp.1985.1163741 fatcat:dc6jsvtgabar7ffr343nhdzqiy

A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor

Richard B. Kujoth, Chi-Wei Wang, Jeffrey J. Cook, Derek B. Gottlieb, Nicholas P. Carter
2007 Microprocessors and microsystems  
They support pipelining of wire delays by providing pipeline registers at the intersections between wires in the reconfigurable cluster, retiming buffers at the inputs and outputs of logic blocks, and  ...  Wire delay is rapidly becoming a major bottleneck in reconfigurable systems, creating a significant gap between the clock rates of reconfigurable logic and custom circuits.  ...  Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the ONR, NSF, or AMD.  ... 
doi:10.1016/j.micpro.2006.03.001 fatcat:izat3z4hdrfg3lzeyapmr44e5y