Topological transformations as a tool in the design of systolic networks

Karel Culik, Ivan Fris
1985 Theoretical Computer Science  
For example, we use the transformation technique to give a concise proof of a strengthened version of Leiserson's and Saxe's Retiming Lemma and Systolic Conversion Theorem.  ...  We show that the topological transformations on unrollings can be used to design systolic networks, to give simple proofs of their correctness, and to demonstrate the equivalence of different networks.  ...  the proof of [28, Theorem 6.3] it is also implicitly assumed that all the processors of the shuffle-exchange network have memory; however in [28, Fig. 6.1] the self-loops are shown for the leftmost and  ... 
doi:10.1016/0304-3975(85)90091-x fatcat:miga4mm5z5dp5gm7wltdwqfjbm

An automated process for compiling dataflow graphs into reconfigurable hardware

R. Rinker, M. Carter, A. Patel, M. Chawathe, C. Ross, J. Hammes, W.A. Najjar, W. Bohm
2001 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
The system consists of an optimizing compiler which produces dataflow graphs, and a dataflow graph to VHDL translator.  ...  We describe a system, developed as part of the Cameron project, which compiles programs written in a single-assignment subset of C called SA-C into dataflow graphs, and then into VHDL.  ...  by the hardware; these include code motion, constant folding, array and constant value propagation, and common subexpression elimination.  ... 
doi:10.1109/92.920828 fatcat:46w5mvdimzbmnm4utdwipzr3fq

Compilation for a high-performance systolic array

Thomas Gross, Monica S. Lam
1986 Proceedings of the 1986 SIGPLAN symposium on Compiler contruction - SIGPLAN '86  
We report on a compiler for Warp, a high-performance systolic array developed at Carnegie Mellon.  ...  This compiler enhances the usefulness of Warp significantly and allows application programmers to code substantial algorithms.  ...  Mosur, and P. Steenkiste, who all helped with the implementation of this compiler. H. Enderton, B. Siegel, and J. Webb are the first users of the compiler and suffered through the first releases.  ... 
doi:10.1145/12276.13314 dblp:conf/sigplan/GrossL86 fatcat:zleptouad5cafksmztyav4zhxq

Accelerating cardiac cine MRI using a deep learning-based ESPIRiT reconstruction [article]

Christopher M. Sandino, Peng Lai, Shreyas S. Vasanawala, Joseph Y. Cheng
2020 arXiv   pre-print
DL-ESPIRiT is compared against a state-of-the-art parallel imaging and compressed sensing method known as l_1-ESPIRiT.  ...  Mid-systolic frames, y-t profiles, and segmentations of the left and right ventricles are shown here.  ...  , and 3) the unrolled algorithm is trained end-to-end in a supervised fashion.  ... 
arXiv:1911.05845v3 fatcat:n665uj2abbbstmgsshrf4rag7m

Tiling and optimizing time-iterated computations on periodic domains

Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache
2014 Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14  
Experimental results on the swim SPEC CPU2000fp benchmark show a speedup of 5× and 4.2× over the highest SPEC performance achieved by native compilers on Intel Xeon and AMD Opteron multicore SMP systems  ...  Our approach augments a state-of-the-art parallelization and locality-enhancing algorithm from the polyhedral framework to allow time-tiling of stencil computations on periodic domains.  ...  Sadayappan and Tobias Grosser for their important role during the early stages of this work. We are also thankful to the reviewers of PACT 2014 for their detailed comments.  ... 
doi:10.1145/2628071.2628106 dblp:conf/IEEEpact/BondhugulaBCPV14 fatcat:5csjr5drv5aqvjj6buyxke3bie

High Performance Computing with FPGAs and OpenCL [article]

Hamid Reza Zohouri
2019 arXiv   pre-print
Using High-Level Synthesis and a large set of optimization techniques, we show that FPGAs can achieve better performance than CPUs, and better power efficiency than both CPUs and GPUs for typical HPC workloads  ...  With support for high-order stencils, we achieve the highest single-FPGA performance for 2D and 3D stencil computation of any order, to this day.  ...  This attribute is specifically useful for streaming designs in the form of multi-dimensional systolic arrays or single-dimensional ring architectures.  ... 
arXiv:1810.09773v4 fatcat:ziz6dhguxfdntjkorvxopkxp44

Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach Using MAESTRO [article]

Hyoukjun Kwon, Prasanth Chatarasi, Michael Pellauer, Angshuman Parashar, Vivek Sarkar, Tushar Krishna
2020 arXiv   pre-print
The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse and perform staging are known as dataflow, and they directly impact the performance and energy efficiency of DNN  ...  execution time and energy efficiency for a DNN model and hardware configuration.  ...  ACKNOWLEDGEMENT We thank Joel Emer for insightful advice and constructive comments to improve this work; Vivienne Sze and Yu-Hsin Chen for their insights and taxonomy that motivated this work.  ... 
arXiv:1805.02566v6 fatcat:3656k7gkcbfbxewgcebz7v2wrq

Evaluation of the Raw Microprocessor

Michael Bedford Taylor, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, Anant Agarwal, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt (+4 others)
2004 SIGARCH Computer Architecture News  
ASIC designers manage wire delay inherent in large distributed arrays of function units in multiple steps. First, they place close together operations that need to communicate frequently.  ...  Management of Wires and Wire Delay: ASIC designers can place and wire communicating operations in ways that minimize wire delay, minimize latency, and maximize bandwidth.  ...  We use clapack version 3.0 [2] and a tuned BLAS implementation, ATLAS [50], version 3.4.2.  ... 
doi:10.1145/1028176.1006733 fatcat:rdy5winvrjdlvawhqj6wgiozai

Modeling, analysis and exploration of layers: A 3D computing architecture

Zoltan Endre Rakossy
2014 2014 22nd International Conference on Very Large Scale Integration (VLSI-SoC)  
During this time I have been accompanied and supported by many people. It is my great pleasure to take this opportunity to thank them.  ...  Acknowledgements This thesis is the result of my work as research assistant at the Institute for Communication Technologies and Embedded Systems (ICE), Multiprocessor System-on-Chip Architectures (MPSoC  ...  The focus of [28] is on emulation of systolic schedule for GR on REDEFINE and hence synthesis of systolic array on REDEFINE.  ... 
doi:10.1109/vlsi-soc.2014.7004167 dblp:conf/vlsi/Rakossy14 fatcat:6h3w3hfgdjeytitl7wnwrard34

Reconstruction techniques for cardiac cine MRI

Rosa-María Menchón-Lara, Federico Simmross-Wattenberg, Pablo Casaseca-de-la-Higuera, Marcos Martín-Fernández, Carlos Alberola-López
2019 Insights into Imaging  
Additionally, clinical relevance, main challenges, and future trends of this image modality are outlined.  ...  , unrolling the process into several stages, for instance [55, 56].  ...  Coil array compression is crucial to reduce the computational cost of reconstructions [46] [47] [48].  ... 
doi:10.1186/s13244-019-0754-2 pmid:31549235 pmcid:PMC6757088 fatcat:s5574wj5pjadhbq5rah7k3h6lu

An Efficient Hardware Design for Accelerating Sparse CNNs with NAS-based Models

Yun Liang, Liqiang Lu, Yicheng Jin, Jiaming Xie, Ruirui Huang, Jiansong Zhang, Wei Lin
2021 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
and memory requirement.  ...  Then we design an FPGA accelerator that features a tile lookup table (TLUT) and a channel multiplexer (CMUX).  ...  JQ19014) and in part by the Beijing Academy of Artificial Intelligence (BAAI). This work was also supported by Key-Area Research and Development Program of Guangdong Province (No. 2019B010155002)  ... 
doi:10.1109/tcad.2021.3066563 fatcat:vxqd4ez64zgxxcwuy5uq2txmpy

Parallelization of dynamic programming recurrences in computational biology

Arpith Jacob
Finally, I would like to thank my Father and Mother who along with my brother have been unceasing in their support and encouragement throughout my life. I dedicate this work to my family.  ...  All Theses and Dissertations (ETDs). 169. The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them.  ...  Conclusions We have analyzed and accelerated Nussinov RNA folding by building two systolic array designs.  ... 
doi:10.7936/k7fq9tnc fatcat:ua25ktlbuffgxnihr3iyir7cmq

Creating portable and efficient packet processing applications

Olivier Morandi, Fulvio Risso, Pierluigi Rolando, Silvio Valenti, Paolo Veglia
2011 Design automation for embedded systems  
Portability and efficiency are achieved altogether by virtualizing the hardware and by capturing in the programming model the peculiar characteristics of the application domain.  ...  Network processors are special-purpose programmable units deployed in many modern high-speed network devices, which combine flexibility and high performance.  ...  Acknowledgements The authors wish to thank all the people who were involved in this project, particularly the many students who contributed to the development of the NetVM framework, and all the (former  ... 
doi:10.1007/s10617-011-9072-8 fatcat:2fnuiaefyba25bxovdyi4zf46q

Modern Computational Techniques for the HMMER Sequence Analysis

Xiandong Meng, Yanqing Ji
2013 ISRN Bioinformatics  
The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware  ...  This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of  ...  ., master) for protein folding work, and every time they finish their local work, they contact the server again for extra work.  ... 
doi:10.1155/2013/252183 pmid:25937944 pmcid:PMC4393056 fatcat:kdo43qa23je4zflfpkvkhoxkfu

End of year report for parallel vision algorithm design and implementation : January 15, 1987-January 14, 1988

Takeo Kanade, Jon A. Webb
version of Warp being designed by Carnegie Mellon and Intel.  ...  A cell's local memory is represented by several large arrays; systolic communication is simulated by explicit data movement in and out of these arrays.  ... 
doi:10.1184/r1/6554729 fatcat:bhcxgjkq65bkllegvgnrqdcmk4