Topological transformations as a tool in the design of systolic networks
1985
Theoretical Computer Science
For example, we use the transformation technique to give a concise proof of a strengthened version of Leiserson's and Saxe's Retiming Lemma and Systolic Conversion Theorem. ...
We show that the topological transformations on unrollings can be used to design systolic networks, to give simple proofs of their correctness, and to demonstrate the equivalence of different networks. ...
the proof of [28, Theorem 6.3] it is also implicitly assumed that all the processors of the shuffle-exchange network have memory; however in [28, Fig. 6.1] the self-loops are shown for the leftmost and ...
doi:10.1016/0304-3975(85)90091-x
fatcat:miga4mm5z5dp5gm7wltdwqfjbm
An automated process for compiling dataflow graphs into reconfigurable hardware
2001
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The system consists of an optimizing compiler which produces dataflow graphs, and a dataflow graph to VHDL translator. ...
Abstract: We describe a system, developed as part of the Cameron project, which compiles programs written in a single-assignment subset of C called SA-C into dataflow graphs, and then into VHDL. ...
by the hardware; these include code motion, constant folding, array and constant value propagation, and common subexpression elimination. ...
doi:10.1109/92.920828
fatcat:46w5mvdimzbmnm4utdwipzr3fq
Compilation for a high-performance systolic array
1986
Proceedings of the 1986 SIGPLAN symposium on Compiler construction - SIGPLAN '86
We report on a compiler for Warp, a high-performance systolic array developed at Carnegie Mellon. ...
This compiler enhances the usefulness of Warp significantly and allows application programmers to code substantial algorithms. ...
Mosur, and P. Steenkiste, who all helped with the implementation of this compiler. H. Enderton, B. Siegel, and J. Webb are the first users of the compiler and suffered through the first releases. ...
doi:10.1145/12276.13314
dblp:conf/sigplan/GrossL86
fatcat:zleptouad5cafksmztyav4zhxq
Accelerating cardiac cine MRI using a deep learning-based ESPIRiT reconstruction
[article]
2020
arXiv
pre-print
DL-ESPIRiT is compared against a state-of-the-art parallel imaging and compressed sensing method known as l_1-ESPIRiT. ...
Mid-systolic frames, y-t profiles, and segmentations of the left and right ventricles are shown here. ...
, and 3) the unrolled algorithm is trained end-to-end in a supervised fashion. ...
arXiv:1911.05845v3
fatcat:n665uj2abbbstmgsshrf4rag7m
Tiling and optimizing time-iterated computations on periodic domains
2014
Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14
Experimental results on the swim SPEC CPU2000fp benchmark show a speedup of 5× and 4.2× over the highest SPEC performance achieved by native compilers on Intel Xeon and AMD Opteron multicore SMP systems ...
Our approach augments a state-of-the-art parallelization and locality-enhancing algorithm from the polyhedral framework to allow time-tiling of stencil computations on periodic domains. ...
Sadayappan and Tobias Grosser for their important role during the early stages of this work. We are also thankful to the reviewers of PACT 2014 for their detailed comments. ...
doi:10.1145/2628071.2628106
dblp:conf/IEEEpact/BondhugulaBCPV14
fatcat:5csjr5drv5aqvjj6buyxke3bie
High Performance Computing with FPGAs and OpenCL
[article]
2019
arXiv
pre-print
Using High-Level Synthesis and a large set of optimization techniques, we show that FPGAs can achieve better performance than CPUs, and better power efficiency than both CPUs and GPUs for typical HPC workloads ...
With support for high-order stencils, we achieve the highest single-FPGA performance for 2D and 3D stencil computation of any order, to this day. ...
This attribute is specifically useful for streaming designs in form of multi-dimensional systolic array or single-dimensional ring architectures. ...
arXiv:1810.09773v4
fatcat:ziz6dhguxfdntjkorvxopkxp44
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach Using MAESTRO
[article]
2020
arXiv
pre-print
The data partitioning and scheduling strategies used by DNN accelerators to leverage reuse and perform staging are known as dataflow, and they directly impact the performance and energy efficiency of DNN ...
execution time and energy efficiency for a DNN model and hardware configuration. ...
ACKNOWLEDGEMENT We thank Joel Emer for insightful advice and constructive comments to improve this work; Vivienne Sze and Yu-Hsin Chen for their insights and taxonomy that motivated this work. ...
arXiv:1805.02566v6
fatcat:3656k7gkcbfbxewgcebz7v2wrq
Evaluation of the Raw Microprocessor
2004
SIGARCH Computer Architecture News
ASIC designers manage wire delay inherent in large distributed arrays of function units in multiple steps. First, they place close together operations that need to communicate frequently. ...
Management of Wires and Wire Delay: ASIC designers can place and wire communicating operations in ways that minimize wire delay, minimize latency, and maximize bandwidth. ...
We use clapack version 3.0 [2] and a tuned BLAS implementation, ATLAS [50], version 3.4.2. ...
doi:10.1145/1028176.1006733
fatcat:rdy5winvrjdlvawhqj6wgiozai
Modeling, analysis and exploration of layers: A 3D computing architecture
2014
2014 22nd International Conference on Very Large Scale Integration (VLSI-SoC)
During this time I have been accompanied and supported by many people. It is my great pleasure to take this opportunity to thank them. ...
Acknowledgements This thesis is the result of my work as research assistant at the Institute for Communication Technologies and Embedded Systems (ICE), Multiprocessor System-on-Chip Architectures (MPSoC ...
The focus of [28] is on emulation of systolic schedule for GR on REDEFINE and hence synthesis of systolic array on REDEFINE. ...
doi:10.1109/vlsi-soc.2014.7004167
dblp:conf/vlsi/Rakossy14
fatcat:6h3w3hfgdjeytitl7wnwrard34
Reconstruction techniques for cardiac cine MRI
2019
Insights into Imaging
Additionally, clinical relevance, main challenges, and future trends of this image modality are outlined. ...
, unrolling the process into several stages, for instance [55, 56]. ...
Coil array compression is crucial to reduce the computational cost of reconstructions [46] [47] [48]. ...
doi:10.1186/s13244-019-0754-2
pmid:31549235
pmcid:PMC6757088
fatcat:s5574wj5pjadhbq5rah7k3h6lu
An Efficient Hardware Design for Accelerating Sparse CNNs with NAS-based Models
2021
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
and memory requirement. ...
Then we design an FPGA accelerator that features a tile lookup table (TLUT) and a channel multiplexer (CMUX). ...
JQ19014) and in part by the Beijing Academy of Artificial Intelligence (BAAI). This work was also supported by Key-Area Research and Development Program of Guangdong Province (No. 2019B010155002) ...
doi:10.1109/tcad.2021.3066563
fatcat:vxqd4ez64zgxxcwuy5uq2txmpy
Parallelization of dynamic programming recurrences in computational biology
2010
Finally, I would like to thank my Father and Mother who along with my brother have been unceasing in their support and encouragement throughout my life. I dedicate this work to my family. ...
All Theses and Dissertations (ETDs). 169. The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. ...
Conclusions We have analyzed and accelerated Nussinov RNA folding by building two systolic array designs. ...
doi:10.7936/k7fq9tnc
fatcat:ua25ktlbuffgxnihr3iyir7cmq
Creating portable and efficient packet processing applications
2011
Design automation for embedded systems
Portability and efficiency are achieved altogether by virtualizing the hardware and by capturing in the programming model the peculiar characteristics of the application domain. ...
Network processors are special-purpose programmable units deployed in many modern high-speed network devices, which combine flexibility and high performance. ...
Acknowledgements The authors wish to thank all the people who were involved in this project, particularly the many students who contributed to the development of the NetVM framework, and all the (former ...
doi:10.1007/s10617-011-9072-8
fatcat:2fnuiaefyba25bxovdyi4zf46q
Modern Computational Techniques for the HMMER Sequence Analysis
2013
ISRN Bioinformatics
The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware ...
This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of ...
., master) for protein folding work, and every time they finish their local work, they contact the server again for extra work. ...
doi:10.1155/2013/252183
pmid:25937944
pmcid:PMC4393056
fatcat:kdo43qa23je4zflfpkvkhoxkfu
End of year report for parallel vision algorithm design and implementation : January 15, 1987-January 14, 1988
2018
version of Warp being designed by Carnegie Mellon and Intel. ...
A cell's local memory is represented by several large arrays; systolic communication is simulated by explicit data movement in and out of these arrays. ...
doi:10.1184/r1/6554729
fatcat:bhcxgjkq65bkllegvgnrqdcmk4