Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration [article]

Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo
2020 arXiv   pre-print
The key to SMA is the temporal integration of the systolic execution model with the GPU-like SIMD execution model.  ...  The SMA exploits the common components shared between the systolic-array accelerator and the GPU, and provides lightweight reconfiguration capability to switch between the two modes in-situ.  ...  This work was supported by National Key R&D Program of China (2019YFF0302600), the National Natural Science Foundation of China (NSFC) grant (61702328, 61832006, 61729202, and U1636210), CCF-Tencent Open  ... 
arXiv:2002.08326v2 fatcat:3asj3sqruncz7czbtkxbookt2u

Fine-Grained Parallel Genomic Sequence Comparison [chapter]

Dominique Lavenier
2010 Parallel and Distributed Computing  
The combination of these two techniques provides better data accesses to the SSE registers and greatly optimizes the SIMD parallelization.  ...  the parallelization of the dynamic programming algorithm on systolic arrays.  ... 
doi:10.5772/9449 fatcat:ns22vyct3rdjre24cs33bjr7ue

Hyper-systolic matrix multiplication

Th. Lippert, N. Petkov, P. Palazzari, K. Schilling
2001 Parallel Computing  
It is based on a 1-D hyper-systolic processor abstraction. The procedure can be implemented on all types of parallel systems.  ...  Projection of regular dependence graphs has evolved as one such technique [4, 5, 8–17]. As shown elsewhere [18–20], systolic algorithms can easily be transformed into data-parallel programs.  ...  For general definitions of the term systolic and semi-systolic we refer to [5].  ... 
doi:10.1016/s0167-8191(00)00108-3 fatcat:opjtzanmcrapnalz2b43omzavu

An Overview of Hardware-Based Acceleration of Biological Sequence Alignment [chapter]

Laiq Hasan, Zaid Al-Ars
2011 Computational Biology and Applied Bioinformatics  
Methods like the ones based on systolic arrays are used to accelerate such applications.  ...  A CUDA program calls kernels that run on the GPU.  ...  Computational Biology and Applied Bioinformatics Nowadays it is difficult to imagine an area of knowledge that can continue developing without the use of computers and informatics.  ... 
doi:10.5772/23044 fatcat:sy4qnseozja57lngnkruhqddnq

Software Solutions for Converting a MIMO-OFDM Channel into Multiple SISO-OFDM Channels

M. Sima, M. Senthilvelan, D. Iancu, J. Glossner, M. Moudgill, M. Schulte
2007 Third IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2007)  
The technique is applicable to emerging wireless communication protocols, such as WiMAX and Wi-Fi, and provides the flexibility required to adapt to continually changing and evolving standards without  ...  ., can be executed efficiently in software using a combination of CORDIC and unitary rotation algorithms in a multithreaded SIMD processor.  ...  This means that the cheaper semi-CORDIC approach is indeed a very good choice. The same improvement figures also apply for the MIMO-to-SISO conversion.  ... 
doi:10.1109/wimob.2007.4390803 fatcat:iugal7werbhdbn5lusb6g75aju

Hyper-Systolic Matrix Multiplication [article]

Thomas Lippert, Nikolay Petkov, Paolo Palazzari, Klaus Schilling
1998 arXiv   pre-print
The hyper-systolic algorithm makes use of a one-dimensional processor abstraction. The procedure can be implemented on all types of parallel systems.  ...  Semi-hyper-systolic algorithm We next consider the semi-hyper-systolic variant which corresponds to the semi-systolic algorithm just described. The initial distribution of data is shown in Fig. 5.  ...  This success makes us confident that hyper-systolic processing can be applied to a variety of numerical problems which lead to n^2 computation events.  ... 
arXiv:cs/9809105v1 fatcat:b7gpgfhcoze2jnxp3wqbhbcstm

An embedded DRAM architecture for large-scale spatial-lattice computations

Norman Margolus
2000 Proceedings of the 27th annual international symposium on Computer architecture - ISCA '00  
Using embedded DRAM and a new technique for organizing SIMD memory and communications we can efficiently utilize 1Tbit/sec of sustained memory bandwidth in each chip in an indefinitely scalable array of  ...  This allows a 10,000-fold speedup per memory chip for these algorithms compared to the CAM-8 lattice gas computer, and is about one million times faster per memory chip for these calculations than a CM  ...  Long and complicated SIMD programs could be run on each image in realtime.  ... 
doi:10.1145/339647.339672 fatcat:3zr6nq2dzrad3dflqqqhnm7wxy

Computer vision algorithms on reconfigurable logic arrays

N.K. Ratha, A.K. Jain
1999 IEEE Transactions on Parallel and Distributed Systems  
Computer vision algorithms are natural candidates for high performance computing due to their inherent parallelism and intense computational demands. For  ...  The PEs can be programmed for systolic, SIMD, MIMD and pipelined mode. PE to PE communication patterns can also be programmed. Cost performance ratio is significantly low.  ...  [Figure: system design implementation techniques (fully custom, semi-custom, gate array, standard cells, user-programmable PLDs and FPGAs, general-purpose)] This style of programming a FPGA makes it suitable  ... 
doi:10.1109/71.744833 fatcat:htpcqypklnghvfdedyl7dneyhu

Echocardiographic Assessment of Left Ventricular Systolic and Diastolic Functions in Dogs with Severe Sepsis and Septic Shock; Longitudinal Study

Mehmet Ege Ince, Kursad Turgut, Amir Naseri
2021 Animals  
The purpose of this study was to monitor left ventricular systolic dysfunction (LVSD) and diastolic dysfunction (LVDD) using transthoracic echocardiography (TTE) in dogs with severe sepsis and septic shock  ...  The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.  ...  In each dog, left ventricular end-diastolic and end-systolic dimensions were measured by M-mode image(s) using a leading edge-to-leading edge technique.  ... 
doi:10.3390/ani11072011 fatcat:t6dq2or33fdljp3f56gu5o5cwu

Energy efficiency of sequence alignment tools—Software and hardware perspectives

Michał Kierzynka, Lars Kosmann, Micha vor dem Berge, Stefan Krupop, Jens Hagemeyer, René Griessl, Meysam Peykanu, Ariel Oleksiak
2017 Future Generation Computer Systems  
The alignment algorithms have been widely investigated over the last few years, mainly with respect to their speed.  ...  However, no attention was given to their energy efficiency, which is becoming critical in high performance computing and cloud environment.  ...  Dynamic programming algorithms There are three basic sequence alignment methods based on the dynamic programming, namely: the Needleman-Wunsch algorithm (NW) [14] for global alignment, its semi-global  ... 
doi:10.1016/j.future.2016.05.006 fatcat:3kejmtmjwbfvlcmlk7fclqwl7q

HPMVS: A High Performance Visualization Tool Suite that Assists in Kidney Assessment

Timothy S. Newman, Ning Tang
2000 Journal of Computing and Information Technology  
The configuration of HPMVS can provide near real-time visualization by allowing highly intensive computations to be computed on a supercomputer and less intensive computations and final display to be realized  ...  The suite of tools is designed to aid medical staff in the assessment of renal disorders such as those caused by the von Hippel Lindau (VHL) Syndrome.  ...  We are also thankful for fruitful discussions related to this work with Drs. Peter Choyke and Stephen Bacharach of the U.S. National Institutes of Health and for the helpful comments of the reviewers.  ... 
doi:10.2498/cit.2000.02.05 fatcat:qvpyytm63ncthlf2matuvmnpgi

Proceedings of the ASP-DAC 2003. Asia and South Pacific Design Automation Conference 2003 (Cat. No.03EX627)

2003 Conference of Asia and South Pacific Design Automation 2003  
Session 7D Embedded Systems: Hardware/ Software Design Methodology and Optimization Co-chairs: Naehyuck Chang, Hiroaki Takada Capturing and Analyzing Requirement-In Case of Software and Applying to Hard-Akira  ...  Co-chairs: Masahiro Fujita, Rajesh Gupta 3A-1 Combining Architecture Exploration and a Path to Implementation to Build a Complete SoC Design Flow from System Specification to RTL Tudor Dumitras, Sam Kernel  ... 
doi:10.1109/aspdac.2003.1194983 fatcat:obdbe4dwivgsfpbeuvb7s73fpe

AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation [article]

André Müller, Richard Membarth, Sebastian Hack (Saarland University)
2020 arXiv   pre-print
~plain score) by simple function composition rather than metaprogramming techniques which are often hard to understand.  ...  Our implementation supports multithreading and SIMD vectorization on CPUs, CUDA-enabled GPUs, and FPGAs.  ...  These compute an optimal local, global, or semi-global alignment of two sequences under a given scoring scheme by means of dynamic programming (DP).  ... 
arXiv:2002.04561v1 fatcat:lbu6ikpes5hgrnprknn2fwlldq

A High Performance Reconfigurable Core for Motif Searching Using Profile HMM

Khaled Benkrid, Panagiotis Velentzas, Server Kasap
2008 2008 NASA/ESA Conference on Adaptive Hardware and Systems  
This work describes the acceleration of the Viterbi decoding process by means of parallelizing the algorithm and mapping it to a systolic array.  ...  A common task in bioinformatics is the comparison of biological sequences to probabilistic models in order to evaluate their similarity.  ...  The profile HMM structure was modified slightly in order to successfully map the dynamic programming matrix to a systolic array of processing elements.  ... 
doi:10.1109/ahs.2008.16 dblp:conf/ahs/BenkridVK08 fatcat:eejw5vvgpje3vfym7sk7kvybry

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights [article]

Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li
2021 arXiv   pre-print
In particular, it discusses enhancement modules in the architecture design and the software support; categorizes different hardware designs and acceleration techniques and analyzes them in terms of hardware  ...  encoding, storing, extracting, communicating, computing, and load-balancing the non-zeros; understanding how structured sparsity can improve storage efficiency and balance computations; understanding how to  ...  Applying Techniques for Sparsity to Other Domains In this work, we considered a wide variety of techniques that leverage sparsity for the machine learning domain, which represents an enormous research  ... 
arXiv:2007.00864v2 fatcat:k4o2xboh4vbudadfiriiwjp7uu