An FPGA implementation of a sparse quadratic programming solver for constrained predictive control

Juan Luis Jerez, George Anthony Constantinides, Eric C. Kerrigan
2011 Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '11  
We present a high-throughput floating-point FPGA implementation that exploits the parallelism inherent in interior-point optimization methods.  ...  The application of MPC to faster systems, which adds the requirement of greater sampling frequencies, relies on new ways of finding faster solutions to QP problems.  ...  ACKNOWLEDGMENTS The authors would like to acknowledge the support of the EPSRC (Grant EP/G031576/1), discussions with Prof. Jan Maciejowski, Prof. Ling Keck Voon, Mr. David Boland and Mr.  ... 
doi:10.1145/1950413.1950454 dblp:conf/fpga/JerezCK11 fatcat:rwl7ifuocjhhvldvi3norsns6e

NIC-based reduction algorithms for large-scale clusters

Fabrizio Petrini, Adam Moody, Juan Fernandez, Eitan Frachtenberg, Dhabaleswar K. Panda
2006 International Journal of High Performance Computing and Networking  
In particular, at large scale (1812 processes), NIC-based reductions of small integer and floating-point arrays provided respective speedups of 121% and 39% over the host-based, production-level MPI implementation  ...  in the design of reduction algorithms.  ...  This work was partially supported by the U.S. Department of Energy through Los Alamos National Laboratory contract W-7405-ENG-36 and by the Spanish MCYT under grant TIC2003-08154-C06-03.  ... 
doi:10.1504/ijhpcn.2006.010635 fatcat:ncdw3h4rv5ak3jrpfkmumomlra

Model predictive control for deeply pipelined field-programmable gate array implementation: algorithms and circuitry

J.L. Jerez, E.C. Kerrigan, G.A. Constantinides, K.-V. Ling
2012 IET Control Theory & Applications  
The focus is on exploiting the structure and accelerating the computational bottleneck in an existing primal-dual interior-point method.  ...  We then introduce a new MPC formulation that can take advantage of the novel computational opportunities, in the form of parallel computational channels, offered by the proposed pipelined architecture  ...  ACKNOWLEDGMENT The authors would like to acknowledge the support of the EPSRC (Grant EP/G031576/1), discussions with Prof. Jan Maciejowski, Mr. David Boland and Mr.  ... 
doi:10.1049/iet-cta.2010.0441 fatcat:wpf4rkvaajg25dhmrswzznazfu

Reconfigurable decoder architectures for Raptor codes

Hady Zeineddine, Mohammad M. Mansour
2011 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
A range of partially-parallel decoders with desired throughput can be designed by replicating the processing nodes of a serial decoder.  ...  This is achieved by 1) developing reconfigurable check node processors that attain a constant throughput while processing LT- and LDPC-nodes of varying degrees and numbers, 2) applying pseudo-random permutation  ...  For b ≥ 6, the bit-accurate decoding results compare to that of the floating point case.  ... 
doi:10.1109/icassp.2011.5946820 dblp:conf/icassp/ZeineddineM11 fatcat:ssmrhcwqbbbrbmeb74sjryqg2q

Constructive Interference in Parallel Algorithms

Craig C. Douglas, Willard L. Miranker
1988 SIAM Journal on Numerical Analysis  
We develop cases and conditions wherein this dependence generates a constructive interference in the computation. The resulting parallel algorithms can then be more efficient than serial counterparts.  ...  Parallel algorithms are developed in the setting of iterative multilevel methods. The constituent parts of the algorithms are dependent rather than independent as in conventional parallel algorithms.  ...  The point is that for appropriate problems this interference is constructive resulting in efficiencies capable of exceeding those of the serial counterparts.  ... 
doi:10.1137/0725026 fatcat:iuk37zzimbanphid2u4qyrj4pa

Floating-Point Verification Using Theorem Proving [chapter]

John Harrison
2006 Lecture Notes in Computer Science  
This chapter describes our work on formal verification of floating-point algorithms using the HOL Light theorem prover. let th6 = REAL_ARITH 'abs(c - a) < e ∧ abs(b) <= d ⇒ abs((a + b) - c) < d + e';;  ...  The allowable floating point numbers are then of the form $\pm 2^{e-N} k$ with $k < 2^p$ and $0 \le e < E$.  ...  Now we have: $\frac{1}{b} = 2^{-e} \frac{1}{m_b} = 2^{-(e+2p-1)} \left( \frac{2^{2p-1}}{m_b} \right)$ and $\mathrm{ulp}\left(\frac{1}{b}\right) = 2^{-(e+2p-1)}$.  ... 
doi:10.1007/11757283_8 fatcat:kz7ckh7iyrglbm4yw3mv4zzd3a

Review of Basic Classes of Dividers Based on Division Algorithm

Udayan S. Patankar, Ants Koel
2021 IEEE Access  
ACKNOWLEDGEMENT A preliminary patent is applied in Estonia based on the research work of developing a new algorithm for division. Application no-70390 date-June 2020.  ...  indicating the use of floating-point/fixed-point divider for integer division cause wasting of resources.  ...  If any corrective stage is required in sequential dividers, it will degrade the efficiency of serial dividers.  ... 
doi:10.1109/access.2021.3055735 fatcat:flnsfd2szvgavhkcop7nozrff4

A Parallel Algorithm for a Physiological Non-linear Model of the Cochlea

Doron Sabo, Shlomo Weiss, Miriam Furst
2013 Procedia Computer Science  
These two characteristics of the serial solution limit parallelism and prevent efficient computations on a massively parallel processor.  ...  The previously known serial solution of the cochlear model is recursive in the longitudinal dimension and iterative in the time dimension.  ...  The Lipschitz constant defines for every two points on the graph of a continuous function the upper limit on the absolute value of the slope of the line that connects these two points [11], [12].  ... 
doi:10.1016/j.procs.2013.05.232 fatcat:l2erjuezmfavfhcrlecbrsu4pq

Parallel Implementations of the Split-Step Fourier Method for Solving Nonlinear Schrödinger Systems [article]

S.M. Zoldi (Department of Physics and Center for Nonlinear and Complex Systems, Duke University, Durham, NC), V. Ruban and A. Zenchuk (L.D. Landau Institute for Theoretical Physics), and S. Burtsev (Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory)
1997 arXiv   pre-print
The parallel algorithm is applicable to other computational problems constrained by the speed of the 1D FFT.  ...  The 1D Fast-Fourier Transform (FFT) is parallelized by writing the 1D FFT as a 2D matrix and performing independent 1D sequential FFTs on the rows and columns of this matrix.  ...  If $N = 2^K$ we can simplify the above expression, $SU = 2P (1 + \xi/K + f 2^K/(PK))$, (6) where the constants are absorbed into $f$ and $\xi$.  ... 
arXiv:physics/9711012v1 fatcat:ihay4ctr2fc6tgjxtgdjdkteya
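
The row-and-column decomposition mentioned in the snippet above can be sketched as follows. This is a minimal illustration of the standard matrix ("four-step") factorization of a 1D DFT, not the authors' code: the function names `dft` and `fft_via_matrix` are hypothetical, the naive `dft` stands in for whatever sequential 1D FFT would be used in practice, and the twiddle-factor multiplication between the two passes is assumed from the standard technique rather than taken from the snippet. Each column transform and each row transform is independent of the others, which is what makes the scheme parallelizable.

```python
import cmath


def dft(seq):
    """Naive O(n^2) DFT; a stand-in for any sequential 1D FFT routine."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_via_matrix(x, rows, cols):
    """Compute the DFT of x (length rows*cols) via column and row transforms.

    x is viewed as a rows x cols matrix in row-major order,
    i.e. element (r, c) is x[cols * r + c].
    """
    n = rows * cols
    assert len(x) == n

    # Pass 1: one independent DFT of length `rows` per column.
    # col_out[c][k1] is the k1-th output of the transform of column c.
    col_out = [dft([x[cols * r + c] for r in range(rows)]) for c in range(cols)]

    out = [0j] * n
    for k1 in range(rows):
        # Twiddle factors couple the two passes.
        row = [
            col_out[c][k1] * cmath.exp(-2j * cmath.pi * c * k1 / n)
            for c in range(cols)
        ]
        # Pass 2: one independent DFT of length `cols` per row.
        row_out = dft(row)
        for k2 in range(cols):
            out[k1 + rows * k2] = row_out[k2]
    return out
```

For example, `fft_via_matrix(list(range(12)), 3, 4)` should agree with `dft(list(range(12)))` up to floating-point rounding.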

FPGA Based Serial and Single-Clock Cycle Pipelined Fast Fourier Transforms in a Radio Detection of Cosmic Rays [chapter]

Zbigniew Szadkowski
2013 Design and Architectures for Digital Signal Processing  
Acknowledgements This chapter has been supported by the National Center of Researches and Development  ...  structure of the 16-point FFT algorithm.  ...  Streaming architecture The Radix-4 decomposition, which divides the input sequence recursively to form four-point sequences, has the advantage that it requires only trivial multiplications in the 4-point  ... 
doi:10.5772/52946 fatcat:xfmtsbab7fct3mcropoapltx6m

SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication

Shaden Smith, Niranjay Ravindran, Nicholas D. Sidiropoulos, George Karypis
2015 2015 IEEE International Parallel and Distributed Processing Symposium  
There is a need for efficient, high-performance tools capable of processing the massive sparse tensors of today and the future.  ...  SPLATT uses a novel data structure that exploits the sparsity patterns of tensors.  ...  Group, and the Digital Technology Center at the University of Minnesota.  ... 
doi:10.1109/ipdps.2015.27 dblp:conf/ipps/SmithRSK15 fatcat:e5veh43nnfcvlbzhfv4h6j5ubu

Highly Parallel Sparse Cholesky Factorization

John R. Gilbert, Robert Schreiber
1992 SIAM Journal on Scientific and Statistical Computing  
It is rumored to be available in a forthcoming release of the CM Fortran library. Second, more efficient use of the low-level floating-point architecture of the CM-2 is possible.  ...  is slow relative to arithmetic; the divide and multiply operations occur on very sparse VP sets; and the VP ratio remains constant as the active part of the matrix gets smaller. More efficient use of virtual  ...  The two functions move-to-factor-grid and update-from-factor-grid do pretty much what their names say they do. Here is move-to-factor-grid:  ... 
doi:10.1137/0913067 fatcat:kbmkvxdlyvevbeshguupvca6gm

Acceleration of GPU-Based Ultrasound Simulation via Data Compression

Andrew A. Haigh, Eric C. McCreath
2014 2014 IEEE International Parallel & Distributed Processing Symposium Workshops  
Graphics Processing Units (GPUs) have attracted attention for performing scientific calculations due to their potential for efficiently performing large numbers of floating point computations.  ...  The large size of the grid and low degree of reuse of data means that it places a great demand on memory bandwidth.  ...  ACKNOWLEDGEMENT The authors would like to thank Anish Varghese for his helpful comments and suggestions.  ... 
doi:10.1109/ipdpsw.2014.140 dblp:conf/ipps/HaighM14 fatcat:bfx7orjy5baa7g7h5nygsobmma

Mapping the FDTD Application to Many-Core Chip Architectures

Daniel A. Orozco, Guang R. Gao
2009 2009 International Conference on Parallel Processing  
This paper reports a study of mapping the Finite Difference Time Domain (FDTD) application to the IBM Cyclops-64 (C64) many-core chip architecture [1].  ...  Major results of our study include: 1. A good mapping of FDTD can effectively exploit the on-chip parallelism of C64-like architectures and show good performance and scalability. 2.  ...  We acknowledge the nice survey on the recent work on split tiling by Huimin Cui, it has been a good reference for our initial stage of study on this subject.  ... 
doi:10.1109/icpp.2009.44 dblp:conf/icpp/OrozcoG09 fatcat:lwx5htygajadpb7nadsfvnvlbm

The pochoir stencil compiler

Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, Charles E. Leiserson
2011 Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures - SPAA '11  
The array u keeps two copies of an X × Y array of grid points, one for time t and one for time t + 1.  ...  A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors.  ...  ACKNOWLEDGMENTS Thanks to Matteo Frigo of Axis Semiconductor and Volker Strumpen of the University of Linz, Austria, for providing us with their code for trapezoidal decomposition of the 2D heat equation  ... 
doi:10.1145/1989493.1989508 dblp:conf/spaa/TangCKLL11 fatcat:ly2k5ojxfvdxbg2azjkla44ykm
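
The two-copy array described in the snippet above can be sketched as a simple loop nest. This is plain Python for illustration only, not Pochoir syntax and not the authors' implementation: the names `heat_step_2d` and `run_heat`, the coefficients `cx` and `cy`, and the periodic boundary handling are all assumptions made for the example. The array `u` holds two copies of the grid, and time step t reads copy `t % 2` while writing copy `(t + 1) % 2`.

```python
def heat_step_2d(u, t, X, Y, cx, cy):
    """Advance a 2D heat-equation stencil by one time step.

    u[t % 2] holds the grid at time t; the result for time t + 1
    is written into u[(t + 1) % 2]. Boundaries are periodic (assumed).
    """
    cur, nxt = u[t % 2], u[(t + 1) % 2]
    for x in range(X):
        for y in range(Y):
            nxt[x][y] = (
                cur[x][y]
                + cx * (cur[(x + 1) % X][y] - 2 * cur[x][y] + cur[(x - 1) % X][y])
                + cy * (cur[x][(y + 1) % Y] - 2 * cur[x][y] + cur[x][(y - 1) % Y])
            )


def run_heat(initial, steps, cx=0.1, cy=0.1):
    """Run `steps` updates from `initial` and return the final grid."""
    X, Y = len(initial), len(initial[0])
    # Two copies of the X x Y grid: one for time t, one for time t + 1.
    u = [[row[:] for row in initial], [[0.0] * Y for _ in range(X)]]
    for t in range(steps):
        heat_step_2d(u, t, X, Y, cx, cy)
    return u[steps % 2]
```

Swapping the roles of the two copies by index parity avoids copying the grid after every step; with periodic boundaries the total of all grid values is preserved up to rounding, which gives a quick sanity check.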