An FPGA implementation of a sparse quadratic programming solver for constrained predictive control

2011
*Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '11*

We present a high-throughput floating-point FPGA implementation that exploits the parallelism inherent in interior-point optimization methods. ... The application of MPC to faster systems, which adds the requirement of greater sampling frequencies, relies on new ways of finding faster solutions to QP problems. ... ACKNOWLEDGMENTS The authors would like to acknowledge the support of the EPSRC (Grant EP/G031576/1), discussions with Prof. Jan Maciejowski, Prof. Ling Keck Voon, Mr. David Boland and Mr. ...

doi:10.1145/1950413.1950454
dblp:conf/fpga/JerezCK11
fatcat:rwl7ifuocjhhvldvi3norsns6e
###
NIC-based reduction algorithms for large-scale clusters

2006
*International Journal of High Performance Computing and Networking*

In particular, at large scale (1812 processes), NIC-based reductions of small integer and floating-point arrays provided respective speedups of 121% and 39% over the host-based, production-level MPI implementation ... in the design of reduction algorithms. ... This work was partially supported by the U.S. Department of Energy through Los Alamos National Laboratory contract W-7405-ENG-36 and by the Spanish MCYT under grant TIC2003-08154-C06-03. ...

doi:10.1504/ijhpcn.2006.010635
fatcat:ncdw3h4rv5ak3jrpfkmumomlra
###
Model predictive control for deeply pipelined field-programmable gate array implementation: algorithms and circuitry

2012
*IET Control Theory & Applications*

The focus is on exploiting the structure and accelerating the computational bottleneck in an existing primal-dual interior-point method. ... We then introduce a new MPC formulation that can take advantage of the novel computational opportunities, in the form of parallel computational channels, offered by the proposed pipelined architecture ... ACKNOWLEDGMENT The authors would like to acknowledge the support of the EPSRC (Grant EP/G031576/1), discussions with Prof. Jan Maciejowski, Mr. David Boland and Mr. ...
###
Reconfigurable decoder architectures for Raptor codes

2011
*2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*

A range of partially-parallel decoders with desired throughput can be designed by replicating the processing nodes of a serial decoder. ... This is achieved by 1) developing reconfigurable check node processors that attain a constant throughput while processing LT- and LDPC-nodes of varying degrees and numbers, 2) applying pseudo-random permutation ... For b ≥ 6, the bit-accurate decoding results compare to that of the floating point case. ...

doi:10.1109/icassp.2011.5946820
dblp:conf/icassp/ZeineddineM11
fatcat:ssmrhcwqbbbrbmeb74sjryqg2q
###
Constructive Interference in Parallel Algorithms

1988
*SIAM Journal on Numerical Analysis*

We develop cases and conditions wherein this dependence generates a constructive interference in the computation. The resulting parallel algorithms can then be more efficient than serial counterparts. ... Parallel algorithms are developed in the setting of iterative multilevel methods. The constituent parts of the algorithms are dependent rather than independent as in conventional parallel algorithms. ... The point is that for appropriate problems this interference is constructive resulting in efficiencies capable of exceeding those of the serial counterparts. ...

doi:10.1137/0725026
fatcat:iuk37zzimbanphid2u4qyrj4pa
###
Floating-Point Verification Using Theorem Proving
[chapter]

2006
*Lecture Notes in Computer Science*

This chapter describes our work on formal verification of floating-point algorithms using the HOL Light theorem prover. let th6 = REAL_ARITH 'abs(c - a) < e ∧ abs(b) <= d ⇒ abs((a + b) - c) < d + e';; ... The allowable floating point numbers are then of the form ±2^(e−N) k with k < 2^p and 0 ≤ e < E. ... Now we have: 1/b = 2^(−e) · 1/m_b = 2^(−(e+2p−1)) (2^(2p−1)/m_b) and ulp(1/b) = 2^(−(e+2p−1)). ...

doi:10.1007/11757283_8
fatcat:kz7ckh7iyrglbm4yw3mv4zzd3a
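The number format quoted in the snippet above, values of the form ±2^(e−N) k with k < 2^p and 0 ≤ e < E, can be made concrete by enumerating a tiny instance. The following is only an illustrative sketch: the function name `representable` and the parameter values used below are assumptions for this example and do not come from the chapter.

```python
from fractions import Fraction


def representable(p, E, N):
    """Enumerate all values of the form +/- 2^(e-N) * k
    with 0 <= k < 2^p and 0 <= e < E, as exact fractions."""
    values = set()
    for e in range(E):
        scale = Fraction(2) ** (e - N)  # exact power of two, possibly < 1
        for k in range(2 ** p):
            values.add(scale * k)
            values.add(-scale * k)
    return sorted(values)


if __name__ == "__main__":
    # Toy parameters (hypothetical): 2-bit significand, 2 exponents, bias 1.
    for v in representable(p=2, E=2, N=1):
        print(v)
```

Exact rationals are used so that the enumeration itself involves no rounding; duplicates such as 2^(−1)·2 and 2^0·1 collapse into a single value in the set.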
###
Review of Basic Classes of Dividers Based on Division Algorithm

2021
*IEEE Access*

ACKNOWLEDGEMENT A preliminary patent is applied in Estonia based on the research work of developing a new algorithm for division. Application no-70390 date-June 2020. ... indicating the use of floating-point/fixed-point divider for integer division cause wasting of resources. ... If any corrective stage is required in sequential dividers, it will degrade the efficiency of serial dividers. ...

doi:10.1109/access.2021.3055735
fatcat:flnsfd2szvgavhkcop7nozrff4
###
A Parallel Algorithm for a Physiological Non-linear Model of the Cochlea

2013
*Procedia Computer Science*

These two characteristics of the serial solution limit parallelism and prevent efficient computations on a massively parallel processor. ... The previously known serial solution of the cochlear model is recursive in the longitudinal dimension and iterative in the time dimension. ... The Lipschitz constant defines for every two points on the graph of a continuous function the upper limit on the absolute value of the slope of the line that connects these two points [11], [12]. ...

doi:10.1016/j.procs.2013.05.232
fatcat:l2erjuezmfavfhcrlecbrsu4pq
###
Parallel Implementations of the Split-Step Fourier Method for Solving Nonlinear Schrödinger Systems
[article]

1997
*arXiv*
pre-print

The parallel algorithm is applicable to other computational problems constrained by the speed of the 1D FFT. ... The 1D Fast-Fourier Transform (FFT) is parallelized by writing the 1D FFT as a 2D matrix and performing independent 1D sequential FFTs on the rows and columns of this matrix. ... If N = 2^K we can simplify the above expression, SU = 2P(1 + ξ/K + f 2^K/(P K)), (6) where the constants are absorbed into f and ξ. ...
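The idea in the snippet above, writing a length-N transform as a 2D matrix and running independent 1D transforms on its rows and columns, can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: it assumes the standard decomposition N = n1·n2 with a twiddle-factor multiplication between the two passes, uses a naive DFT in place of a sequential FFT, and the names `dft` and `fft_via_2d` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, standing in for a sequential 1D FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_via_2d(x, n1, n2):
    """Length n1*n2 DFT computed from independent row and column transforms."""
    n = n1 * n2
    assert len(x) == n
    # Row r holds x[r], x[r + n1], x[r + 2*n1], ...; each row is independent.
    rows = [dft(x[r::n1]) for r in range(n1)]
    # Twiddle factors couple the two passes.
    for r in range(n1):
        for k2 in range(n2):
            rows[r][k2] *= cmath.exp(-2j * cmath.pi * r * k2 / n)
    # Column transforms are again independent of one another.
    out = [0j] * n
    for k2 in range(n2):
        col = dft([rows[r][k2] for r in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out


if __name__ == "__main__":
    signal = [complex(i, 1) for i in range(12)]
    print(fft_via_2d(signal, 3, 4)[:3])
```

Because the row transforms share no data, and neither do the column transforms, each pass could be distributed across processors; only the step between the passes requires data exchange.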
###
FPGA Based Serial and Single-Clock Cycle Pipelined Fast Fourier Transforms in a Radio Detection of Cosmic Rays
[chapter]

2013
*Design and Architectures for Digital Signal Processing*

Acknowledgements This chapter has been supported by the National Center of Researches and Development ... structure of the 16-point FFT algorithm. ... Streaming architecture The Radix-4 decomposition, which divides the input sequence recursively to form four-point sequences, has the advantage that it requires only trivial multiplications in the 4-point ...

doi:10.5772/52946
fatcat:xfmtsbab7fct3mcropoapltx6m
###
SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication

2015
*2015 IEEE International Parallel and Distributed Processing Symposium*

There is a need for efficient, high-performance tools capable of processing the massive sparse tensors of today and the future. ... SPLATT uses a novel data structure that exploits the sparsity patterns of tensors. ... Group, and the Digital Technology Center at the University of Minnesota. ...

doi:10.1109/ipdps.2015.27
dblp:conf/ipps/SmithRSK15
fatcat:e5veh43nnfcvlbzhfv4h6j5ubu
###
Highly Parallel Sparse Cholesky Factorization

1992
*SIAM Journal on Scientific and Statistical Computing*

It is rumored to be available in a forthcoming release of the CM Fortran library. Second, more efficient use of the low-level floating-point architecture of the CM-2 is possible. ... is slow relative to arithmetic; the divide and multiply operations occur on very sparse VP sets; and the VP ratio remains constant as the active part of the matrix gets smaller. More efficient use of virtual ... The two functions move-to-factor-grid and update-from-factor-grid do pretty much what their names say they do. Here is move-to-factor-grid: ...

doi:10.1137/0913067
fatcat:kbmkvxdlyvevbeshguupvca6gm
###
Acceleration of GPU-Based Ultrasound Simulation via Data Compression

2014
*2014 IEEE International Parallel & Distributed Processing Symposium Workshops*

Graphics Processing Units (GPUs) have attracted attention for performing scientific calculations due to their potential for efficiently performing large numbers of floating point computations. ... The large size of the grid and low degree of reuse of data means that it places a great demand on memory bandwidth. ... ACKNOWLEDGEMENT The authors would like to thank Anish Varghese for his helpful comments and suggestions. ...

doi:10.1109/ipdpsw.2014.140
dblp:conf/ipps/HaighM14
fatcat:bfx7orjy5baa7g7h5nygsobmma
###
Mapping the FDTD Application to Many-Core Chip Architectures

2009
*2009 International Conference on Parallel Processing*

This paper reports a study of mapping the Finite Difference Time Domain (FDTD) application to the IBM Cyclops-64 (C64) many-core chip architecture [1]. ... Major results of our study include: 1. A good mapping of FDTD can effectively exploit the on-chip parallelism of C64-like architectures and show good performance and scalability. 2. ... We acknowledge the nice survey on the recent work on split tiling by Huimin Cui, it has been a good reference for our initial stage of study on this subject. ...

doi:10.1109/icpp.2009.44
dblp:conf/icpp/OrozcoG09
fatcat:lwx5htygajadpb7nadsfvnvlbm
###
The pochoir stencil compiler

2011
*Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures - SPAA '11*

The array u keeps two copies of an X × Y array of grid points, one for time t and one for time t + 1. ... A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. ... ACKNOWLEDGMENTS Thanks to Matteo Frigo of Axis Semiconductor and Volker Strumpen of the University of Linz, Austria, for providing us with their code for trapezoidal decomposition of the 2D heat equation ...
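The two-copy layout described in the snippet above, one X × Y array for time t and one for time t + 1, can be illustrated with a plain loop-based 2D heat-equation stencil. This is a hedged sketch in Python rather than Pochoir's own C++ interface: the names `heat_step` and `run_heat`, the coefficient `alpha`, and the fixed boundary handling are all assumptions made for this example.

```python
def heat_step(src, dst, alpha):
    """Update every interior point of dst from src and its four near neighbors."""
    X, Y = len(src), len(src[0])
    for x in range(1, X - 1):
        for y in range(1, Y - 1):
            dst[x][y] = src[x][y] + alpha * (
                src[x - 1][y] + src[x + 1][y]
                + src[x][y - 1] + src[x][y + 1]
                - 4 * src[x][y]
            )


def run_heat(grid, steps, alpha=0.1):
    """Run `steps` time steps, alternating between two copies of the grid."""
    # u[0] and u[1] play the roles of time t and time t + 1 in turn.
    # Boundary values are present in both copies and are never overwritten.
    u = [[row[:] for row in grid], [row[:] for row in grid]]
    for t in range(steps):
        heat_step(u[t % 2], u[(t + 1) % 2], alpha)
    return u[steps % 2]


if __name__ == "__main__":
    hot_center = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    print(run_heat(hot_center, steps=2))
```

Alternating between the two buffers with `t % 2` avoids copying the grid after every step, which is the usual reason for keeping exactly two time levels.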

