Fast Parallel Algorithms for Matrix Reduction to Normal Forms

Gilles Villard
1997 Applicable Algebra in Engineering, Communication and Computing  
We investigate fast parallel algorithms to compute normal forms of matrices and the corresponding transformations.  ...  As a corollary we establish a polynomial-time sequential algorithm to compute transformations for the Smith form over K[x].  ...  Acknowledgements: Thanks go to referee 1 for text improvements and grammar corrections.  ... 
doi:10.1007/s002000050089 fatcat:nq52uowzobh3dasfbkwfrvs76y

Fast combinatorial optimization with parallel digital computers

Y. Okabe, H. Kakeya
2000 IEEE Transactions on Neural Networks  
This paper presents an algorithm which realizes fast search for the solutions of combinatorial optimization problems with parallel digital computers.  ...  By removing the components of the eigenvectors with eminent negative eigenvalues of the weight matrix, the proposed algorithm avoids oscillation and realizes energy reduction under synchronous discrete  ...  Then a fast parallel and synchronous search algorithm for each problem is presented at the end of each section. III. FAST ALGORITHM FOR PARTITION PROBLEM A.  ... 
doi:10.1109/72.883436 pmid:18249857 fatcat:yp6n25knjveefegaabgq7apqrq

Page 6997 of Mathematical Reviews Vol. , Issue 89M [page]

1989 Mathematical Reviews  
They also discuss various algorithms used in numerical linear algebra for the solution of problems related to the problem of reduction to a special form and determine the connection of conditions for the  ...  The authors discuss various methods for the reduction of matrices which depend on varying parameters to a special form.  ... 

Parallel Tridiagonal Equation Solvers

Harold S. Stone
1975 ACM Transactions on Mathematical Software  
For pipeline computers similar to CDC STAR, cyclic odd-even reduction appears to be the most preferable algorithm for all cases.  ...  This paper compares three parallel algorithms for the direct solution of tridiagonal linear systems of equations. The algorithms are suitable for computers such as ILLIAC IV and CDC STAR.  ...  Acknowledgement The author is deeply indebted to Prof. Gene Golub of Stanford University for many discussions, comments, and suggestions regarding this paper.  ... 
doi:10.1145/355656.355657 fatcat:dcegxoxuajhllhqsig6yz75ota
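
The cyclic odd-even reduction named in this abstract can be illustrated with a short sequential sketch. This is the textbook recursive form, not Stone's pipelined formulation; the function name `cyclic_reduction` and the argument layout are assumptions made for illustration, and the sketch assumes no pivoting is needed (for example, a diagonally dominant matrix).

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive odd-even (cyclic) reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] lie outside the matrix and are treated as zero.
    Hypothetical helper: assumes every pivot b[i] stays nonzero.
    """
    n = len(b)
    if n == 0:
        return []
    if n == 1:
        return [d[0] / b[0]]
    a = [0.0] + list(a[1:])
    c = list(c[:-1]) + [0.0]

    # Eliminate the even-indexed unknowns from every odd-indexed row.
    # Each odd row is updated independently, which is the source of parallelism.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_d = d[i] - alpha * d[i - 1]
        ra.append(-alpha * a[i - 1])
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_d -= gamma * d[i + 1]
            rc.append(-gamma * c[i + 1])
        else:
            rc.append(0.0)
        rb.append(new_b)
        rd.append(new_d)

    # The odd-indexed unknowns form a tridiagonal system of about half the size.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitute for the even-indexed unknowns (also independent per row).
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Each level halves the number of unknowns, so a system of n equations needs roughly log2(n) reduction levels, and all row updates within a level could run concurrently on a vector or parallel machine.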

Fast Parallel Computation of Hermite and Smith Forms of Polynomial Matrices

Erich Kaltofen, M. S. Krishnamoorthy, B. David Saunders
1987 SIAM Journal on Algebraic and Discrete Methods  
The Smith normal form algorithms are applied to the Rational canonical form of matrices over finite fields and the field of rational numbers.  ...  Furthermore, we give a polynomial-time deterministic sequential algorithm for the Smith normal form over the rationals.  ...  Unlike our parallel Hermite normal form algorithm our parallel solution for the Smith normal form also provides a practical algorithm superior to previously known methods. We wish to add two remarks.  ... 
doi:10.1137/0608057 fatcat:wi5nzwifljfifpxvshky2cd6pm

Computing Popov and Hermite forms of polynomial matrices

G. Villard
1996 Proceedings of the 1996 international symposium on Symbolic and algebraic computation - ISSAC '96  
For a polynomial matrix P(z) of degree d in M_{n,n}(K[z]) where K is a commutative field, a reduction to the Hermite normal form can be computed in O(ndM(n) + M(nd)) arithmetic operations if M(n) is the  ...  Further, a reduction can be computed using O(log^{θ+1}(nd)) parallel arithmetic steps and O(L(nd)) processors if the same processor bound holds with time O(log^θ(nd)) for determining the lexicographically  ...  Proposition 5 There exists a Las Vegas type probabilistic algorithm to compute a reduction to the Hermite normal form of a non singular matrix in Mm,.  ... 
doi:10.1145/236869.237082 dblp:conf/issac/Villard96 fatcat:d4syt5g6hbgtpbvnfkhj54d5yq

PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization [article]

Gordon E. Moon, Aravind Sukumaran-Rajam, Srinivasan Parthasarathy and P. Sadayappan
2019 arXiv   pre-print
Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including topic modeling, recommender systems and bioinformatics.  ...  In this paper, we devise a parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme that incorporates algorithmic transformations to enhance data locality.  ...  Algorithm 4 shows the pseudo-code for phase 2. In GPUs, the reduction across V (for normalization of W ) can be performed using global memory atomic operations which are very expensive.  ... 
arXiv:1904.07935v1 fatcat:e6brehvipngrhflwvttp7ehyde

GPU Tensor Cores for fast Arithmetic Reductions [article]

Cristóbal A. Navarro, Roberto Carrasco, Ricardo J. Barrientos, Javier A. Riquelme, Raimundo Vega
2020 arXiv   pre-print
The asymptotic running time of the proposed chained tensor core approach is T(n) = 5 log_{m^2} n and its speedup is S = (4/5) log_2 m^2 over the classic O(n log n) parallel reduction algorithm.  ...  This work proposes a GPU tensor core approach that encodes the arithmetic reduction of n numbers as a set of chained m × m matrix multiply accumulate (MMA) operations executed in parallel by GPU tensor  ...  The following sub-section presents the new algorithm for parallel arithmetic reductions using tensor cores operations and analyzes its cost as well as speedup over the parallel reduction algorithm described  ... 
arXiv:2001.05585v1 fatcat:i5nx23gz7ncpzi4bsqlfgodbqi
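
The idea of encoding a sum as matrix multiply-accumulate operations can be shown with a small CPU-side sketch. It is not the authors' chained tensor core kernel: the names `matmul_add` and `mma_reduce` are hypothetical, the tiles are processed one after another rather than in parallel, and plain Python lists stand in for tensor core fragments.

```python
def matmul_add(a, b, c):
    """Return a @ b + c for square m x m matrices stored as lists of rows."""
    m = len(a)
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(m)) for j in range(m)]
        for i in range(m)
    ]


def mma_reduce(values, m=4):
    """Sum `values` using only m x m multiply-accumulate steps.

    Hypothetical illustration: each tile of m*m inputs is multiplied by an
    all-ones matrix, so one MMA folds m*m numbers into the accumulator.
    """
    ones = [[1.0] * m for _ in range(m)]
    zero = [[0.0] * m for _ in range(m)]
    acc = [[0.0] * m for _ in range(m)]
    values = list(values)
    tile_size = m * m

    for start in range(0, len(values), tile_size):
        chunk = values[start:start + tile_size]
        chunk += [0.0] * (tile_size - len(chunk))  # zero padding keeps the sum unchanged
        tile = [chunk[r * m:(r + 1) * m] for r in range(m)]
        # ones @ tile puts the column sums of the tile in every row of the result.
        acc = matmul_add(ones, tile, acc)

    # acc @ ones adds the accumulated column sums, leaving the total in every entry.
    total = matmul_add(acc, ones, zero)
    return total[0][0]
```

For instance, `mma_reduce(range(1, 101), m=4)` folds 100 numbers through seven 4 × 4 tiles plus one final multiply. Because each MMA absorbs m² inputs, the depth of a tree built from such steps grows with log_{m^2} n, which matches the form of the running time quoted in the abstract.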

Heuristic Adaptability to Input Dynamics for SpMM on GPUs [article]

Guohao Dai, Guyue Huang, Shang Yang, Zhongming Yu, Hengrui Zhang, Yufei Ding, Yuan Xie, Huazhong Yang, Yu Wang
2022 arXiv   pre-print
Orthogonal design principles for such a sparse problem should be extracted to form different algorithms, and further used for performance tuning. (2) Nontrivial implementations in the algorithm space.  ...  We propose techniques like conditional reduction to implement algorithms missing in previous studies. We further propose DA-SpMM, a Data-Aware heuristic GPU kernel for SpMM.  ...  Figure 1 shows performances of ten algorithms normalized to the best algorithm for each matrix.  ... 
arXiv:2202.08556v1 fatcat:eevj2z76kfexzheqmidsnigae4

Block implementations of the symmetric QR and Jacobi algorithms [chapter]

Peter Arbenz, Michael Oettli
1992 Lecture Notes in Computer Science  
A common approach to solve problems in numerical linear algebra efficiently on modern high speed computers is to redesign the classical algorithm, which was originally developed for serial computers.  ...  In this paper, we discuss block variants of QR and Jacobi algorithms for the computation of the complete spectral decomposition of symmetric matrices.  ...  Ian Duff for making computing time on the Alliant FX-80 of the CERFACS in Toulouse available to us and Michel Dayde for the technical support to use this machine.  ... 
doi:10.1007/3-540-55895-0_509 fatcat:jglbkye4prh6hltemhcrdscgmy

A Modified Low Complexity Digit-Level Gaussian Normal Basis Multiplier [chapter]

Reza Azarderakhsh, Arash Reyhani-Masoleh
2010 Lecture Notes in Computer Science  
For T > 2, a complexity reduction algorithm is proposed to reduce the number of XOR gates without increasing the gate delay of the digit-level multiplier.  ...  Gaussian normal bases have been included in a number of standards, such as IEEE [1] and NIST [2] for elliptic curve digital signature algorithm (ECDSA).  ...  Acknowledgment The authors of the paper would like to thank the reviewers for their comments. This work has been supported in part by an NSERC Discovery grant awarded to A. Reyhani-Masoleh.  ... 
doi:10.1007/978-3-642-13797-6_3 fatcat:wkmglpf3b5dkljlxmkkxwndoe4

A Novel Fast Training Method for SVM and Its Application in Fault Diagnosis of Service Robot

Xianfeng Yuan, Mumin Song, Fengyu Zhou, Yugang Wang, Zhumin Chen
2015 International Journal of Online Engineering (iJOE)  
On the other hand, we take advantage of the excellent parallel computing abilities of Graphics Processing Unit (GPU) to pre-calculate the kernel matrix, which avoids the recalculation during the cross  ...  To speed up the training process of SVM, on the one hand, sample reduction is done using the proposed support vectors selection (SVS) algorithm, which can ensure good classification accuracy and generalization  ...  ACKNOWLEDGMENT The authors would like to thank all the Editors and the anonymous reviewers for their valuable comments which are helpful for us to improve this paper.  ... 
doi:10.3991/ijoe.v11i6.4846 fatcat:ntmglsrqonggnozq546luffmli

Restructured recursive DCT and DST algorithms

PeiZong Lee, Fang-Yu Huang
1994 IEEE Transactions on Signal Processing  
Finally, we propose two parallel algorithms for accelerating the computation.  ...  The proposed method is based on certain recursive properties of the DCT coefficient matrix, and can be generalized to design recursive algorithms for the 2-D DCT and the 2-D DST.  ...  Acknowledgements The authors would like to thank Professor H. V. Sorensen and the anonymous referees for their valuable comments on this paper.  ... 
doi:10.1109/78.298269 fatcat:32vkkjxq45a4jlev24t4mcgia4

Graph Based Power Flow Calculation for Energy Management System [article]

Junjie Shi, Guangyi Liu, Renchang Dai, Jingjin Wu, Chen Yuan, Zhiwei Wang
2018 arXiv   pre-print
A linear solver for power flow application is formulated and decomposed in nodal parallelism and hierarchical parallelism to fully utilize graph parallel computing capability.  ...  Case studies on practical large-scale systems provide supporting evidence that the new algorithm is promising for online computing for EMS.  ...  To factorize matrix using Cholesky elimination algorithm, three steps are involved for hierarchical parallel computing: 1) determining fill-ins, 2) forming elimination tree, and 3) partitioning elimination  ... 
arXiv:1811.02512v1 fatcat:h5cu2uwuije2hcrqu545tgg3qa

Bidiagonalization and diagonalization

W.W. Hager
1987 Computers and Mathematics with Applications  
Abstract: Techniques to diagonalize and to bidiagonalize a matrix are discussed.  ...  This new algorithm seems to be better suited than the QR algorithm for implementation on computers with a vector processor, with parallel processors, or with parallel-vector processors.  ...  For completeness, we now present a detailed statement of the fast Givens algorithm to bidiagonalize an m × n matrix A where m ≥ n.  ... 
doi:10.1016/0898-1221(87)90051-4 fatcat:mn3mqdg73vg6ldccrgvqepsbfm