14,244 Hits in 7.3 sec

Large-scale Simulations of 3D Groundwater Flow Using Parallel Geometric Multigrid Method

Kengo Nakajima
2013 Procedia Computer Science  
In the present work, the effect of sparse matrix storage formats on the performance of parallel geometric multigrid solvers was evaluated, and a new data structure for the Ellpack-Itpack (ELL) format is  ...  The proposed method is implemented for pGW3D-FVM, a parallel code for 3D groundwater flow simulations using the multigrid method, and the robustness and performance of the code was evaluated on up to 4,096  ...  Acknowledgements This work is supported by Core Research for Evolutional Science and Technology (CREST), the Japan Science and Technology Agency (JST), Japan.  ... 
doi:10.1016/j.procs.2013.05.293 fatcat:5lbwhvqvwfbuzbdc6xwmoj6ogy

Parallel spectroscopic imaging reconstruction with arbitrary trajectories using k-space sparse matrices

Meng Gu, Chunlei Liu, Daniel M. Spielman
2009 Magnetic Resonance in Medicine  
Alternative methods for reconstruction with undersampled Cartesian k-space data are the SMASH and GRAPPA algorithms that do the reconstruction in the k-space domain.  ...  For undersampled k-space data on a Cartesian grid, the sensitivity encoding (SENSE) algorithm (5) can be applied to each spectral point in image domain after performing the Fourier transform to obtain  ...  Grant sponsor: NIH; grant numbers: AG18942, CA 48269, and RR 09784mtbn.  ... 
doi:10.1002/mrm.21838 pmid:19165883 pmcid:PMC2754750 fatcat:ug6grszegjfslkac4ouidu6vvq

Heuristic Adaptability to Input Dynamics for SpMM on GPUs [article]

Guohao Dai, Guyue Huang, Shang Yang, Zhongming Yu, Hengrui Zhang, Yufei Ding, Yuan Xie, Huazhong Yang, Yu Wang
2022 arXiv   pre-print
Orthogonal design principles for such a sparse problem should be extracted to form different algorithms, and further used for performance tuning. (2) Nontrivial implementations in the algorithm space.  ...  Many previous studies exploit GPUs for SpMM acceleration because GPUs provide high bandwidth and parallelism.  ...  Consider the first loop, 𝑀-Loop, of SpMM shown in Figure 2 (a). 𝑀-Loop iterates over the rows of the sparse matrix 𝐴.  ... 
arXiv:2202.08556v1 fatcat:eevj2z76kfexzheqmidsnigae4

Parallel implementation of recursive spectral bisection on the connection machine CM-5 system [chapter]

Zdenek Johan, Kapil K. Mathur, S. Lennart Johnsson, Thomas J.R. Hughes
1995 Parallel Computational Fluid Dynamics 1993  
ACKNOWLEDGMENTS We would like to thank Arthur Raefsky (CENTRIC Engineering Systems), Horst Simon (NASA Ames), John Kennedy, Jacek Myczkowski and Richard Shapiro (Thinking Machines) for their helpful comments  ...  Three dot-product operations for each partition; 2. One matrix-vector product of the form u = L v; and 3.  ...  Elapsed times for different parts of the RSB algorithm for a partitioning into 256 subdomains on a 64-node CM-5 system. Table 2. M6 wing. Cost analysis for the computation of the Fiedler vector.  ... 
doi:10.1016/b978-044481999-4/50178-x fatcat:u44r26vgnvh6himuoaffxozyai

Bpop: an efficient program for estimating base population allele frequencies in single and multiple group structured populations

Ismo Stranden, Esa A. Mäntysaari
2020 Agricultural and Food Science  
Thus, the IOP and IM approaches can be recommended for large data sets because of their low memory use and computing time.  ...  The required dense matrix products involving (A22 )-1v were implemented efficiently using sparse submatrices of A-1, where A and A22 are pedigree relationship matrices for all and genotyped animals, respectively  ...  Acknowledgements The authors acknowledge the Irish Cattle Breeding Federation (ICBF) for providing the beef cattle data.  ... 
doi:10.23986/afsci.90955 fatcat:oxz64p6xirhqrcnrpqach6wngy

On the Fine-Grain Decomposition of Multicommodity Transportation Problems

Stavros A. Zenios
1991 SIAM Journal on Optimization  
Implementations on the Connection Machine CM-2 are discussed for both dense and sparse transportation problems. The dense implementation achieves computing rate of 1.6-3 GFLOPS.  ...  Hence, a fine-grain decomposition scheme is developed that is suitable for massively parallel computer architectures of the SIMD (i.e., single instruction stream, multiple data stream) class.  ...  Acknowledgements: Professor Yair Censor deserves honorary co-authorship on this paper for numerous illuminating discussions on row-action algorithms and his constant encouragement.  ... 
doi:10.1137/0801038 fatcat:in4zj3rmj5ci5o3253olljf6ha

Quark propagator on the Connection Machine

T. Lippert, K. Schilling, N. Petkov
1992 Parallel Computing  
Schilling and N. Petkov, Quark propagator on the Connection Machine, Parallel Computing 18 (1992) 1291-1299.  ...  We discuss and compare the structure, implementation and performance of two linear equation solvers, the Jacobi algorithm and the Conjugate Gradient algorithm, on the Connection Machine CM-2.  ...  The 'Wilson fermion' matrix in Eq. 1 is then given  ...  Fig. 2(a): Time per iteration for Jacobi and Conjugate Gradient on the 8^4 lattice.  ... 
doi:10.1016/0167-8191(92)90120-v fatcat:2zndav46vrb4vfwk4ycdsgetem

Parallel GPF solution: A GPU‐CPU‐based vectorization parallelization and sparse technique for NR implementation

Jingbo Zhao, Yan Tao
2021 IET Renewable Power Generation  
Considering the studies carried out in the authors' previous work of the application of GPU and sparse techniques on TS power flow (TS-PF), its applications on parallel implementations of corresponding  ...  However, sequential GPF lacks timely solution performance required in several applications built on top of GPF, for example, contingency analysis.  ...  and multi-type flexible power transmission devices, No.  ... 
doi:10.1049/rpg2.12316 fatcat:nntnlebhhffhzofjd5ydyko4fq

Model Order Reduction of Large-Scale Finite Element Systems in an MPI Parallelized Environment for Usage in Multibody Simulation

Thomas Volzer, Peter Eberhard
2016 Archive of Mechanical Engineering  
Besides, an iterative solver is considered within the CMS-based method.  ...  Due to the always increasing size of the non-reduced systems, the calculation of the projection matrix leads to a large demand of computational resources and cannot be done on usual serial computers with  ...  Except for the CMS based method, only direct methods can currently be used in Morembs++ due to the properties of the coefficient matrix.  ... 
doi:10.1515/meceng-2016-0027 fatcat:voafy32znnbbjle4ivhrklscmq

3D Alternating Direction TV-Based Cone-Beam CT Reconstruction with Efficient GPU Implementation

Ailong Cai, Linyuan Wang, Hanming Zhang, Bin Yan, Lei Li, Xiaoqi Xi, Min Guan, Jianxin Li
2014 Computational and Mathematical Methods in Medicine  
The iteration for this algorithm is simple but convergent. The simulation and real CT data reconstruction results indicate that the proposed algorithm is both fast and accurate.  ...  The applied proximal technique avoids the horrible pseudoinverse computation of big matrix which makes the proposed algorithm applicable and efficient for CBCT imaging.  ...  Introduction Recently, iterative image reconstruction (IIR) algorithms [1-6], especially compressive sensing (CS) [7-10] based ones, have been developed for X-ray computed  ... 
doi:10.1155/2014/982695 pmid:25045400 pmcid:PMC4089849 fatcat:q3aooknkszhnjkmicbjioqwada

Page 6251 of Mathematical Reviews Vol. , Issue 2003h [page]

2003 Mathematical Reviews  
“These algorithms are based on our earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix  ...  W. (1-ORNL-CM; Oak Ridge, TN Computing row and column counts for sparse QR and LU factorization. (English summary) BIT 41 (2001), no. 4, 693-710.  ... 

Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards [article]

Ang Li, Radu Serban, Dan Negrut
2015 arXiv   pre-print
We discuss an approach for solving sparse or dense banded linear systems A x = b on a Graphics Processing Unit (GPU) card.  ...  sparse direct solvers: PARDISO, SuperLU, and MUMPS.  ...  Solver comparisons raw data. For completeness, we provide here the raw comparison data for the tested solvers which was used in generating the figures and plots in the paper.  ... 
arXiv:1509.07919v1 fatcat:z6434bxvmrcq3emoxzxvm54gcu

Page 7619 of Mathematical Reviews Vol. , Issue 95m [page]

1995 Mathematical Reviews  
Our computational results on a CM-2 demonstrate the potential superiority of the partitioned inverse approach over the conventional substitution algorithm for highly parallel sparse triangular solu-  ...  Summary: “The development of efficient, general-purpose software for the iterative solution of sparse linear systems on parallel MIMD computers depends on recent results from a wide variety of research  ... 

Noniterative MAP Reconstruction Using Sparse Matrix Representations

Guangzhi Cao, C.A. Bouman, K.J. Webb
2009 IEEE Transactions on Image Processing  
We present a method for noniterative maximum a posteriori (MAP) tomographic reconstruction which is based on the use of sparse matrix representations.  ...  The second idea is a method for efficiently storing and computing the required orthonormal transformations, which we call a sparse-matrix transform (SMT).  ...  Reconstructed images of (r) at z = 3 cm using the compressed H matrix based on KL transform (used both for data whitening and matrix decorrelation).  ... 
doi:10.1109/tip.2009.2023724 pmid:19556196 fatcat:oblllbiwebctdgty65pk3fp5yq

Accelerated Spatially Resolved Electrical Simulation of Photovoltaic Devices Using Photovoltaic-Oriented Nodal Analysis

Xiaofeng Wu, Martin Bliss, Archana Sinha, Thomas R. Betts, Rajesh Gupta, Ralph Gottschalg
2015 IEEE Transactions on Electron Devices  
Sharpe from CREST, Loughborough University, for the fruitful discussions on sparse matrix techniques.  ...  Iterative methods can have lower time complexity and thus higher efficiency than direct ones. Being row independent [28] , they are suitable to combine sparse matrix techniques.  ...  Parallel computing is an appropriate candidate for accelerating a program with intensive matrix computations.  ... 
doi:10.1109/ted.2015.2409058 fatcat:tfjxhgosnbgjbjj3o2xyd65idq