11,913 Hits in 4.6 sec

Page 3182 of Mathematical Reviews Vol. , Issue 98E [page]

1998 Mathematical Reviews  
The new nested dissection heuristic called Shrink-Split ND (SSND) is based on parallel graph contraction.  ...  P-based nested dissection.  ... 

A NUMA-Aware Fine Grain Parallelization Framework for Multi-core Architecture

Corentin Rossignon, Pascal Henon, Olivier Aumage, Samuel Thibault
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
To evaluate the benefit of our work, we present experiments on the fine-grain parallelization of an iterative solver for sparse linear systems, with comparisons against the Intel TBB approach.  ...  In this paper, we present solutions to two problems commonly encountered in fine-grain parallelization on multi-core architectures: expressing an algorithm using a task grain size  ...  Our main motivation is a programming model that requires as little effort as possible, starting from a natural task-based parallelization of an algorithm (using TBB, for example), to obtain a better  ... 
doi:10.1109/ipdpsw.2013.204 dblp:conf/ipps/RossignonHAT13 fatcat:7fkxf74bwjctnppge73xbkbpba
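
The grain-size problem mentioned in this abstract can be illustrated with a minimal sketch (not the authors' framework): a recursive reduction that stops spawning tasks once a chunk drops below a tunable cutoff, which is roughly what a library such as TBB automates with its blocked_range/grainsize mechanism. Function names and constants below are illustrative only.

    #include <cstddef>
    #include <functional>
    #include <future>
    #include <numeric>
    #include <vector>

    // Minimal sketch of grain-size-controlled task parallelism (not the paper's
    // NUMA-aware framework): recursively split a range and stop spawning new
    // tasks once a chunk is smaller than `grain`.
    double parallel_sum(const std::vector<double>& v,
                        std::size_t lo, std::size_t hi, std::size_t grain) {
        if (hi - lo <= grain)                       // small chunk: run sequentially
            return std::accumulate(v.begin() + lo, v.begin() + hi, 0.0);
        std::size_t mid = lo + (hi - lo) / 2;
        // Left half runs in a new task, right half in the current one.
        auto left = std::async(std::launch::async,
                               parallel_sum, std::cref(v), lo, mid, grain);
        double right = parallel_sum(v, mid, hi, grain);
        return left.get() + right;
    }

    int main() {
        std::vector<double> v(1 << 20, 1.0);
        // A larger grain means fewer, coarser tasks; a smaller grain exposes more
        // parallelism but raises task-management overhead.
        return parallel_sum(v, 0, v.size(), 1 << 14) == double(v.size()) ? 0 : 1;
    }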

Teaching Parallel Programming for Time-Efficient Computer Applications

A. Asaduzzaman, R. Asmatulu, M. Rahman
2014 International Journal of Computer Applications  
Based on the steady-state heat equation experiment, CUDA/GPU parallel programming may achieve a speed-up factor of up to 241x while simulating heat transfer on a 5000x5000 thin surface.  ...  In this article, a novel approach to teaching parallel computing is proposed that will prepare computer application developers for present and future computation challenges.  ...  Therefore, an approach to teaching parallel programming is needed that focuses on higher-level programming strategies for computational problems, and especially on ease of programmability [13].  ... 
doi:10.5120/15585-4264 fatcat:527hqvgdhjgzbh6im6cynvio3q
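
A steady-state heat-equation solver of the kind referenced above is typically written as a Jacobi stencil iteration; the sketch below is a plain serial C++ baseline, not the article's CUDA code (a CUDA version typically assigns one grid point per GPU thread, which is where such speed-ups come from).

    #include <vector>

    // Minimal sketch of the steady-state heat equation solved by Jacobi iteration
    // on an n x n plate. Boundary values are held fixed and each interior point is
    // repeatedly replaced by the average of its four neighbours.
    void jacobi_heat(std::vector<double>& grid, int n, int iters) {
        std::vector<double> next(grid);
        for (int it = 0; it < iters; ++it) {
            for (int i = 1; i < n - 1; ++i)
                for (int j = 1; j < n - 1; ++j)
                    next[i * n + j] = 0.25 * (grid[(i - 1) * n + j] + grid[(i + 1) * n + j] +
                                              grid[i * n + j - 1] + grid[i * n + j + 1]);
            grid.swap(next);
        }
    }

    int main() {
        const int n = 512;                           // the article uses a 5000x5000 surface
        std::vector<double> grid(n * n, 0.0);
        for (int j = 0; j < n; ++j) grid[j] = 100.0; // hot top edge as boundary condition
        jacobi_heat(grid, n, 1000);
    }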

Parallel Algorithms for Graph Optimization Using Tree Decompositions

Blair D. Sullivan, Dinesh Weerapurage, Chris Groer
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
For their help with the MADNESS parallel runtime, we thank Robert Harrison, Rebecca Hartman-Baker, and Benjamin Mintz.  ...  a grant through the Applied Mathematics Program.  ...  Task-oriented parallel dynamic programming: Having described our novel ideas for parallelizing the construction of tree decompositions, we now move on to our approach for applying distributed computing  ... 
doi:10.1109/ipdpsw.2013.242 dblp:conf/ipps/SullivanWG13 fatcat:vvu3wqxhfnhn3awwxsllsrddna
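
The "task-oriented parallel dynamic programming" mentioned in the snippet can be made concrete with a hedged stand-in: instead of a tree decomposition, the sketch below runs a maximum-weight independent set DP on an ordinary rooted tree, where each child subtree is an independent subproblem that can be dispatched as a task. This is an analogy with the same task structure, not the authors' algorithm.

    #include <algorithm>
    #include <functional>
    #include <future>
    #include <utility>
    #include <vector>

    // Hedged stand-in for task-oriented dynamic programming on trees: each child
    // subtree is an independent subproblem, so subtrees near the root are
    // dispatched as parallel tasks and their DP results combined at the parent.
    struct Node {
        double weight;
        std::vector<Node> children;
    };

    // Returns {best value if this node is excluded, best value if it is included}.
    std::pair<double, double> solve(const Node& node, int depth) {
        double excluded = 0.0, included = node.weight;
        std::vector<std::future<std::pair<double, double>>> tasks;
        for (const Node& child : node.children) {
            if (depth < 3) {  // spawn tasks near the root, recurse serially below
                tasks.push_back(std::async(std::launch::async, solve,
                                           std::cref(child), depth + 1));
            } else {
                auto r = solve(child, depth + 1);
                excluded += std::max(r.first, r.second);
                included += r.first;
            }
        }
        for (auto& t : tasks) {
            auto r = t.get();
            excluded += std::max(r.first, r.second);
            included += r.first;
        }
        return {excluded, included};
    }

    int main() {
        Node root{1.0, {{2.0, {}}, {3.0, {{4.0, {}}}}}};
        auto r = solve(root, 0);
        // Optimal set is the two leaves with weights 2 and 4.
        return std::max(r.first, r.second) == 6.0 ? 0 : 1;
    }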

Comparative Analysis of Automatic Parallelization Techniques

Muntha SR, Prasad A, Gogineni K, Nikhil L, Harshavardhan VL
2017 Journal of computer science and systems biology  
They have recently been exploited for general-purpose computation through the CUDA programming environment on Nvidia GPUs and the stream computation method on ATI-based GPUs.  ...  It is based on an execution methodology built on the parallel analysis of two strings. It provides a major boost, with execution almost twice as fast as a regular algorithm.  ... 
doi:10.4172/jcsb.1000262 fatcat:q3k5f7x5rnbmrg2uqv5774qbyy
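
The review only hints at what the "parallel analysis of two strings" involves, so the following is a generic illustration of the idea rather than the surveyed technique: two equal-length strings are compared position by position, with the work split into chunks that run as concurrent tasks.

    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <string>
    #include <vector>

    // Generic illustration only: count matching positions of two equal-length
    // strings in parallel chunks, one task per chunk.
    std::size_t parallel_matches(const std::string& a, const std::string& b,
                                 unsigned tasks = 4) {
        std::vector<std::future<std::size_t>> parts;
        std::size_t n = a.size(), chunk = (n + tasks - 1) / tasks;
        for (unsigned t = 0; t < tasks; ++t) {
            std::size_t lo = t * chunk, hi = std::min(n, lo + chunk);
            parts.push_back(std::async(std::launch::async, [&a, &b, lo, hi] {
                std::size_t m = 0;
                for (std::size_t i = lo; i < hi; ++i)
                    if (a[i] == b[i]) ++m;
                return m;
            }));
        }
        std::size_t total = 0;
        for (auto& p : parts) total += p.get();
        return total;
    }

    int main() {
        std::string a(1000000, 'x'), b(1000000, 'x');
        b[42] = 'y';
        return parallel_matches(a, b) == a.size() - 1 ? 0 : 1;
    }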

The impact of high-performance computing in the solution of linear systems: trends and problems

Iain S. Duff
2000 Journal of Computational and Applied Mathematics  
We review the influence of the advent of high-performance computing on the solution of linear equations.  ...  We will concentrate on direct methods of solution and consider both the case when the coefficient matrix is dense and when it is sparse.  ...  Acknowledgements: I would like to thank my colleagues Patrick Amestoy, Jacko Koster, Xiaoye Li, John Reid, and Jennifer Scott for some helpful remarks on a draft of this paper.  ... 
doi:10.1016/s0377-0427(00)00401-5 fatcat:v6vcoq2a7ncjtexywdeiwjmjte
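
For readers unfamiliar with direct methods, the dense case Duff discusses reduces to factorizations such as Cholesky; a minimal unblocked sketch follows. High-performance codes use blocked, parallel variants of this kernel, and sparse solvers add fill-reducing ordering and symbolic analysis on top.

    #include <cmath>
    #include <stdexcept>
    #include <vector>

    // Minimal unblocked Cholesky factorization A = L * L^T of a dense symmetric
    // positive definite matrix stored row-major (illustration only).
    std::vector<double> cholesky(std::vector<double> a, int n) {
        for (int j = 0; j < n; ++j) {
            double d = a[j * n + j];
            for (int k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
            if (d <= 0.0) throw std::runtime_error("matrix is not positive definite");
            a[j * n + j] = std::sqrt(d);
            for (int i = j + 1; i < n; ++i) {
                double s = a[i * n + j];
                for (int k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
                a[i * n + j] = s / a[j * n + j];
            }
        }
        // Zero out the strictly upper triangle so the result is exactly L.
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) a[i * n + j] = 0.0;
        return a;
    }

    int main() {
        // A = [[4,2],[2,3]] factors as L = [[2,0],[1,sqrt(2)]].
        std::vector<double> L = cholesky({4, 2, 2, 3}, 2);
        return (L[0] == 2.0 && L[2] == 1.0) ? 0 : 1;
    }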

Software engineering for multicore systems

Victor Pankratius, Christoph Schaefer, Ali Jannesari, Walter F. Tichy
2008 Proceedings of the 1st international workshop on Multicore software engineering - IWMSE '08  
They were programmed in different languages and benchmarked on several multicore computers.  ...  This paper presents an experience report with four diverse case studies on multicore software development for general-purpose applications.  ...  Acknowledgements: We thank Agilent Technologies Inc. for providing the source code of Masshunter Metabolite ID for study as well as Agilent Technologies Foundation for the financial support.  ... 
doi:10.1145/1370082.1370096 fatcat:vnkgfd7uunaatn7gmnea6d3itu

Patterns and Exemplars: Compelling Strategies for Teaching Parallel and Distributed Computing to CS Undergraduates

Joel Adams, Richard Brown, Elizabeth Shoop
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
Parallel programming patterns and exemplar applications naturally complement each other, and together provide a unified and practical strategy for PDC education at multiple course levels.  ...  Learning patterns enables students to quickly gain the intellectual and coding skills they will need to embrace the future of parallel and distributed computing (PDC).  ...  ACKNOWLEDGMENTS: We have benefited from the contributions of the parallel patterns community, particularly Tim Mattson of Intel Corporation, who has been a great personal support in our emerging efforts  ... 
doi:10.1109/ipdpsw.2013.275 dblp:conf/ipps/AdamsBS13 fatcat:r4ltn7yxa5hs5jnxvpeqibkt34

Examining the architecture of cellular computing through a comparative study with a computer

Degeng Wang, Michael Gribskov
2005 Journal of the Royal Society Interface  
Cellular processes are implemented in biochemical pathways in a parallel manner. In a computer, on the other hand, the software provides only instructions and data for the CPU.  ...  A process represents just sequentially ordered actions by the CPU, and only virtual parallelism can be implemented through CPU time-sharing.  ...  Multiple RNAs can be translated into proteins in a parallel manner (figure 3b), whereas instructions in a computer program are retrieved sequentially through the bus.  ... 
doi:10.1098/rsif.2005.0038 pmid:16849179 pmcid:PMC1629074 fatcat:gcke4gnb4vb7lid5pidmqdbcym

Highly Parallel Sparse Cholesky Factorization

John R. Gilbert, Robert Schreiber
1992 SIAM Journal on Scientific and Statistical Computing  
The algorithm, which we call Router Cholesky, is based on a theoretically efficient algorithm in the PRAM model of parallel computation.  ...  In some cases entirely new approaches may be appropriate for highly parallel algorithms; examples of experiments with such approaches include particle-in-box flow simulation, knowledge base maintenance  ... 
doi:10.1137/0913067 fatcat:kbmkvxdlyvevbeshguupvca6gm

Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems

Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Florent Lopez
2016 ACM Transactions on Mathematical Software  
To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific  ...  This paper evaluates the usability and effectiveness of runtime systems based on the Sequential Task Flow model for complex applications, namely sparse matrix multifrontal factorizations, which feature  ...  Faverge as well as the reviewers for their constructive suggestions on a preliminary version of this manuscript.  ... 
doi:10.1145/2898348 fatcat:mos5jdb5crfbffzzp4a3ut4yye
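
A toy version of the Sequential Task Flow idea may help: tasks are submitted in program order and dependencies are inferred from the data they declare. The scheduler below is for illustration only; it treats every access as a write, whereas real STF runtimes such as StarPU distinguish read/write access modes and schedule onto heterogeneous resources.

    #include <functional>
    #include <future>
    #include <map>
    #include <vector>

    // Toy sequential-task-flow (STF) scheduler: tasks run only after every
    // earlier task that touched one of their data handles has finished.
    struct ToySTF {
        std::map<const void*, std::shared_future<void>> last_task;  // last task per handle

        void submit(std::function<void()> work, std::vector<const void*> handles) {
            std::vector<std::shared_future<void>> deps;
            for (const void* h : handles) {
                auto it = last_task.find(h);
                if (it != last_task.end()) deps.push_back(it->second);
            }
            std::shared_future<void> f =
                std::async(std::launch::async, [work, deps] {
                    for (const auto& d : deps) d.wait();  // wait for predecessors
                    work();
                }).share();
            for (const void* h : handles) last_task[h] = f;  // become the latest task
        }

        void wait_all() {
            for (auto& kv : last_task) kv.second.wait();
        }
    };

    int main() {
        ToySTF stf;
        double a = 1.0, b = 2.0;
        stf.submit([&] { a *= 2.0; }, {&a});       // task 1: touches a
        stf.submit([&] { b += 1.0; }, {&b});       // task 2: independent of task 1
        stf.submit([&] { a += b; },   {&a, &b});   // task 3: waits for tasks 1 and 2
        stf.wait_all();
        return (a == 5.0 && b == 3.0) ? 0 : 1;
    }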

ParCYCLIC: finite element modelling of earthquake liquefaction response on parallel computers

Jun Peng, Jinchi Lu, Kincho H. Law, Ahmed Elgamal
2004 International journal for numerical and analytical methods in geomechanics (Print)  
This paper presents the computational procedures and solution strategy employed in ParCYCLIC, a parallel nonlinear finite element program developed from an existing serial code, CYCLIC, for the analysis  ...  Not only is good agreement achieved between the computed and recorded results, but the computational results also show excellent parallel performance and scalability of ParCYCLIC on parallel computers  ...  One approach to developing application software for distributed-memory parallel computers is to use the single-program-multiple-data (SPMD) paradigm [28, 29].  ... 
doi:10.1002/nag.384 fatcat:s2r4jt6fazh7lmz6ptei3o7ppm
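
The SPMD paradigm cited in the abstract can be shown in a few lines of MPI (a generic sketch, not ParCYCLIC): every rank executes the same program on its own slice of the data, and the partial results are combined with a collective reduction.

    #include <mpi.h>

    // Minimal SPMD sketch: all ranks run this same program on their own
    // partition of the index range, then reduce the partial sums.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1000000;                 // global problem size
        int chunk = n / size;
        int lo = rank * chunk;
        int hi = (rank == size - 1) ? n : lo + chunk;

        double local = 0.0;                    // each rank sums only its partition
        for (int i = lo; i < hi; ++i) local += 1.0;

        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        MPI_Finalize();
        return global == double(n) ? 0 : 1;    // every rank sees the same total
    }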

Comparing the performance of concurrent linked-list implementations in Haskell

Martin Sulzmann, Edmund S.L. Lam, Simon Marlow
2008 Proceedings of the 4th workshop on Declarative aspects of multicore programming - DAMP '09  
Finally, we suggest the addition of a single primitive that, in our experiments, improves the performance of one of the STM-based implementations by more than a factor of 7.  ...  Haskell has a rich set of synchronization primitives for implementing shared-state concurrency abstractions, ranging from the very high level (Software Transactional Memory) to the very low level (mutable  ...  Acknowledgements: We thank the reviewers for their comments on a previous version of this paper.  ... 
doi:10.1145/1481839.1481845 dblp:conf/popl/SulzmannLM09 fatcat:z6phijrcszfhhmkx2ee4ro7ti4
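
The paper's implementations are written in Haskell on top of primitives ranging from STM down to low-level mutable state; purely as a language-neutral point of reference, the sketch below shows the simplest coarse-grained design in C++, where a single mutex serialises every list operation.

    #include <mutex>
    #include <string>

    // Coarse-grained locked singly linked list, given only as the simplest
    // baseline design: one mutex serialises every operation, which is easy to
    // reason about but becomes the bottleneck under contention.
    class CoarseList {
        struct Node { std::string value; Node* next; };
        Node* head = nullptr;
        mutable std::mutex m;
    public:
        void push_front(std::string v) {
            std::lock_guard<std::mutex> lock(m);
            head = new Node{std::move(v), head};
        }
        bool contains(const std::string& v) const {
            std::lock_guard<std::mutex> lock(m);
            for (Node* p = head; p; p = p->next)
                if (p->value == v) return true;
            return false;
        }
        ~CoarseList() {
            while (head) { Node* n = head->next; delete head; head = n; }
        }
    };

    int main() {
        CoarseList list;
        list.push_front("a");
        list.push_front("b");
        return list.contains("a") && !list.contains("c") ? 0 : 1;
    }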

A FPGA implementation of JPEG baseline encoder for wearable devices

Yuecheng Li, Wenyan Jia, Bo Luan, Zhi-hong Mao, Hong Zhang, Mingui Sun
2015 2015 41st Annual Northeast Biomedical Engineering Conference (NEBEC)  
An optimized dataflow configuration with a padding scheme simplifies the timing control for data transfer.  ...  Our experiments with a system-on-chip multi-sensor system have verified our FPGA implementation with respect to real-time performance, computational efficiency, and FPGA resource utilization.  ...  The JPEG baseline codec, which is based on the 8×8 discrete cosine transform (DCT) and sequential encoding, has been the most widely implemented codec suitable for wearable devices [2].  ... 
doi:10.1109/nebec.2015.7117173 pmid:26190911 pmcid:PMC4505724 fatcat:vaomu7alajcohjf5xfstsklm3a
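
Since the abstract notes that baseline JPEG rests on the 8×8 DCT, the following is the textbook floating-point forward DCT-II for one block; it is a reference sketch, not the authors' hardware design (an FPGA encoder would use a fixed-point, row/column-separable, pipelined equivalent).

    #include <cmath>

    // Naive forward 8x8 DCT-II as used by baseline JPEG (floating point,
    // O(N^4) per block), for reference only.
    void dct8x8(const double in[8][8], double out[8][8]) {
        const double pi = 3.14159265358979323846;
        for (int u = 0; u < 8; ++u)
            for (int v = 0; v < 8; ++v) {
                double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
                double cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
                double sum = 0.0;
                for (int x = 0; x < 8; ++x)
                    for (int y = 0; y < 8; ++y)
                        sum += in[x][y] *
                               std::cos((2 * x + 1) * u * pi / 16.0) *
                               std::cos((2 * y + 1) * v * pi / 16.0);
                out[u][v] = 0.25 * cu * cv * sum;
            }
    }

    int main() {
        double block[8][8], coeff[8][8];
        for (int x = 0; x < 8; ++x)
            for (int y = 0; y < 8; ++y) block[x][y] = 128.0;   // flat gray block
        dct8x8(block, coeff);
        // A constant block puts all its energy in the DC coefficient: 8 * 128 = 1024.
        return (coeff[0][0] > 1023.9 && coeff[0][0] < 1024.1) ? 0 : 1;
    }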

Comparing the performance of concurrent linked-list implementations in Haskell

Martin Sulzmann, Edmund S.L. Lam, Simon Marlow
2009 SIGPLAN notices  
Finally, we suggest the addition of a single primitive that, in our experiments, improves the performance of one of the STM-based implementations by more than a factor of 7.  ...  Haskell has a rich set of synchronization primitives for implementing shared-state concurrency abstractions, ranging from the very high level (Software Transactional Memory) to the very low level (mutable  ...  Acknowledgements: We thank the reviewers for their comments on a previous version of this paper.  ... 
doi:10.1145/1629635.1629643 fatcat:k7nsqmnk45f2hpyo5jcmjgh2bq
Showing results 1 — 15 out of 11,913 results