Page 3582 of Mathematical Reviews Vol. , Issue 90F [page]

1990 Mathematical Reviews  
Kunde, Routing and sorting on mesh-connected arrays (extended abstract) (pp. 423-433); Yijie Han and Yoshihide Igarashi, Time lower bounds for parallel sorting on a mesh-connected processor array (  ...  Parallel routing and sorting: Danny Krizanc, Sanguthevar Rajasekaran and Thanasis Tsantilas, Optimal routing algorithms for mesh-connected processor arrays (extended abstract) (pp. 411-422); Manfred  ... 

14.9 TFLOPS Three-Dimensional Fluid Simulation for Fusion Science with HPF on the Earth Simulator

H. Sakagami, H. Murai, Yoshiki Seo, M. Yokokawa
2002 ACM/IEEE SC 2002 Conference (SC'02)  
We succeeded in getting 14.9 TFLOPS performance when running a plasma simulation code IMPACT-3D parallelized with High Performance Fortran on 512 nodes of the Earth Simulator.  ...  The mesh size is 2048x2048x4096, and the third dimension was distributed for the parallelization.  ...  with the mesh size of 2048x2048x4096 on 512 nodes of the Earth Simulator.  ... 
doi:10.1109/sc.2002.10051 dblp:conf/sc/SakagamiMSY02 fatcat:mlg2q7iytnfard3exw37hrqvsa

Using high performance Fortran for parallel programming

G. Sarma, T. Zacharia, D. Miles
1998 Computers and Mathematics with Applications  
The conversion of the code from an original implementation on the Connection Machine systems using CM Fortran is described.  ...  The sections of the code requiring minimal inter-processor communication are easily parallelized, by changing only the syntax for specifying data layout.  ...  The correspondence between the global and element degrees of freedom is derived from the mesh connectivity array. It is convenient to place all entries of an element vector on the same processor.  ... 
doi:10.1016/s0898-1221(98)00095-9 fatcat:fzwcq52ks5cjnday5v4pamdpze

ParFUM: a parallel framework for unstructured meshes for scalable dynamic physics applications

Orion S. Lawlor, Sayantan Chakravorty, Terry L. Wilmarth, Nilesh Choudhury, Isaac Dooley, Gengbin Zheng, Laxmikant V. Kalé
2006 Engineering with Computers  
The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.  ...  The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible  ...  Acknowledgments The authors wish to thank Milind Bhandarkar, the developer of ParFUM's predecessor, the Charm++ FEM Framework.  ... 
doi:10.1007/s00366-006-0039-5 fatcat:qm2vq7rl3vh4vj7ayu2ggpowx4

Partial Multinode Broadcast and Partial Exchange Algorithms for d-Dimensional Meshes

E.A. Varvarigos, D.P. Bertsekas
1994 Journal of Parallel and Distributed Computing  
We further look at a dynamic version of the problem, where broadcast requests are generated at random times.  ...  We propose algorithms for the d-dimensional mesh network that execute the partial multinode broadcast and the partial exchange communication tasks in near-optimal time.  ...  Some Preliminary Results . Packing and Monotone Routing for Meshes and i, respectively. . PMNB in d-dimensional Tori and Arrays  ... 
doi:10.1006/jpdc.1994.1130 fatcat:ondd7odirbe43pr7y5iytgwxdq

Parallel DSMC method using dynamic domain decomposition

J.-S. Wu, K.-C. Tseng
2005 International Journal for Numerical Methods in Engineering  
The current DSMC method is implemented on an unstructured mesh using a particle ray-tracing technique, which takes advantage of the cell connectivity information.  ...  the cells among processors during simulation.  ...  The authors also would like to express their sincere thanks for the computing resources provided by the National Center for High-Performance Computing of Taiwan.  ... 
doi:10.1002/nme.1232 fatcat:rmqgsfbc3jcjngfw5z4twlxnxm

Abstractions and Middleware for Petascale Computing and Beyond

Ivo F. Sbalzarini
2010 International Journal of Distributed Systems and Technologies  
gap: as machines contain more and more processor cores, the mean-time between failure drops below the typical runtime of a simulation, and 4. the data gap: storing, accessing, and analyzing the peta-bytes  ...  We outline the structure and functionality of such a middleware and demonstrate its feasibility on the example of the parallel particle-mesh library (PPM).  ...  Even though many bits and pieces exist in all areas (abstractions, languages, compilers, middleware, tools), they are yet to be combined in a programming model that is independent of the number of processors  ... 
doi:10.4018/jdst.2010040103 fatcat:yihhihij7rblbo3jzknckjeglu

Constant Time Simulation of an R-Mesh on an LR-Mesh

Carlos Alberto Cordova-Flores, Jose Alberto Fernandez-Zepeda, Anu G. Bourgeois
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
This paper presents a constant time simulation of an R-Mesh on an LR-Mesh (a restricted model of the R-Mesh), proving that in spite of the differences, the two models possess the same complexity.  ...  In other words, the LR-Mesh can simulate a step of the R-Mesh in constant time with a polynomial increase in size. This simulation is based on Reingold's algorithm to solve USTCON in log-space.  ...  This simulation is optimal in the number of processors, and before this paper, it was the fastest simulation of this type.  ... 
doi:10.1109/ipdps.2007.370459 dblp:conf/ipps/Cordova-FloresFB07 fatcat:moq444q7m5cmveivbocsow6xm4

Mapping Control-Intensive Video Kernels onto a Coarse-Grain Reconfigurable Architecture: the H.264/AVC Deblocking Filter

C. Arbelo, A. Kanstein, S. Lopez, J. F. Lopez, M. Berekovic, R. Sarmiento, J.-Y. Mignolet
2007 2007 Design, Automation & Test in Europe Conference & Exhibition  
compared with an implementation on a Very Long Instruction Word (VLIW) dedicated processor.  ...  The results obtained show a considerable reduction in the number of cycles and memory accesses needed to perform the filtering as well as an increase in the degree of instruction parallelism (ILP) when  ...  The ADRES/DRESC framework The ADRES coarse-grained array processor, as shown in Fig. 3, consists of an array of functional units (FUs), enhanced with register files (RFs) and connected through routing  ... 
doi:10.1109/date.2007.364587 dblp:conf/date/ArbeloKLLBSM07 fatcat:mfapuuuw65gazgr7cjdogcd7gu

Profiling Of Code_Saturne With Hpctoolkit And Tau, And Autotuning Kernels With Orio

B. Lindia
2014 Zenodo  
This study has profiled the application Code Saturne, which is part of the PRACE benchmark suite.  ...  Orio has been used on traditional Intel processors, Intel Xeon Phi and NVIDIA GPUs. The compute kernels have a small contribution to the overall execution time for Code Saturne.  ...  In order to find a good ratio between number of MPI ranks and OpenMP threads for optimal computation efficiency, we first examined the T-Junction case with tubular pipe geometry, 218k cells mesh, running  ... 
doi:10.5281/zenodo.822763 fatcat:f5yfn2npqrbytmkt4kpxalx3ei

PARTI primitives for unstructured and block structured problems

A. Sussman, J. Saltz, R. Das, S. Gupta, D. Mavriplis, R. Ponnusamy, K. Crowley
1992 Computing Systems in Engineering  
We present experimental data from a 3-D unstructured Euler solver run on the Intel Touchstone Delta to demonstrate the usefulness of our methods.  ...  This paper describes a set of primitives (PARTI) developed to efficiently execute unstructured and block structured problems on distributed memory parallel machines.  ...  This research was performed in part using the Intel Touchstone Delta System operated by Caltech on behalf of the Concurrent Supercomputing Consortium.  ... 
doi:10.1016/0956-0521(92)90096-2 fatcat:5jp32g7zmfhhxi3pqae4beyjwi

Efficient deterministic parallel simulation of 2D semiconductor devices based on WENO-Boltzmann schemes

José M. Mantas, María J. Cáceres
2009 Computer Methods in Applied Mechanics and Engineering  
The parallel algorithm has been implemented in C++ augmented with calls to MPI functions and functions of optimized linear algebra libraries.  ...  The data subdomain which demands most of the computational workload has been suitably distributed among the processors and several parallel design decisions  ...  It was very useful to check the experimental results of the parallel solver for a single gate MOSFET device.  ... 
doi:10.1016/j.cma.2008.10.003 fatcat:2dgg33bzabg4bemvm6bwmjglfy

Run-time optimization of sparse matrix-vector multiplication on SIMD machines [chapter]

Louis H. Ziantz, Can C. Özturan, Boleslaw K. Szymanski
1994 Lecture Notes in Computer Science  
In this paper, we report on run-time optimization of array distribution and off-processor data fetching to reduce both the communication and computation time.  ...  Actual runs on test matrices produced up to a 35 percent relative improvement over a block distribution with a naive multiplication algorithm while simulations over a wider range of processors indicate  ...  Duff who made their collections of meshes and sparse matrices available for use in benchmark tests in evaluating the algorithm's performance.  ... 
doi:10.1007/3-540-58184-7_111 fatcat:wtls7jf5sza2zicwl25leyxdfu

Evolution of Intracellular Ca2+ Waves from about 10,000 RyR Clusters: Towards Solving a Computationally Daunting Task [chapter]

Pan Li, Wenjie Wei, Xing Cai, Christian Soeller, Mark B. Cannell, Arun V. Holden
2009 Lecture Notes in Computer Science  
It is shown that high-performance implementations and optimizations must match both the underlying computations and the target parallel platform.  ...  Several good practices are summarized for parallel programming and performance analysis on the multi-core architecture, which can be of help to many other scientists.  ...  In this part, we also developed a set of methodologies about how to select a suitable numerical method and find the corresponding time step size when the total simulation error is indirectly given.  ... 
doi:10.1007/978-3-642-01932-6_2 fatcat:xpf52ikgtravtcfslnm3vkuwtq

Distributed data structure design for scientific computation

Jan-Jan Wu, Pangfeng Liu
1998 Proceedings of the 12th international conference on Supercomputing - ICS '98  
The VGDS effort focuses on developing an integrated, distributed environment that allows fast prototyping of a diverse set of simulation problems in scientific and engineering domains, including regular,  ...  The framework defines three base libraries, Array, Graph, and Tree, that capture major data structures involved in scientific computation.  ...  Acknowledgment Support for this work is provided by National Science Council of Taiwan under grant 86-2213-E-001-009.  ... 
doi:10.1145/277830.277879 dblp:conf/ics/WuL98 fatcat:ypa6hducyff2tooq2blrubuayy