12,760 Hits in 4.9 sec


Thomas Rauber, Robert Reilein, Gudula Rünger
2001 Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '01  
In this paper, we introduce library support for the specification of message-passing programs in a group-SPMD style allowing different partitions in a single program.  ...  A suitable parallel programming model is a group-SPMD model, which requires a structuring of the processors into subsets and a partition of the program into multi-processor tasks.  ...  Acknowledgement We thank the NIC Jülich for providing access to the Cray T3E and Matthias Kühnemann for performing measurements of the iterated RK method on the T3E and the CLiC.  ... 
doi:10.1145/582034.582061 dblp:conf/sc/RauberRR01 fatcat:nmrfl3dl3rgmfbr2ykbfs3t7re

A decomposition approach for optimizing the performance of MPI libraries

O. Hartmann, M. Kühnemann, T. Rauber, G. Rünger
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
In this article we show that the performance of both standard and vendor-specific libraries can be improved by an orthogonal organization of the processors in 2D or 3D meshes and by decomposing  ...  The decomposition approach has been implemented in the form of a library extension which is called for each activation of a collective MPI operation.  ...  Orthogonal Processor Groups The realization of collective communication operations in consecutive phases based on an orthogonal partitioning of the processor set can be applied to arbitrary MPI libraries  ... 
doi:10.1109/ipdps.2006.1639721 dblp:conf/ipps/HartmannKRR06 fatcat:eff5y3pmajgi3m5kh63p3lwgmm
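
The orthogonal organization described in the snippet above can be sketched in a few lines: ranks 0..p-1 are arranged as a rows x cols mesh, and a collective operation is carried out in two phases, one along row groups and one along column groups. The following Python simulation is only illustrative (the function names `orthogonal_groups` and `two_phase_broadcast` are ours, not the library's):

```python
# Sketch of orthogonal processor groups: linear ranks 0..p-1 form a
# rows x cols mesh; a broadcast runs in two phases, first along the
# root's row group, then along every column group.
# Names here are illustrative, not taken from the paper's library.

def orthogonal_groups(p, cols):
    """Partition ranks 0..p-1 of a rows x cols mesh into row and column groups."""
    rows = p // cols
    row_groups = [[r * cols + c for c in range(cols)] for r in range(rows)]
    col_groups = [[r * cols + c for r in range(rows)] for c in range(cols)]
    return row_groups, col_groups

def two_phase_broadcast(p, cols, root, value):
    """Simulate a broadcast from `root`; return the buffer of every rank."""
    row_groups, col_groups = orthogonal_groups(p, cols)
    buf = {rank: None for rank in range(p)}
    buf[root] = value
    # Phase 1: broadcast inside the row group containing the root.
    for group in row_groups:
        if root in group:
            for rank in group:
                buf[rank] = value
    # Phase 2: each column group is fed by its member from the root's row.
    for group in col_groups:
        src = next(r for r in group if buf[r] is not None)
        for rank in group:
            buf[rank] = buf[src]
    return [buf[rank] for rank in range(p)]

print(two_phase_broadcast(6, cols=3, root=4, value=7))  # → [7, 7, 7, 7, 7, 7]
```

In a real MPI code the two phases would be `MPI_Bcast` calls on a row communicator and a column communicator obtained via `MPI_Comm_split`.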

Mixed Task and Data Parallel Executions in General Linear Methods

Thomas Rauber, Gudula Rünger
2007 Scientific Programming  
In this paper we study mixed task and data parallel implementations for general linear methods realized using a library for multiprocessor task programming.  ...  A class of applications which can benefit from this programming style are methods for solving systems of ordinary differential equations.  ...  Acknowledgment We thank the anonymous referees for their helpful comments. We also thank the NIC Jülich for access to its parallel machines.  ... 
doi:10.1155/2007/683198 fatcat:qfx4dyhn2rcvpinj4vx2ggib2q

Optimizing MPI collective communication by orthogonal structures

Matthias Kühnemann, Thomas Rauber, Gudula Rünger
2006 Cluster Computing  
Also on a Cray T3E, a significant improvement can be obtained by a careful selection of the processor groups.  ...  In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures  ...  Acknowledgment We thank the NIC Jülich for providing access to a Cray T3E.  ... 
doi:10.1007/s10586-006-9740-9 fatcat:j6gzxi5wizeenimocl35e5z3nq

Library Support for Hierarchical Multi-Processor Tasks

T. Rauber, G. Rünger
2002 ACM/IEEE SC 2002 Conference (SC'02)  
We present a runtime library to support the coordination of hierarchically structured multi-processor tasks.  ...  The library exploits an extended parallel group SPMD programming model and manages the entire task execution including the dynamic hierarchy of processor groups.  ...  Acknowledgement We thank the NIC Jülich for providing access to the Cray T3E.  ... 
doi:10.1109/sc.2002.10064 dblp:conf/sc/RauberR02 fatcat:zzz4gjjgg5gz3djvj72f3quy2a

ePUMA embedded parallel DSP processor with Unique Memory Access

Dake Liu, A. Karlsson, J. Sohl, Jian Wang, M. Petersson, Wenbiao Zhou
2011 2011 8th International Conference on Information, Communications & Signal Processing  
It is an on-chip multi-DSP processor (CMP) targeting predictable signal processing for communications and multimedia.  ...  Computing up to 100 GOPS without cooling is essential for high-end embedded systems and much demanded by markets.  ...  The FSM can be configured and can support data access control for an iterative loop over a group of instructions.  ... 
doi:10.1109/icics.2011.6173516 dblp:conf/IEEEicics/LiuKSWPZ11 fatcat:a2towkbxerb6jorr6rwulxvvgq

Efficient, Massively Parallel Eigenvalue Computation

Yan Huo, Robert Schreiber
1993 The International Journal of Supercomputing Applications  
N: order of the matrix; R: residual, ‖UᵀAU − Λ‖_F / N; O: deviation from orthogonality, ‖UᵀU − I‖_F / N. Numerical results for the glued Wilkinson matrix W⁺. Huo acknowledges support from Electrical Engineering  ...  It is a difficult tridiagonal test case with groups of close eigenvalues.  ... 
doi:10.1177/109434209300700402 fatcat:pmrgyw555bfylfv3r5wdhhhiwq
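
The two accuracy metrics quoted in the snippet above — the residual ‖UᵀAU − Λ‖_F / N and the deviation from orthogonality ‖UᵀU − I‖_F / N — can be computed with a short NumPy sketch. A small random symmetric matrix stands in here for the glued Wilkinson test matrix, which is an assumption of this sketch, not the paper's setup:

```python
import numpy as np

# Accuracy metrics for a computed eigendecomposition A U ≈ U Λ of a
# symmetric matrix A: the residual ||UᵀAU − Λ||_F / N and the deviation
# from orthogonality ||UᵀU − I||_F / N.  A random symmetric matrix
# stands in for the glued Wilkinson matrix used in the paper.

rng = np.random.default_rng(0)
N = 50
B = rng.standard_normal((N, N))
A = (B + B.T) / 2                       # symmetric test matrix

eigvals, U = np.linalg.eigh(A)          # eigenvectors are the columns of U
Lam = np.diag(eigvals)

residual = np.linalg.norm(U.T @ A @ U - Lam, "fro") / N
orth_dev = np.linalg.norm(U.T @ U - np.eye(N), "fro") / N
print(residual, orth_dev)               # both near machine epsilon for eigh
```

Both quantities should be at the level of machine precision for a backward-stable solver; a parallel implementation with groups of close eigenvalues is exactly where the orthogonality measure tends to degrade.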

Multicore software technologies

Hahn Kim, Robert Bond
2009 IEEE Signal Processing Magazine  
Data parallelism is supported by work-items in a work-group executing a kernel together or multiple work-groups executing in parallel. See Figure 5 for an example of a vector add [26].  ...  PVL runs on homogeneous multicore processors, e.g., x86 and PowerPC, and is built on top of computation and communication libraries optimized for each supported hardware architecture, e.g., MPI and VSIPL  ... 
doi:10.1109/msp.2009.934141 fatcat:wtla5y56mneqri2f6ip2b6keg4

PPM – A highly efficient parallel particle–mesh library for the simulation of continuum systems

I.F. Sbalzarini, J.H. Walther, M. Bergdorf, S.E. Hieber, E.M. Kotsalis, P. Koumoutsakos
2006 Journal of Computational Physics  
The present library solves the key parallelization issues involving particle-mesh interpolations and the balancing of processor particle loading, using a novel adaptive tree for mixed domain decompositions  ...  This paper presents a highly efficient parallel particle-mesh (PPM) library, based on a unifying particle formulation for the simulation of continuous systems.  ...  Gonnet (CSE Lab, ETHZ) as well as implementation support by I. Oppermann and B. Polasek (ETHZ). Computer resources were provided by the Swiss National Supercomputing Centre (CSCS).  ... 
doi:10.1016/ fatcat:jkvh4xqjpbgvlike56zcvsp4sq
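
The particle–mesh interpolation that the PPM library parallelizes can be illustrated in 1D with a cloud-in-cell (CIC) assignment, where each particle's strength is split linearly between its two nearest mesh points. The sketch below is a generic textbook scheme, not the PPM interface; `particles_to_mesh` and its parameters are illustrative names:

```python
import numpy as np

# 1D cloud-in-cell (CIC) particle-to-mesh assignment: each particle's
# strength is split linearly between its two nearest mesh points.
# Generic textbook sketch, not the PPM library API.

def particles_to_mesh(positions, strengths, n_cells, h):
    """Assign particle strengths onto a uniform periodic mesh of spacing h."""
    mesh = np.zeros(n_cells)
    for x, s in zip(positions, strengths):
        i = int(x // h)                    # index of the left mesh point
        w = (x - i * h) / h                # linear weight toward the right point
        mesh[i] += (1.0 - w) * s
        mesh[(i + 1) % n_cells] += w * s   # periodic wrap at the boundary
    return mesh

mesh = particles_to_mesh([0.25, 1.5], [1.0, 2.0], n_cells=4, h=1.0)
print(mesh)  # total mesh strength equals total particle strength (conservation)
```

In a distributed setting, the parallelization issue the paper addresses is that particles near a subdomain boundary contribute to mesh points owned by a neighboring processor, which forces ghost-layer communication around each interpolation.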

Parallel, multigrain iterative solvers for hiding network latencies on MPPs and networks of clusters

James R. McCombs, Andreas Stathopoulos
2003 Parallel Computing  
reason for synchronization in linear algebra codes, are particularly expensive because the data being exchanged between processors is small compared to the overheads.  ...  GMRES [21] for linear systems and Arnoldi [6] for symmetric eigenvalue problems are two popular choices.  ...  Tables 5 and 6 show the results for each matrix and each partitioning scheme, for 64 and 128 processors, a block size of ¦ ¥ , and four solve groups.  ... 
doi:10.1016/s0167-8191(03)00101-7 fatcat:d3gbbga5m5dapnf4z4zvc7gdw4

Distributed data structure design for scientific computation

Jan-Jan Wu, Pangfeng Liu
1998 Proceedings of the 12th international conference on Supercomputing - ICS '98  
The framework defines three base libraries, Array, Graph, and Tree, that capture major data structures involved in scientific computation.  ...  The layered approach enables easy extension of the base libraries to a variety of application-specific data structures. Experimental results on a Sun UltraSparc workstation cluster are reported.  ... 
doi:10.1145/277830.277879 dblp:conf/ics/WuL98 fatcat:ypa6hducyff2tooq2blrubuayy

Future wireless convergence platforms

John Glossner, Stamatis Vassiliadis, Mayan Moudgill, Daniel Iancu, Gary Nacer, Sanjay Jintukar, Stuart Stanley, Michael Samori, Tanuj Raja, Michael Schulte
2005 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '05  
The processor is programmed in C with supercomputer-class compiler support for automatic vectorization, multithreading, and DSP semantic analysis.  ...  From a processor architecture perspective, support for signal processing (both audio and video), control code, and Java execution will be required in a convergent device.  ...  While this is good for code density, orthogonality suffered.  ... 
doi:10.1145/1084834.1084841 dblp:conf/codes/GlossnerMINJSSRSV05 fatcat:6gigbaib6zgqtht3356k4mhqfu

A parallel algorithm for the reduction of a nonsymmetric matrix to block upper-Hessenberg form

Michael W. Berry, Jack J. Dongarra, Youngbae Kim
1995 Parallel Computing  
We conclude with an evaluation of the algorithm's communication cost, and suggest areas for further improvement.  ...  In this paper, we present an algorithm for the reduction to block upper-Hessenberg form which can be used to solve the nonsymmetric eigenvalue problem on message-passing multicomputers.  ...  The authors also thank the anonymous referees for the helpful comments and suggestions for improving the manuscript.  ... 
doi:10.1016/0167-8191(95)00015-g fatcat:ikzbphridfgbfbxphclvygafgq

Implementation of algebraic procedures on the GPU using CUDA architecture on the example of generalized eigenvalue problem

Łukasz Syrocki, Grzegorz Pestka
2016 Open Computer Science  
The presented matrix structures allow for the analysis of the advantages of using graphics processors in such calculations.  ...  Abstract: The ready-to-use set of functions to facilitate solving a generalized eigenvalue problem for symmetric matrices in order to efficiently calculate eigenvalues and eigenvectors, using Compute Unified  ...  LAPACK Working Note 41, UT-CS-92-151, March 1992.  ... 
doi:10.1515/comp-2016-0006 fatcat:xs5ozf2hjjd5vdssoaflagejbe

Exploiting processor groups to extend scalability of the GA shared memory programming model

Jarek Nieplocha, Manoj Krishnan, Bruce Palmer, Vinod Tipparaju, Yeliang Zhang
2005 Proceedings of the 2nd conference on Computing frontiers - CF '05  
Exploiting processor groups is becoming increasingly important for programming next-generation high-end systems composed of tens or hundreds of thousands of processors.  ...  Similarly, processor groups were very effective in improving scalability of a Molecular Dynamics application.  ...  ORT [4], a library based on the group-SPMD programming model with orthogonal processor groups built on top of MPI, primarily targets grid-based applications.  ... 
doi:10.1145/1062261.1062305 dblp:conf/cf/NieplochaKPTZ05 fatcat:tihgxxb32zgolmare5optoiiji
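
Group creation of the kind discussed in this entry is typically expressed as a split of the world communicator by a color value (in MPI, `MPI_Comm_split`; Global Arrays offers analogous group-creation calls). The following Python model captures only the split semantics — ranks supplying the same color land in the same subgroup — and is not the GA or MPI API:

```python
# Minimal model of communicator-split semantics: ranks with the same
# color form one processor group, ordered by world rank.  A real code
# would call MPI_Comm_split (or the GA group-creation routines) instead.

def comm_split(world_size, color_of):
    """Return {color: [ranks]}; each rank list acts as one processor group."""
    groups = {}
    for rank in range(world_size):
        groups.setdefault(color_of(rank), []).append(rank)
    return groups

# Example: 8 processors split into the row and column groups of a 2 x 4 mesh.
row_groups = comm_split(8, lambda r: r // 4)
col_groups = comm_split(8, lambda r: r % 4)
print(row_groups)  # → {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
print(col_groups)  # → {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

Running a collective inside one such subgroup rather than over the whole machine is what lets multi-level algorithms confine communication to small processor sets, which is the scalability argument the paper makes.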