
The Case for Message Passing on Many-Core Chips [chapter]

Rakesh Kumar, Timothy G. Mattson, Gilles Pokam, Rob Van Der Wijngaart
2010 Multiprocessor System-on-Chip  
Hence these algorithms work well with message passing and shared address space programming models. The classic "result parallelism" pattern is geometric decomposition.  ... 
doi:10.1007/978-1-4419-6460-1_5 fatcat:dtilxwuw7vcd5ecgrkc5ptzw2y

A refinement methodology for developing data-parallel applications [chapter]

Lars Nyland, Jan Prins, Allen Goldberg, Peter Mills, John Reif, Robert Wagner
1996 Lecture Notes in Computer Science  
Primary issues are algorithm choice, correctness and efficiency, followed by data decomposition, load balancing and message-passing coordination.  ...  We conclude by describing tool support for the process.  ...  Conclusions We have proposed a tree-based refinement strategy for developing data-parallel applications.  ... 
doi:10.1007/3-540-61626-8_18 fatcat:5xlosngmibgcllzypdwezo6oru

Parallel adaptive wavelet analysis

R Kutil, A Uhl
2001 Future generations computer systems  
Therefore, parallel processing is one of the possibilities to accelerate the processing speed.  ...  Given a large or high-dimensional data set the computational demand is too high for interactive or "nearly-interactive" processing.  ...  Acknowledgements We want to thank the NIC Jülich for providing access to its Cray T3E systems. R. Kutil's work was partially supported by the Austrian Science Fund FWF, Project No. P11045-ÖMA.  ... 
doi:10.1016/s0167-739x(00)00079-0 fatcat:loqw4lut2na6ncfwjnvk4oc5ua

Avoiding Algorithmic Obfuscation in a Message-Driven Parallel MD Code [chapter]

James C. Phillips, Robert Brunner, Aritomo Shinozaki, Milind Bhandarkar, Neal Krawetz, Attila Gursoy, Laxmikant Kalé, Robert D. Skeel, Klaus Schulten
1999 Lecture Notes in Computational Science and Engineering  
In order to avoid obfuscation of the simulation algorithm by the parallel framework, the algorithm associated with a patch is encapsulated by a single function executing in a separate thread.  ...  The execution of compute objects takes place in a prioritized message-driven manner, allowing maximum overlap of work and communication without significant programmer effort.  ...  Finally, it was found that a patch-centric flow of control created a mixing of the essentially serial simulation algorithm with the parallel logic for responding to incoming messages, obfuscating both and  ... 
doi:10.1007/978-3-642-58360-5_28 fatcat:ek7yeejkbjfy3lmc4csj7ssrqi

Modeling of Communication Complexity in Parallel Computing

Juraj Hanuliak
2014 American Journal of Networks and Communications  
Parallel principles are the most effective way how to increase parallel computer performance and parallel algorithms (PA) too.  ...  In this sense the paper is devoted to modeling of communication complexity in parallel computing (parallel computers and algorithms).  ...  Acknowledgements This work was done within the project "Complex modeling, optimization and prediction of parallel computers and algorithms" at University of Zilina, Slovakia.  ... 
doi:10.11648/j.ajnc.s.2014030501.13 fatcat:xubvnbfukrhzbnrwkbxjtk5qxy

Parallel Algorithm for Finding Inverse of a Matrix and its Application in Message Sharing (Coding Theory)

Shruti Saraf, Swati Dhingra, Greetta Pinheiro
2016 International Journal of Computer Applications  
A parallel algorithm for finding the inverse of the matrix using Gauss Jordan method in OpenMP.  ...  Then, authors have analyzed the parallel algorithm for computing the inverse of the matrix and compared it with its perspective sequential algorithm in terms of run time, speed-up and efficiency.  ...  With the use of parallel inverse algorithm of the matrix, the system can decrypt the message much faster than its sequential algorithm.  ... 
doi:10.5120/ijca2016908569 fatcat:3vmquandujgq7id2mvj43jkg54

Multi-Variable Agents Decomposition for DCOPs

Ferdinando Fioretto, William Yeoh, Enrico Pontelli
models and proposes the use of GPUs; and (iii) Reduces the amount of computation and communication required in several classes of DCOP algorithms.  ...  DCOP decomposition technique, which: (i) Exploits the co-locality of each agent's variables, allowing us to adopt efficient centralized techniques within each agent; (ii) Enables the use of hierarchical parallel  ...  messages to request for cost estimates and announce complete solutions. These broadcasts occur more regularly with Decomposition and Compilation than with the MVA decomposition. • The number of messages  ... 
doi:10.1609/aaai.v30i1.10127 fatcat:y7evhbv66jdm5ocwmubcfksz5m

A Multi-stage Graph Decomposition Algorithm for Distributed Constraint Optimisation

Terence Law, Adrian Pearce
2006 2006 IEEE/WIC/ACM International Conference on Intelligent Agent Technology  
Algorithms must contain enough centralisation in order to limit communication sufficiently, while facilitating distributed and parallel search (for example, see [1]).  ...  Our decomposition algorithm is particularly attractive for its employment of the repeatedly-half principle to manage complexity.  ...  required for the Adopt algorithm, while DistDecomp only requires a fixed number of communication messages.  ... 
doi:10.1109/iat.2006.18 dblp:conf/iat/LawP06 fatcat:xoifvmbddnc4xdvyedmrx5joyy

A Parallel Quadtree Approach For Image Compression Using Wavelets

Hamed Vahdat Nejad, Hossein Deldari
2008 Zenodo  
At the other hand, parallel computing technologies are an efficient method for image compression using wavelets. In this paper, we propose a parallel wavelet compression algorithm based on quadtrees.  ...  We implement the algorithm using MatlabMPI (a parallel, message passing version of Matlab), and compute its isoefficiency function, and show that it is scalable.  ...  Since wavelet decomposition is naturally adaptable to quadtrees, we use a quadtree structure for parallelizing image compression. We implement the application using message passing model.  ... 
doi:10.5281/zenodo.1075289 fatcat:ufcbbwaydzhp5fbbdkiqkyahsu

The application of graph decomposition to development of large scale agent-based economic models

A.R. Bakhtizin, V. Makarov, E. Sushko, G. Sushko
2019 Advances in Systems Science and Applications  
In this work we describe the application of the graph decomposition algorithms for the development of a scalable high-performance agent-based model of population of Russia described in terms of demography  ...  To perform a load balancing of agents between cluster computer nodes the METIS graph decomposition algorithm was used.  ...  The METIS library implements a multilevel recursive coarse-grained algorithm for calculation of the reasonably good decomposition.  ... 
doi:10.25728/assa.2019.19.1.594 fatcat:2gxkeqjhmzgc5foebdylchbhxu

Performance tuning and evaluation of a parallel community climate model

John B. Drake, Steve Hammond, Rodney James, Patrick H. Worley
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
The Parallel Community Climate Model (PCCM) is a message-passing parallelization of version 2.1 of the Community Climate Model (CCM) developed by researchers at Argonne and Oak Ridge National Laboratories  ...  In preparation for use in the Department of Energy's Parallel Climate Model (PCM), PCCM has recently been updated with new physics routines from version 3.2 of the CCM, improvements to the parallel implementation  ...  For T170L18, the maximum is 128 processors. Also note that the choice of parallel FFT algorithm is not important in 1xP decompositions.  ... 
doi:10.1145/331532.331566 dblp:conf/sc/DrakeHJW99 fatcat:kopucokyafa7jdvlx2godpm3im

The Cost and Benefits of Coordination Programming: Two Case Studies in Concurrent Collections and S-NET

Pavel Zaichenkov, Olga Tveretina, Alex Shafarenko, Bert Gijsbers, Clemens Grelck
2016 Parallel Processing Letters  
Our case study is based on two applications: a face detection algorithm implemented as a pipeline of feature classifiers and a numerical algorithm from the linear algebra domain, namely Cholesky decomposition  ...  The selected applications are representative and have been selected by Intel researchers as evaluation testbeds for CnC in the past.  ...  The data-driven implementation in S-Net is based on precisely the same sequence of algorithmic steps as the CnC one.  ... 
doi:10.1142/s0129626416500110 fatcat:spubspagovhttm57xaufztivp4

Parallel Volume Rendering with Early Ray Termination for Visualizing Large-Scale Datasets [chapter]

Manabu Matsui, Fumihiko Ino, Kenichi Hagihara
2004 Lecture Notes in Computer Science  
This paper presents an efficient parallel algorithm for volume rendering of large-scale datasets.  ...  As a result, our load-balanced algorithm reduces the execution time to at least 66%, not only for dense objects but also for transparent objects.  ...  The SRC algorithm is an object-parallel algorithm that parallelizes the ray casting algorithm with a block-block decomposition.  ... 
doi:10.1007/978-3-540-30566-8_30 fatcat:6xjtkxpqlzb5pc5ypcqflyhiri

A Case Study in Coordination Programming: Performance Evaluation of S-Net vs Intel's Concurrent Collections

Pavel Zaichenkov, Bert Gijsbers, Clemens Grelck, Olga Tveretina, Alex Shafarenko
2014 2014 IEEE International Parallel & Distributed Processing Symposium Workshops  
As a coordination language S-NET achieves a near-complete separation of concerns between sequential software components implemented in a separate algorithmic language and their parallel orchestration in  ...  We investigate the merits of S-NET and CnC with the help of a relevant and non-trivial linear algebra problem: tiled Cholesky decomposition.  ...  As an example application we choose tiled Cholesky decomposition, a linear algebra algorithm that lends itself easily to parallelization for a multi-core system.  ... 
doi:10.1109/ipdpsw.2014.118 dblp:conf/ipps/ZaichenkovGGTS14 fatcat:i4ntu7rstbbinbkdmoldww5fey

Optimization of 3-D Wavelet Decomposition on Multiprocessors

Rade Kutil, Andreas Uhl
2000 Journal of Computing and Information Technology  
In this work we discuss various ideas for the optimization of 3-D wavelet/subband decomposition on shared memory MIMD computers.  ...  We theoretically evaluate the characteristics of these approaches and verify the results on parallel computers.  ...  We want to thank the ZID of the University of Linz for providing access to its SGI Origin2000.  ... 
doi:10.2498/cit.2000.01.04 fatcat:mpjzdjkuivhe5io4334wzedzqe