
Graph Coloring on a Coarse Grained Multiprocessor (Extended Abstract) [chapter]

Assefaw Hadish Gebremedhin, Isabelle Guérin Lassous, Jens Gustedt, Jan Arne Telle
2000 Lecture Notes in Computer Science  
We present the first efficient algorithm for a coarse grained multiprocessor that colors a graph G with a guarantee of at most Δ_G + 1 colors.  ...  Abstract (translated from the French): We present the first algorithm for a coarse grained multiprocessor machine that colors a graph G with a guarantee of at most Δ_G + 1 colors.  ...  In the next section we review the coarse grained models of parallel computation and the basics of graph coloring heuristics.  ... 
doi:10.1007/3-540-40064-8_18 fatcat:yqxj5agtgrhvbo3s2osoowcysq

Cache Optimization for Coarse Grain Task Parallel Processing Using Inter-Array Padding [chapter]

Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara
2004 Lecture Notes in Computer Science  
In multigrain parallelization, coarse grain task parallelism among loops and subroutines and near fine grain parallelism among statements are used in addition to the traditional loop parallelism.  ...  The wide use of multiprocessor systems has been making automatic parallelizing compilers more important.  ...  The evaluation on two multiprocessors shows that OSCAR with padding on page coloring gave us the best performance on both machines.  ... 
doi:10.1007/978-3-540-24644-2_5 fatcat:27gypcyrobhibbgjn6ddcnvr5y

Coarse Grained Parallel Algorithms for Detecting Convex Bipartite Graphs [chapter]

Edson Cáceres, Albert Chan, Frank Dehne, Giuseppe Prencipe
2000 Lecture Notes in Computer Science  
Our algorithm for determining whether a bipartite graph is convex includes a novel, coarse grained parallel, version of the PQ tree data structure introduced by Booth and Lueker.  ...  In this paper, we present parallel algorithms for the coarse grained multicomputer (CGM) and bulk synchronous parallel computer (BSP) for solving two well known graph problems: (1) determining whether  ...  Multiple General Reduce Operations on a PQ-Tree Using the coarse grained parallel MDreduce algorithm presented in the previous section, we will now develop coarse grained parallel algorithm for the general  ... 
doi:10.1007/3-540-40064-8_9 fatcat:rp5fgk2ikzctnprydnmlmwa7pe

Page 2771 of Mathematical Reviews Vol. , Issue 2002D [page]

2002 Mathematical Reviews  
networks (172-183); Assefaw Hadish Gebremedhin, Isabelle Guérin Lassous, Jens Gustedt and Jan Arne Telle, Graph coloring on a coarse grained multiprocessor (extended abstract) (184-195); Frank Gurski  ...  use (71-82); Edson Caceres, Albert Chan, Frank Dehne and Giuseppe Prencipe, Coarse grained parallel algorithms for detecting convex bipartite graphs (83-94); Serafino Cicerone and Gabriele Di Stefano  ... 

SHARED MEMORY VERSUS MESSAGE PASSING FOR ITERATIVE SOLUTION OF SPARSE, IRREGULAR PROBLEMS

FREDERIC T. CHONG, ANANT AGARWAL
1999 Parallel Processing Letters  
The benefits of hardware support for shared memory versus those for message passing are difficult to evaluate without an in-depth study of real applications on a common platform.  ...  We evaluate the communication mechanisms of the MIT Alewife machine, a multiprocessor which provides integrated cache-coherent shared memory, message passing, and DMA.  ...  Nodes are colored to ensure independence and then the computation works on the nodes one color at a time.  ... 
doi:10.1142/s0129626499000177 fatcat:ja24n7c6w5ghjk3jfiq3dqmzgq

A multigrain Delaunay mesh generation method for multicore SMT-based architectures

Christos D. Antonopoulos, Filip Blagojevic, Andrey N. Chernikov, Nikos P. Chrisochoides, Dimitrios S. Nikolopoulos
2009 Journal of Parallel and Distributed Computing  
We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level, medium-grain at the cavity level and fine-grain at the element level.  ...  The exploitation of the coarser degree of granularity facilitates scalability both in terms of execution time and problem size on loosely-coupled clusters.  ...  Fine-Grain: Execution on a SMT-based Multiprocessor As a next step, we evaluated the performance of a fine+coarse multi-grain PCDM implementation on the same layered, CMP/SMT based multiprocessor.  ... 
doi:10.1016/j.jpdc.2009.03.009 fatcat:ytds5g2b6jgn3m5mrvxnhgyivi

Multigrain parallel Delaunay Mesh generation

Christos D. Antonopoulos, Xiaoning Ding, Andrey Chernikov, Filip Blagojevic, Dimitrios S. Nikolopoulos, Nikos Chrisochoides
2005 Proceedings of the 19th annual international conference on Supercomputing - ICS '05  
We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain at the element level.  ...  However, experiments on a simulated SMT indicate that with modest hardware support it is possible to exploit fine-grain parallelism opportunities.  ...  We would like to thank Chaman Verma for his initial implementation of the medium-grain PCDM algorithm and the anonymous referees for their valuable comments.  ... 
doi:10.1145/1088149.1088198 dblp:conf/ics/AntonopoulosDCBNC05 fatcat:2rfdnb2w75ewfmuv7n2cyvfbmm

Scheduling DAG's for asynchronous multiprocessor execution

B.A. Malloy, E.L. Lloyd, M.L. Soffa
1994 IEEE Transactions on Parallel and Distributed Systems  
A new approach is given for scheduling a sequential instruction stream for execution "in parallel" on asynchronous multiprocessors.  ...  Together, our results establish that fine grained parallelism can be exploited in a substantial manner when scheduling a sequential instruction stream for execution "in parallel" on asynchronous multiprocessors  ...  Simons provided this version for the NP-completeness proof; our original proof, found in [16], is based on a reduction for 3-SAT. F.  ... 
doi:10.1109/71.282560 fatcat:axgziqquvbgdzj3o65ojocubry

Accelerating Video Captioning on Heterogeneous System Architectures

Horng-Ruey Huang, Ding-Yong Hong, Jan-Jan Wu, Kung-Fu Chen, Pangfeng Liu, Wei-Chung Hsu
2022 ACM Transactions on Architecture and Code Optimization (TACO)  
In this work, we propose a fine-grained scheduling scheme for mapping computation and devices within a video frame, and a pipeline scheduling scheme for exploiting maximum parallelism between the execution  ...  Accelerating such a hybrid model on a heterogeneous system is challenging because (1) CNN and RNN exhibit very different computing behaviors, making the mapping between computation and heterogeneous devices  ...  The computation behaves very differently if we schedule one model at a time (coarse-grained) or one operation at a time (fine-grained).  ... 
doi:10.1145/3527609 fatcat:hzaazpn7dvgcbc26a6f7kf2iuy

Partitioning Strategy Selection for In-Memory Graph Pattern Matching on Multiprocessor Systems [chapter]

Alexander Krause, Thomas Kissinger, Dirk Habich, Hannes Voigt, Wolfgang Lehner
2017 Lecture Notes in Computer Science  
up on modern multiprocessor systems.  ...  To tackle these aspects, a fine-grained graph partitioning becomes increasingly important.  ...  For both dimensions those units are either fine-grained edges (E), vertices (V), or coarse-grained components (C) naming a connected set of vertices.  ... 
doi:10.1007/978-3-319-64203-1_11 fatcat:5wgamlxqinanlphktayjn5ooh4

Page 6199 of Mathematical Reviews Vol. , Issue 95j [page]

1995 Mathematical Reviews  
threshold functions (584-592); Hong Zhou Li and Guan Ying Li, Nonuniform lowness and strong nonuniform lowness (593-599); Xiao Tie Deng, A convex hull algorithm on coarse-grained multiprocessors (634-  ...  and f-coloring for various classes of graphs (199-207); C.  ... 

Parallelization of a dynamic unstructured application using three leading paradigms

Leonid Oliker, Rupak Biswas
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures.  ...  We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a  ...  Since optimal graph coloring is NP-complete, we use a simple greedy algorithm. The processors can then work simultaneously on all triangles of the same color.  ... 
doi:10.1145/331532.331571 dblp:conf/sc/OlikerB99 fatcat:3fbdskwh3bb3dnpwm73zag4xlm

Region scheduling: an approach for detecting and redistributing parallelism

R. Gupta, M.L. Soffa
1990 IEEE Transactions on Software Engineering  
Abstract-In developing compiler techniques for programs targeted  ...  Ottenstein for their comments and suggestions on this work. We also thank the referees for their suggestions in improving this paper.  ...  These techniques include the detection of coarse grain parallelism useful in generation of code for loosely coupled multiprocessor systems.  ... 
doi:10.1109/32.54294 fatcat:idjth4ssbbcy7ave5vw4kasca4

The MIT Alewife machine

Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, Kenneth Mackenzie, Donald Yeung
1995 Proceedings of the 22nd annual international symposium on Computer architecture - ISCA '95  
Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node.  ...  to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency tolerance mechanisms -including block  ...  The Alewife project is funded in part by ARPA contract # N00014-87-K-0825, in part by a NSF Experimental Systems grant # MIP-9012773, and in part by NSF Presidential Young Investigator Award.  ... 
doi:10.1145/223982.223985 dblp:conf/isca/AgarwalBCJKKLMY95 fatcat:6bhv57cqzvdw5pvjswfrf2k6mu

Parallelization of a dynamic unstructured algorithm using three leading programming paradigms

L. Oliker, R. Biswas
2000 IEEE Transactions on Parallel and Distributed Systems  
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures.  ...  We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a  ...  The first strategy (GRAPH COLOR) uses graph coloring to form independent sets, where two triangles have different colors if they share a vertex.  ... 
doi:10.1109/71.879776 fatcat:6gjamrhcmrb7biabyr2yguzwrq
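
The coloring strategy mentioned in the snippet above (two triangles get different colors if they share a vertex, so each color class is an independent set) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tuple-based triangle representation, and the small example mesh are all hypothetical, and the first-fit greedy rule is just one simple heuristic for picking colors.

```python
from collections import defaultdict


def color_triangles(triangles):
    """Greedily color triangles so that no two sharing a vertex get the same color.

    `triangles` is a list of vertex-id tuples (a hypothetical representation).
    Returns a list where entry t is the color assigned to triangle t.
    """
    # Map each vertex to the triangles that touch it.
    touching = defaultdict(list)
    for t, verts in enumerate(triangles):
        for v in verts:
            touching[v].append(t)

    colors = [None] * len(triangles)
    for t, verts in enumerate(triangles):
        # Colors already taken by triangles that share a vertex with t.
        used = {
            colors[n]
            for v in verts
            for n in touching[v]
            if colors[n] is not None
        }
        # First-fit: take the smallest color not used by a conflicting triangle.
        c = 0
        while c in used:
            c += 1
        colors[t] = c
    return colors


def group_by_color(colors):
    """Group triangle indices by color; each group is an independent set."""
    groups = defaultdict(list)
    for t, c in enumerate(colors):
        groups[c].append(t)
    return dict(groups)


# Hypothetical four-triangle mesh, for illustration only.
triangles = [(0, 1, 2), (1, 2, 3), (4, 5, 6), (2, 3, 7)]
colors = color_triangles(triangles)
groups = group_by_color(colors)
```

Triangles in the same group share no vertex, so under these assumptions all triangles of one color could be processed simultaneously, one color at a time.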