On the Design and Analysis of Irregular Algorithms on the Cell Processor: A Case Study of List Ranking

David A. Bader, Virat Agarwal, Kamesh Madduri
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
We present a complexity model for designing algorithms on the Cell processor, along with a systematic procedure for algorithm analysis.  ...  Figure 1: List ranking for ordered and random list. Figure 3: Step 2 of list ranking on Cell. (a) Linked list for which list ranking is to be done.  ...  Fig. 7 shows that for random lists we get an overall speedup of 8.34 (1 million vertices), and even for ordered lists we get a speedup of 1.5.  ... 
doi:10.1109/ipdps.2007.370266 dblp:conf/ipps/BaderAM07 fatcat:q3oayt732rhd5fcflc2ufwgnzi

High performance combinatorial algorithm design on the Cell Broadband Engine processor

David A. Bader, Virat Agarwal, Kamesh Madduri, Seunghwa Kang
2007 Parallel Computing  
For instance, on a 3.2 GHz IBM QS20 Cell/B.E. blade, for a random linked list of 1 million nodes, we achieve an overall speedup of 8.34 over a PPE-only implementation.  ...  In this article, we present two case studies to illustrate the design and implementation of parallel combinatorial algorithms on Cell/B.E.: we discuss list ranking, a fundamental kernel for graph problems  ...  We would also like to thank Sidney Manning (IBM Corporation) and Vipin Sachdeva (IBM Research) for providing valuable inputs during the course of our research.  ... 
doi:10.1016/j.parco.2007.09.005 fatcat:lyg43stjnbapthceoyaw4h66su

On efficient posting list intersection with multicore processors

Shirish Tatikonda, Flavio Junqueira, B. Barla Cambazoglu, Vassilis Plachouras
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
Common documents are then fed to the ranking phase for further processing.  ...  Since document scores are independent of each other, we can easily parallelize the ranking phase: each core takes a document from rank pool, ranks it, and proceeds to the next one.  ... 
doi:10.1145/1571941.1572104 dblp:conf/sigir/TatikondaJCP09 fatcat:smtceasrmrhcpkb4imfacdprei

A New Duplication Task Scheduling Algorithm in Heterogeneous Distributed Computing Systems

Aida A Nasr, Nirmeen A EL-Bahnasawy, Ayman EL-Sayed
2016 Bulletin of Electrical Engineering and Informatics  
An example for duplication algorithms is CPOP with duplication [18].  ...  Examples of list-based algorithms are Heterogeneous Earliest Finish Time (HEFT) and Critical Path On Processor (CPOP) [17].  ...  Random Graph Generator: For building random DAGs the program requires the following input parameters. • N is the number of DAG tasks, where N ∈ {20, 40, 60, 80, 100, 120}. • α (parallelism) is the shape parameter  ... 
doi:10.11591/559 fatcat:aflvfjq2x5fnxpteyfp45h6yzq

A New Duplication Task Scheduling Algorithm in Heterogeneous Distributed Computing Systems

Aida A Nasr, Nirmeen A EL-Bahnasawy, Ayman EL-Sayed
2016 Bulletin of Electrical Engineering and Informatics  
An example for duplication algorithms is CPOP with duplication [18].  ...  Examples of list-based algorithms are Heterogeneous Earliest Finish Time (HEFT) and Critical Path On Processor (CPOP) [17].  ...  Random Graph Generator: For building random DAGs the program requires the following input parameters. • N is the number of DAG tasks, where N ∈ {20, 40, 60, 80, 100, 120}. • α (parallelism) is the shape parameter  ... 
doi:10.11591/eei.v5i3.559 fatcat:zupqcrpw2jftdfdx33hfd4taue

Implementation of Computational Algorithms using Parallel Programming

Youssef Bassil
2019 International Journal of Trend in Scientific Research and Development  
There are several different types of parallel computing.  ...  This paper implements several computational algorithms using parallel programming techniques namely distributed message passing.  ...  Acknowledgment This research was funded by the Lebanese Association for Computational Sciences (LACSC), Beirut, Lebanon, under the "Parallel Programming Algorithms Research Project -PPARP2019".  ... 
doi:10.31142/ijtsrd22947 fatcat:cbiuohkqnbbudemqeznra4ayle

Fast and scalable list ranking on the GPU

M. Suhail Rehman, Kishore Kothapalli, P. J. Narayanan
2009 Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09  
Our RHJ implementation can rank a random list of 32 million elements in about a second and achieves a speedup of about 8-9 over a CPU implementation as well as a speedup of 3-4 over the best reported implementation  ...  In this paper, we describe two implementations of List Ranking, a traditional irregular algorithm that is difficult to parallelize on such massively multi-threaded hardware.  ...  Figure 7: Comparison of list ranking on GTX 280 against other architectures [2, 3] to rank a random list of 8 million nodes. Speedup on a GTX 280 is denoted over the respective bars.  ... 
doi:10.1145/1542275.1542311 dblp:conf/ics/RehmanKN09 fatcat:tmch6lf4c5f2hjs3sbwj2ugnru

A stochastic scheduling algorithm for precedence constrained tasks on Grid

Xiaoyong Tang, Kenli Li, Guiping Liao, Kui Fang, Fan Wu
2011 Future Generation Computer Systems  
This paper addresses the problems in scheduling a precedence constrained tasks of parallel application with random tasks processing time and edges communication time on Grid computing systems so as to  ...  scheduling algorithms in terms of makespan, speedup, and makespan standard deviation.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This  ... 
doi:10.1016/j.future.2011.04.007 fatcat:eb6yqbctxraqvlk6hmkuxrk2ve

Fast Matlab compatible sparse assembly on multicore computers

Stefan Engblom, Dimitar Lukarski
2016 Parallel Computing  
We show how to do this, and moreover, we show how our implementation can be parallelized to utilize the power of modern multicore computers.  ...  Our freely available code, fully Matlab compatible, achieves about a factor of 5 times in speedup on a typical 6-core machine and 10 times on a dual-socket 16 core machine compared to the built-in serial  ...  Acknowledgment The authors would like to thank master's students Aidin Dadashzadeh and Simon Ternsjö who programmed an early version of the parallel fsparse code [5] .  ... 
doi:10.1016/j.parco.2016.04.001 fatcat:5gptshnxsfa6vkfdkngahn3i74

A Survey of Scheduling Algorithms for Heterogeneous Systems and Comparative Study of HEFT and CPOP Algorithms

Baldeep Singh, Priyanka Mehta
2016 International Journal of Engineering Research and  
List scheduling has constantly been a topic of conversation for the researchers due to its nature of solving high complexity troubles with minimum complexity and to estimate the additional scheduling problems  ...  for the applied matrix.  ...  Initialize the priority list with starting node. While There are unscheduled nodes in list do Select the highest rank node from list and ready to schedule then remove from list.  ... 
doi:10.17577/ijertv5is050045 fatcat:a54x365zpbhvzi5br2gc3xviwu

Adaptive Quality Equalizing: High-performance load balancing for parallel branch-and-bound across applications and computing systems

Nihar R. Mahapatra, Shantanu Dutt
2004 Parallel Computing  
AQE yields speedup improvements of up to 80%, and 15% on the average, compared to that provided by QE for several real-world mixed-integer programming (MIP) problems, and near-ideal speedups for two of  ...  In this paper, we present an adaptive version of our previously proposed quality equalizing (QE) load balancing strategy that attempts to maximize the performance of parallel branch-and-bound (B&B) by  ...  For a set S i of processors, the S-rank of a node u, denoted rank S u, in processor i is the position of the first equal-cost node v in a non-decreasing cost ordering of nodes in the OPEN lists of those  ... 
doi:10.1016/j.parco.2004.05.001 fatcat:3tlpn6hxk5h3vftqfvymno5p7i

An experimental validation of the PRO model for parallel and distributed computation

M. Essaidi, J. Gustedt
2006 14th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP'06)  
In this paper we present experimental results on parallel algorithms, designed using the PRO model, for two representative problems: list ranking and sorting.  ...  The Parallel Resource-Optimal (PRO) computation model was introduced by Gebremedhin et al. [2002] as a framework for the design and analysis of efficient parallel algorithms.  ...  ACKNOWLEDGMENTS The authors would like to thank Assefaw Hadish Gebremedhin and the anonymous referees for their valuable remarks and suggestions concerning the contents and the form of this paper.  ... 
doi:10.1109/pdp.2006.21 dblp:conf/pdp/EssadiG06 fatcat:todhd7qn2bforesi7pfkpyn66i

HordeQBF: A Modular and Massively Parallel QBF Solver [chapter]

Tomáš Balyo, Florian Lonsing
2016 Lecture Notes in Computer Science  
HordeQBF achieves superlinear average and median speedup on the hard application instances of the 2014 QBF Gallery.  ...  We integrated the QCDCL-based quantified Boolean formula (QBF) solver DepQBF in HordeSAT to obtain a massively parallel QBF solver, HordeQBF.  ...  We compute speedups for all the instances solved by the parallel solver.  ... 
doi:10.1007/978-3-319-40970-2_33 fatcat:rspy7uzybbgmba4pjijspnyg2a

Task Scheduling in Heterogeneous Multiprocessor Environments – An Efficient ACO-Based Approach

Nekiesha Edward, Jeffrey Elcock
2018 Indonesian Journal of Electrical Engineering and Computer Science  
For several types of applications, the task scheduling problem is crucial, and across the literature, a number of algorithms with several different approaches have been proposed.  ...  Our algorithm utilizes pheromone and a priority-based heuristic, known as the upward rank value, as well as an insertion-based policy and a pheromone aging mechanism to guide the ants to high quality solutions  ...  Speedup: The ratio between the sequential time and the parallel execution time of a process is defined as the speedup.  ... 
doi:10.11591/ijeecs.v10.i1.pp320-329 fatcat:7hpgjbwtpvdolgc3364ftoyiqi

Parallel algorithms for switching edges in heterogeneous graphs

Hasanuzzaman Bhuiyan, Maleq Khan, Jiangzhuo Chen, Madhav Marathe
2017 Journal of Parallel and Distributed Computing  
This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.  ...  In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors.  ...  We are grateful to Anil Vullikanti for interesting discussions and helpful comments on a draft of this paper.  ... 
doi:10.1016/j.jpdc.2016.12.005 pmid:28757680 pmcid:PMC5526649 fatcat:pyazmucbbfbpvjcmslm7i6wwf4