2,618 Hits in 4.4 sec

High Performance Detection of Strongly Connected Components in Sparse Graphs on GPUs

Pingfan Li, Xuhao Chen, Jie Shen, Jianbin Fang, Tao Tang, Canqun Yang
2017 Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM'17  
Detecting strongly connected components (SCC) has been broadly used in many real-world applications.  ...  In this paper, we present a parallel SCC detection implementation on GPUs that achieves high performance on both synthetic and real-world graphs.  ...  INTRODUCTION Strongly connected component (SCC) detection is a fundamental graph analysis problem that is pervasively present in many application domains.  ... 
doi:10.1145/3026937.3026941 dblp:conf/ppopp/LiCSFTY17 fatcat:2rnsb7br5fgrfmgmy2vvkuymq4

MGSim + MGMark: A Framework for Multi-GPU System Research [article]

Yifan Sun, Trinayan Baruah, Saiful A. Mojumder, Shi Dong, Rafael Ubal, Xiang Gong, Shane Treadway, Yuhui Bao, Vincent Zhao, José L. Abellán, John Kim, Ajay Joshi, David Kaeli
2018 arXiv   pre-print
We complement MGSim with MGMark, a suite of multi-GPU workloads that explores multi-GPU collaborative execution patterns.  ...  We illustrate the novel simulation capabilities provided by our simulator through a case study exploring programming models based on a unified multi-GPU system (U-MGPU) and a discrete multi-GPU system  ...  We also create a cross-GPU connection that connects the ACE of the first GPU with the CUs of all other GPUs. The DRAM banks in the multi-GPU systems are interleaved with a granularity of 4KB.  ... 
arXiv:1811.02884v3 fatcat:uqzjyera75dnnnpfeduess7qtq

Parallel Breadth First Search on GPU clusters

Zhisong Fu, Harish Kumar Dasari, Bradley Bebee, Martin Berzins, Bryan Thompson
2014 2014 IEEE International Conference on Big Data (Big Data)  
Hardware trends and manufacturing limits strongly imply that many-core devices, such as NVIDIA® GPUs and the Intel® Xeon Phi®, will become central components of such future systems.  ...  We extend previous research on GPUs and on scalable graph processing on supercomputers and demonstrate that a high-performance parallel graph machine can be created using commodity GPUs and networking  ...  A strong and weak scaling study on up to 64 GPUs with an analysis of the strengths and weaknesses of our multi-GPU approach with respect to scalability.  ... 
doi:10.1109/bigdata.2014.7004219 dblp:conf/bigdataconf/FuDBBT14 fatcat:4znzla75ezho7fmry4xqtrtslq

MIDeA: A Multi-Parallel Intrusion Detection Architecture


Giorgos Vasiliadis, Michalis Polychronakis, Sotiris Ioannidis
2011 Proceedings of the 18th ACM conference on Computer and communications security - CCS '11  
In this paper, we present a multi-parallel intrusion detection architecture tailored for high speed networks.  ...  To cope with the increased processing throughput requirements, our system parallelizes network traffic processing and analysis at three levels, using multi-queue NICs, multiple CPUs, and multiple GPUs.  ...  Even though we focused on the parallelization of an intrusion detection system, we strongly believe that the proposed model can benefit a variety of other network monitoring applications, such as traffic  ... 
doi:10.1145/2046707.2046741 dblp:conf/ccs/VasiliadisPI11 fatcat:3ru3c7yct5bjblslqk3334kzwu

Towards Parallel Programming Models for Predictability

Björn Lisper, Marc Herbstritt
2012 Worst-Case Execution Time Analysis  
Thus, a promising route for the future is to develop WCET analyses for data-parallel software running on GPUs. ACM Subject Classification C.3 Special-Purpose and Application-Based Systems  ...  GPUs are increasingly used for high performance applications: we discuss a current GPU architecture, and we argue that it offers a parallel platform for compute-intensive applications for which it seems  ...  A possible development of future hardware architecture is that we will see heterogeneous processors with a few general purpose multi-cores, and an on-chip massively parallel coprocessor, building on GPU  ... 
doi:10.4230/oasics.wcet.2012.48 dblp:conf/wcet/Lisper12 fatcat:jjt46t6ofbcmxpavfje2y4aq6y

Computing Strongly Connected Components in Parallel on CUDA

Jiri Barnat, Petr Bauch, Lubos Brim, Milan Ceška
2011 2011 IEEE International Parallel & Distributed Processing Symposium  
The problem of decomposing a directed graph into its strongly connected components is a fundamental graph problem inherently present in many scientific and commercial applications.  ...  We also experimentally demonstrate that with a single GTX 480 GPU card we can easily outperform the optimal serial CPU implementation by an order of magnitude in most cases, 40 times on some sufficiently  ...  A strongly connected component (SCC) is a maximal strongly connected set C ⊆ V, i.e. such that no C′ with C ⊊ C′ ⊆ V is strongly connected.  ... 
doi:10.1109/ipdps.2011.59 dblp:conf/ipps/BarnatBBC11 fatcat:zt2bc6paqbfotk3zduuinkqfi4

Elastic deep learning in multi-tenant GPU cluster [article]

Yidi Wu, Kaihao Ma, Xiao Yan, Zhi Liu, Zhenkun Cai, Yuzhen Huang, James Cheng, Han Yuan, Fan Yu
2019 arXiv   pre-print
Elasticity can benefit multi-tenant GPU cluster management in many ways, e.g., achieving various scheduling objectives (e.g., job throughput, job completion time, GPU efficiency) according to cluster load  ...  ., the ability to dynamically adjust the parallelism (number of GPUs), for deep neural network (DNN) training.  ...  Experimental Results We evaluated EDL on a cluster with 8 machines each with a 96-core Intel CPU, 8 NVIDIA Tesla V100 SMX2 GPUs and 256 GB RAM. The machines are connected with 100 Gbps InfiniBand.  ... 
arXiv:1909.11985v2 fatcat:sxtjszs2dnag5k63eczyeusub4

A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks

Marco Frasca, Giuliano Grossi, Jessica Gliozzo, Marco Mesiti, Marco Notaro, Paolo Perlasca, Alessandro Petrini, Giorgio Valentini
2018 BMC Bioinformatics  
The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators.  ...  Connections represent functional or genetic similarity between entities, while the labellings often are highly unbalanced, that is one class is largely under-represented: for instance in the automated  ...  All networks have one large connected component, with human and mouse networks with one or more smaller connected components.  ... 
doi:10.1186/s12859-018-2301-4 pmid:30367594 fatcat:gzrxnqd2r5deljttytqthe5oai

Scalable high-performance parallel design for Network Intrusion Detection Systems on many-core processors

Haiyang Jiang, Guangxing Zhang, Gaogang Xie, Kave Salamatian, Laurent Mathy
2013 Architectures for Networking and Communications Systems  
Both hardware accelerated and parallel software-based NIDS solutions, based on commodity multi-core and GPU processors, have been proposed to overcome these challenges.  ...  We also design a hybrid load balancing scheme, using both ruleset and flow space partitioning.  ...  Compared with currently proposed NIDS systems on GPU, our parallel design on TILERAGX36 achieves a throughput per dollar that is threefold. The obtained performance looks very promising.  ... 
doi:10.1109/ancs.2013.6665196 dblp:conf/ancs/JiangZXSM13 fatcat:gwjyh6lvyvdjvojipv5z34vvxy

Parallelizing Combinatorial Optimization Heuristics with GPUs

Mohammad Harun Rashid, Lixin Tao
2018 Advances in Science, Technology and Engineering Systems  
Our proposed work aims to design an efficient GPU framework for parallelizing optimization heuristics by focusing on the following: distribution of data processing efficiently between GPU and CPU, efficient  ...  with our GPU framework provides higher quality solutions within a reasonable time.  ...  number of connections between the components.  ... 
doi:10.25046/aj030635 fatcat:566v2rey2bgyhfrjqqjvdnpihq

Midpoint routing algorithms for Delaunay triangulations

Weisheng Si, Albert Y. Zomaya
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
The system efficiency is strongly connected to the global object sharing profile that determines the overall communication cost.  ...  All components are implemented by scalable parallel algorithms.  ...  We also study the problems of computing the connected and biconnected components of a graph, minimum spanning tree of a connected graph and ear decomposition of a biconnected graph.  ... 
doi:10.1109/ipdps.2010.5470471 dblp:conf/ipps/SiZ10 fatcat:yuchdc4zp5borm5vs7j4rqgmzy

A Perspective on Safety and Real-Time Issues for GPU Accelerated ADAS

Ignacio Sanudo Olmedo, Nicola Capodieci, Roberto Cavicchioli
2018 IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society  
This is made possible by exploiting the general purpose nature of today's GPUs, as such devices are known to express unprecedented performance per watt on generic embarrassingly parallel workloads (as  ...  For many critical applications such as pedestrian detection, line following, and path planning the Graphic Processing Unit (GPU) is the most popular choice for obtaining orders of magnitude increases in  ...  Generally, achieving FFI between partitions allows deploying software components with different ASIL (Automotive Safety Integrity Level) grades.  ... 
doi:10.1109/iecon.2018.8591540 dblp:conf/iecon/OlmedoCC18 fatcat:ih6yuaqxsvhypo2dzneoql6g3a

GPU-Based Graph Decomposition into Strongly Connected and Maximal End Components [chapter]

Anton Wijs, Joost-Pieter Katoen, Dragan Bošnački
2014 Lecture Notes in Computer Science  
This paper presents parallel algorithms for component decomposition of graph structures on General Purpose Graphics Processing Units (GPUs).  ...  In particular, we consider the problem of decomposing sparse graphs into strongly connected components, and decomposing stochastic games (such as Markov decision processes) into maximal end components.  ...  The set of nodes C ⊆ V is a strongly connected component (SCC) of G iff G restricted to C, denoted G↑C, i.e., the graph G↑C = (C, (C × Δ(C) × C) ∩ E), is strongly connected.  ... 
doi:10.1007/978-3-319-08867-9_20 fatcat:phxsozhcq5ea7gtaczbfp4wqcq

Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs [article]

George Teodoro, Tahsin Kurc, Guilherme Andrade, Jun Kong, Renato Ferreira, Joel Saltz
2015 arXiv   pre-print
The performances of operations with regular data access are comparable or sometimes better on a MIC than that on a GPU.  ...  We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core - MIC) with a microscopy image analysis application.  ...  pixel (8-connected) Component Label with the same value Irregular, global Low Union-find Oliveira (29) Labeling each component of the mask Feature Computation Stage Color Decon- Separate multi-stained  ... 
arXiv:1505.03819v1 fatcat:g5x5gwaubrc5bbfl7pmlbepzb4

Application performance analysis and efficient execution on systems with multi-core CPUs, GPUs and MICs: a case study with microscopy image analysis

George Teodoro, Tahsin Kurc, Guilherme Andrade, Jun Kong, Renato Ferreira, Joel Saltz
2016 The international journal of high performance computing applications  
The performances of operations with regular data access are comparable or sometimes better on a MIC than that on a GPU.  ...  We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core-MIC) with a microscopy image analysis application.  ...  Section 3 presents the programming models and parallelization strategies used to implement the core operations on GPUs, MICs, and multi-core CPUs.  ... 
doi:10.1177/1094342015594519 pmid:28239253 pmcid:PMC5319667 fatcat:tchtmfozwfc57hosfna43behoy