High Performance Detection of Strongly Connected Components in Sparse Graphs on GPUs
2017
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM'17
Detecting strongly connected components (SCC) has been broadly used in many real-world applications. ...
In this paper, we present a parallel SCC detection implementation on GPUs that achieves high performance on both synthetic and real-world graphs. ...
Introduction. Strongly connected component (SCC) detection is a fundamental graph analysis problem that is pervasively present in many application domains. ...
doi:10.1145/3026937.3026941
dblp:conf/ppopp/LiCSFTY17
fatcat:2rnsb7br5fgrfmgmy2vvkuymq4
MGSim + MGMark: A Framework for Multi-GPU System Research
[article]
2018
arXiv
pre-print
We complement MGSim with MGMark, a suite of multi-GPU workloads that explores multi-GPU collaborative execution patterns. ...
We illustrate the novel simulation capabilities provided by our simulator through a case study exploring programming models based on a unified multi-GPU system (U-MGPU) and a discrete multi-GPU system ...
We also create a cross-GPU connection that connects the ACE of the first GPU with the CUs of all other GPUs. The DRAM banks in the multi-GPU systems are interleaved with a granularity of 4KB. ...
arXiv:1811.02884v3
fatcat:uqzjyera75dnnnpfeduess7qtq
Parallel Breadth First Search on GPU clusters
2014
2014 IEEE International Conference on Big Data (Big Data)
Hardware trends and manufacturing limits strongly imply that many-core devices, such as NVIDIA® GPUs and the Intel® Xeon Phi®, will become central components of such future systems. ...
We extend previous research on GPUs and on scalable graph processing on supercomputers and demonstrate that a high-performance parallel graph machine can be created using commodity GPUs and networking ...
-A strong and weak scaling study on up to 64 GPUs with an analysis of the strengths and weaknesses of our multi-GPU approach with respect to scalability. ...
doi:10.1109/bigdata.2014.7004219
dblp:conf/bigdataconf/FuDBBT14
fatcat:4znzla75ezho7fmry4xqtrtslq
In this paper, we present a multi-parallel intrusion detection architecture tailored for high speed networks. ...
To cope with the increased processing throughput requirements, our system parallelizes network traffic processing and analysis at three levels, using multi-queue NICs, multiple CPUs, and multiple GPUs. ...
Even though we focused on the parallelization of an intrusion detection system, we strongly believe that the proposed model can benefit a variety of other network monitoring applications, such as traffic ...
doi:10.1145/2046707.2046741
dblp:conf/ccs/VasiliadisPI11
fatcat:3ru3c7yct5bjblslqk3334kzwu
Towards Parallel Programming Models for Predictability
2012
Worst-Case Execution Time Analysis
Thus, a promising route for the future is to develop WCET analyses for data-parallel software running on GPUs. ACM Subject Classification C.3 Special-Purpose and Application-Based Systems ...
GPUs are increasingly used for high performance applications: we discuss a current GPU architecture, and we argue that it offers a parallel platform for compute-intensive applications for which it seems ...
A possible development of future hardware architecture is that we will see heterogeneous processors with a few general purpose multi-cores, and an on-chip massively parallel coprocessor, building on GPU ...
doi:10.4230/oasics.wcet.2012.48
dblp:conf/wcet/Lisper12
fatcat:jjt46t6ofbcmxpavfje2y4aq6y
Computing Strongly Connected Components in Parallel on CUDA
2011
2011 IEEE International Parallel & Distributed Processing Symposium
The problem of decomposing a directed graph into its strongly connected components is a fundamental graph problem inherently present in many scientific and commercial applications. ...
We also experimentally demonstrate that with a single GTX 480 GPU card we can easily outperform the optimal serial CPU implementation by an order of magnitude in most cases, 40 times on some sufficiently ...
A strongly connected component (SCC) is a maximal strongly connected set C ⊆ V, i.e. such that no C′ with C ⊊ C′ ⊆ V is strongly connected. ...
doi:10.1109/ipdps.2011.59
dblp:conf/ipps/BarnatBBC11
fatcat:zt2bc6paqbfotk3zduuinkqfi4
Elastic deep learning in multi-tenant GPU cluster
[article]
2019
arXiv
pre-print
Elasticity can benefit multi-tenant GPU cluster management in many ways, e.g., achieving various scheduling objectives (e.g., job throughput, job completion time, GPU efficiency) according to cluster load ...
., the ability to dynamically adjust the parallelism (number of GPUs), for deep neural network (DNN) training. ...
Experimental Results We evaluated EDL on a cluster with 8 machines each with a 96-core Intel CPU, 8 NVIDIA Tesla V100 SXM2 GPUs and 256 GB RAM. The machines are connected with 100 Gbps InfiniBand. ...
arXiv:1909.11985v2
fatcat:sxtjszs2dnag5k63eczyeusub4
A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks
2018
BMC Bioinformatics
The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators. ...
Connections represent functional or genetic similarity between entities, while the labellings often are highly unbalanced, that is one class is largely under-represented: for instance in the automated ...
All networks have one large connected component, with human and mouse networks with one or more smaller connected components. ...
doi:10.1186/s12859-018-2301-4
pmid:30367594
fatcat:gzrxnqd2r5deljttytqthe5oai
Scalable high-performance parallel design for Network Intrusion Detection Systems on many-core processors
2013
Architectures for Networking and Communications Systems
Both hardware accelerated and parallel software-based NIDS solutions, based on commodity multi-core and GPU processors, have been proposed to overcome these challenges. ...
We also design a hybrid load balancing scheme, using both ruleset and flow space partitioning. ...
Compared with currently proposed NIDS systems on GPUs, our parallel design on TILERAGX36 achieves a threefold throughput per dollar. The obtained performance looks very promising. ...
doi:10.1109/ancs.2013.6665196
dblp:conf/ancs/JiangZXSM13
fatcat:gwjyh6lvyvdjvojipv5z34vvxy
Parallelizing Combinatorial Optimization Heuristics with GPUs
2018
Advances in Science, Technology and Engineering Systems
Our proposed work aims to design an efficient GPU framework for parallelizing optimization heuristics by focusing on the following: distribution of data processing efficiently between GPU and CPU, efficient ...
with our GPU framework provides with higher quality solutions within a reasonable time. ...
number of connections between the components. ...
doi:10.25046/aj030635
fatcat:566v2rey2bgyhfrjqqjvdnpihq
Midpoint routing algorithms for Delaunay triangulations
2010
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
The system efficiency is strongly connected to the global object sharing profile that determines the overall communication cost. ...
All components are implemented by scalable parallel algorithms. ...
We also study the problems of computing the connected and biconnected components of a graph, minimum spanning tree of a connected graph and ear decomposition of a biconnected graph. ...
doi:10.1109/ipdps.2010.5470471
dblp:conf/ipps/SiZ10
fatcat:yuchdc4zp5borm5vs7j4rqgmzy
A Perspective on Safety and Real-Time Issues for GPU Accelerated ADAS
2018
IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society
This is made possible by exploiting the general purpose nature of today's GPUs, as such devices are known to express unprecedented performance per watt on generic embarrassingly parallel workloads (as ...
For many critical applications such as pedestrian detection, line following, and path planning the Graphic Processing Unit (GPU) is the most popular choice for obtaining orders of magnitude increases in ...
Generally, achieving FFI between partitions allows deploying software components with different ASIL (Automotive Safety Integrity Level) grades. ...
doi:10.1109/iecon.2018.8591540
dblp:conf/iecon/OlmedoCC18
fatcat:ih6yuaqxsvhypo2dzneoql6g3a
GPU-Based Graph Decomposition into Strongly Connected and Maximal End Components
[chapter]
2014
Lecture Notes in Computer Science
This paper presents parallel algorithms for component decomposition of graph structures on General Purpose Graphics Processing Units (GPUs). ...
In particular, we consider the problem of decomposing sparse graphs into strongly connected components, and decomposing stochastic games (such as Markov decision processes) into maximal end components. ...
The set of nodes C ⊆ V is a strongly connected component (SCC) of G iff G restricted to C, denoted G↑C, i.e., the graph G↑C = (C, (C × Δ(C) × C) ∩ E), is strongly connected. ...
doi:10.1007/978-3-319-08867-9_20
fatcat:phxsozhcq5ea7gtaczbfp4wqcq
Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs
[article]
2015
arXiv
pre-print
The performances of operations with regular data access are comparable or sometimes better on a MIC than that on a GPU. ...
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core - MIC) with a microscopy image analysis application. ...
pixel (8-connected) Component Label with the same value Irregular, global Low Union-find Oliveira (29) Labeling each component of the mask Feature Computation Stage Color Decon- Separate multi-stained ...
arXiv:1505.03819v1
fatcat:g5x5gwaubrc5bbfl7pmlbepzb4
Application performance analysis and efficient execution on systems with multi-core CPUs, GPUs and MICs: a case study with microscopy image analysis
2016
The international journal of high performance computing applications
The performances of operations with regular data access are comparable or sometimes better on a MIC than that on a GPU. ...
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core-MIC) with a microscopy image analysis application. ...
Section 3 presents the programming models and parallelization strategies used to implement the core operations on GPUs, MICs, and multi-core CPUs. ...
doi:10.1177/1094342015594519
pmid:28239253
pmcid:PMC5319667
fatcat:tchtmfozwfc57hosfna43behoy