Patterns on the Connected Components of Terabyte-Scale Graphs
2010
2010 IEEE International Conference on Data Mining
In this work, we study patterns in connected components of large, real-world graphs. ...
First, we study one of the largest static Web graphs with billions of nodes and edges and analyze the regularities among the connected components using GFD (Graph Fractal Dimension) as our main tool. ...
The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. ...
doi:10.1109/icdm.2010.121
dblp:conf/icdm/KangMAF10
fatcat:6s4whqtxfndufjaz44r3rvz3ty
Pegasus: Mining billion-scale graphs in the cloud
2012
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We ran experiments for PEGASUS on M45, one of the largest HADOOP clusters in the world. We report our findings on several real graphs with billions of nodes and edges. ...
We present PEGASUS, the first open-source, peta-scale graph mining library, for the HADOOP platform (open-source implementation of MAPREDUCE). ...
As we are only at the dawn of the era of big data, many exciting research directions await us. ...
doi:10.1109/icassp.2012.6289127
dblp:conf/icassp/KangCF12
fatcat:gigzjxt5zbbdlint5omcmc4tgm
A Multiscale Parallel Computing Architecture for Automated Segmentation of the Brain Connectome
2012
IEEE Transactions on Biomedical Engineering
The Brain Connectome project is a multi-institution project aimed at creating a graph of all connections between brain cells to better understand the brain circuitry and to explore the causes of neurodegenerative ...
We detail the methodology that has led to our computational architecture and report our first results on our 19-Terabyte 3D dataset of the visual cortex. ...
ACKNOWLEDGMENT The authors would like to thank W.C. Lee Allen, D.G.C. Hildebrand, H.S. Kim, and S. Butterfield for valuable discussions regarding transmission electron microscopy, and V. Bonin, M.L. ...
doi:10.1109/tbme.2011.2168396
pmid:21926011
pmcid:PMC4518548
fatcat:lndcrv7utzfbtdkf2ihqlxcaqe
OPAvion consists of three modules: (1) The Summarization module (Pegasus) operates off-line on massive, disk-resident graphs and computes graph statistics, like PageRank scores, connected components, degree ...
Given a large graph with millions or billions of nodes and edges, like a who-follows-whom Twitter graph, how do we scalably compute its statistics, summarize its patterns, spot anomalies, visualize and ...
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory ...
doi:10.1145/2213836.2213941
dblp:conf/sigmod/AkogluCKKF12
fatcat:oj7arrwbvzf33ivfphz2llib4e
Big graph mining
2013
SIGKDD Explorations
Our findings include anomalous spikes in the connected component size distribution, the 7 degrees of separation in a Web graph, and anomalous adult advertisers in the who-follows-whom Twitter social network ...
How do we find patterns and anomalies in very large graphs with billions of nodes and edges? How to mine such big graphs efficiently? ...
The views and conclusions are those of the authors and should not be interpreted as representing the official policies, of the U.S. ...
doi:10.1145/2481244.2481249
fatcat:fzidqzmctndj3nxh2qw55txyuu
graphs, that runs on the top of the HADOOP/MAPREDUCE system, with excellent scale-up on the number of available machines (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. ...
for the web graph and access to the M45, and Adriano A. Paterlini for feedback. The opinions expressed are those of the authors and do not necessarily reflect the views of the funding agencies. ...
doi:10.1145/1921632.1921634
fatcat:lari6gn7p5gpbeag4lfl6a4kxm
Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations
[chapter]
2010
Proceedings of the 2010 SIAM International Conference on Data Mining
runs on the top of the HADOOP/MAPREDUCE system, with excellent scale-up on the number of available machines (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. ...
Acknowledgments This work was partially funded by the National Science ...
doi:10.1137/1.9781611972801.48
dblp:conf/sdm/KangTAFL10
fatcat:iysiqlff7fekzj5w63czqxvs4m
Polonium: Tera-Scale Graph Mining and Inference for Malware Detection
[chapter]
2011
Proceedings of the 2011 SIAM International Conference on Data Mining
We evaluated Polonium with a billion-node graph constructed from the largest file submissions dataset ever published (60 terabytes). ...
We present Polonium, a novel Symantec technology that detects malware through large-scale graph inference. ...
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding ...
doi:10.1137/1.9781611972818.12
dblp:conf/sdm/ChauNWWF11
fatcat:t7tqk7oe3zddbg6gwrh6n44noe
Scalable graph analysis tools for the connectomics community
[article]
2022
bioRxiv
pre-print
Existing community tools can perform such queries and analysis on smaller scale datasets, which can fit locally in memory, but the path to scaling remains unclear. ...
As dataset size and tissue diversity have grown, there is increasing interest in conducting comparative connectomics research, including rapidly querying and searching for recurring patterns of connectivity ...
ACKNOWLEDGEMENTS The authors thank the creators of the connectome datasets discussed here. ...
doi:10.1101/2022.06.01.494307
fatcat:366mxffddra6ji46v6qrudp67m
A Performance Prediction Framework for Data Intensive Applications on Large Scale Parallel Machines
[chapter]
1998
Lecture Notes in Computer Science
Application emulators provide a parameterized model of data access and computation patterns of the applications and enable changing of critical application components (input data partitioning, data declustering ...
Our suite of simulators model the I/O and communication subsystems with good accuracy and execute quickly on a high-performance workstation to allow performance prediction of large scale parallel machine ...
Acknowledgements We would like to thank Jeff Hollingsworth and Hyeonsang Eom for their invaluable discussions about performance prediction on large scale machines. ...
doi:10.1007/3-540-49530-4_18
fatcat:fzfwnlmvg5esrbsyfy3k6uufhu
Planetary-Scale Views on an Instant-Messaging Network
[article]
2008
arXiv
pre-print
We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. ...
We investigate on a planetary-scale the oft-cited report that people are separated by "six degrees of separation" and find that the average path length among Messenger users is 6.6. ...
Figure 19: (a) Clustering coefficient; (b) distribution of connected components. 99.9% of the nodes belong to the largest connected component. ...
arXiv:0803.0939v1
fatcat:afr5fj7vmnexbdbeza5f7xex2y
Mining large graphs: Algorithms, inference, and discoveries
2011
2011 IEEE 27th International Conference on Data Engineering
How do we find patterns and anomalies, on graphs with billions of nodes and edges, which do not fit in memory? How to use parallelism for such terabyte-scale graphs? ...
scales up well with the number of edges, as well as with the number of machines; and (c) experimental results on two private, as well as two of the largest publicly available graphs - the Web Graphs from ...
ACKNOWLEDGMENT The authors would like to thank YAHOO! for providing us with the web graph and access to the M45, and Brendan Meeder in CMU for providing Twitter data. ...
doi:10.1109/icde.2011.5767883
dblp:conf/icde/KangCF11
fatcat:5fkqg3g43fgunl2vh2w5f45yoi
Big graph mining for the web and social media
2014
Proceedings of the 7th ACM international conference on Web search and data mining - WSDM '14
Then we describe how to scale up these techniques to massive graphs with billions of nodes. ...
What are the patterns and anomalies in such massive graphs? How to design scalable algorithms to find them? What visual analytics techniques to use to make sense of such massive graphs? ...
Her research interests are in data mining, machine learning, and applied statistics with a focus on pattern mining, and anomaly and event detection in large dynamic data using graph mining and compression ...
doi:10.1145/2556195.2556198
dblp:conf/wsdm/KangAC14
fatcat:fbe7ciirlzd3xm42h3y67bw77i
Partitioning Strategy Selection for In-Memory Graph Pattern Matching on Multiprocessor Systems
[chapter]
2017
Lecture Notes in Computer Science
Pattern matching on large graphs is the foundation for a variety of application domains. ...
The continuously increasing size of the underlying graphs requires highly parallel in-memory graph processing engines that need to consider non-uniform memory access (NUMA) and concurrency issues to scale ...
Acknowledgments This work is partly funded within the Collaborative Research Center SFB 912 (HAEC). ...
doi:10.1007/978-3-319-64203-1_11
fatcat:5wgamlxqinanlphktayjn5ooh4
The parallelism motifs of genomic data analysis
2020
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns ...
Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory ...
A depth-first traversal starting from arbitrary k-mers computes the connected components of the graph, which are linear sequences called contigs. ...
doi:10.1098/rsta.2019.0394
pmid:31955674
fatcat:kzujmq5u2refvhoovtb2ap5vha