
Patterns on the Connected Components of Terabyte-Scale Graphs

U. Kang, Mary McGlohon, Leman Akoglu, Christos Faloutsos
2010 2010 IEEE International Conference on Data Mining  
In this work, we study patterns in connected components of large, real-world graphs.  ...  First, we study one of the largest static Web graphs with billions of nodes and edges and analyze the regularities among the connected components using GFD (Graph Fractal Dimension) as our main tool.  ...  The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.  ... 
doi:10.1109/icdm.2010.121 dblp:conf/icdm/KangMAF10 fatcat:6s4whqtxfndufjaz44r3rvz3ty
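The entry above analyzes regularities among the connected components of a billion-scale Web graph. The following is a small in-memory illustration only, not the paper's Hadoop-based method. A union-find pass computes the component size distribution, that is, how many components have each size. The function name `connected_component_sizes` is hypothetical.

```python
from collections import Counter


def connected_component_sizes(num_nodes, edges):
    """Return a Counter mapping component size -> number of components of that size."""
    parent = list(range(num_nodes))  # every node starts as its own root

    def find(x):
        # Path halving: point nodes at their grandparent to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    # Number of nodes in each component, keyed by the component's root.
    size_by_root = Counter(find(x) for x in range(num_nodes))
    # Size distribution: how many components have each size.
    return Counter(size_by_root.values())


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (3, 4), (5, 5)]
    print(sorted(connected_component_sizes(7, edges).items()))  # prints [(1, 2), (2, 1), (3, 1)]
```

Plotting this distribution on log-log axes is a common way to look for regularities or anomalous spikes among component sizes.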

Pegasus: Mining billion-scale graphs in the cloud

U Kang, Duen Horng Chau, Christos Faloutsos
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
We ran experiments for PEGASUS on M45, one of the largest HADOOP clusters in the world. We report our findings on several real graphs with billions of nodes and edges.  ...  We present PEGASUS, the first open-source, peta-scale graph mining library, for the HADOOP platform (open-source implementation of MAPREDUCE).  ...  As we are only at the dawn of the era of big data, many exciting research directions await us.  ... 
doi:10.1109/icassp.2012.6289127 dblp:conf/icassp/KangCF12 fatcat:gigzjxt5zbbdlint5omcmc4tgm
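PEGASUS is generally described as expressing graph mining operations as generalized iterated matrix-vector multiplication (GIM-V). The sketch below assumes that framing. It simulates one such computation in plain Python rather than on Hadoop: connected components found by repeatedly replacing each node's label with the minimum label among itself and its neighbors. The function `min_label_components` and the `max_iters` cap are illustrative assumptions, not the PEGASUS API.

```python
from collections import defaultdict


def min_label_components(nodes, edges, max_iters=100):
    """Connected components by iterative min-label propagation (single-machine sketch)."""
    labels = {v: v for v in nodes}  # each node starts with its own id as its label
    # Treat the graph as undirected: record each edge in both directions.
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    for _ in range(max_iters):
        # "Map" step: every node sends its current label to each neighbor.
        messages = defaultdict(list)
        for u in nodes:
            for v in adj[u]:
                messages[v].append(labels[u])
        # "Reduce" step: each node keeps the minimum of its own and incoming labels.
        new_labels = {v: min([labels[v]] + messages[v]) for v in nodes}
        if new_labels == labels:  # converged: no label changed this round
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    print(min_label_components(range(6), [(0, 1), (1, 2), (4, 3)]))
```

Nodes that end with the same label belong to the same component. The number of rounds needed grows with the graph's diameter, which is why `max_iters` is only a safety cap.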

A Multiscale Parallel Computing Architecture for Automated Segmentation of the Brain Connectome

S. Jaume, K. Knobe, Ryan R. Newton, F. Schlimbach, M. Blower, R. C. Reid
2012 IEEE Transactions on Biomedical Engineering  
The Brain Connectome project is a multi-institution project aimed at creating a graph of all connections between brain cells to better understand the brain circuitry and to explore the causes of neurodegenerative  ...  We detail the methodology that has led to our computational architecture and report our first results on our 19-Terabyte 3D dataset of the visual cortex.  ...  ACKNOWLEDGMENT The authors would like to thank W.C. Lee Allen, D.G.C. Hildebrand, H.S. Kim, and S. Butterfield for valuable discussions regarding transmission electron microscopy, and V. Bonin, M.L.  ... 
doi:10.1109/tbme.2011.2168396 pmid:21926011 pmcid:PMC4518548 fatcat:lndcrv7utzfbtdkf2ihqlxcaqe

OPAvion: Mining and Visualization in Large Graphs

Leman Akoglu, Duen Horng Chau, U. Kang, Danai Koutra, Christos Faloutsos
2012 Proceedings of the 2012 international conference on Management of Data - SIGMOD '12  
OPAvion consists of three modules: (1) The Summarization module (Pegasus) operates off-line on massive, disk-resident graphs and computes graph statistics, like PageRank scores, connected components, degree  ...  Given a large graph with millions or billions of nodes and edges, like a who-follows-whom Twitter graph, how do we scalably compute its statistics, summarize its patterns, spot anomalies, visualize and  ...  The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory  ... 
doi:10.1145/2213836.2213941 dblp:conf/sigmod/AkogluCKKF12 fatcat:oj7arrwbvzf33ivfphz2llib4e
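The Summarization module is described as computing statistics such as PageRank scores. Below is a minimal single-machine sketch of PageRank by power iteration. Several choices are assumptions and are not taken from OPAvion: the damping factor 0.85, the convergence tolerance, and redistributing rank from dangling nodes uniformly across all nodes.

```python
def pagerank(num_nodes, edges, damping=0.85, tol=1e-10, max_iters=100):
    """Power-iteration PageRank over a directed edge list (illustrative sketch)."""
    out_links = [[] for _ in range(num_nodes)]
    for u, v in edges:
        out_links[u].append(v)

    rank = [1.0 / num_nodes] * num_nodes  # start from the uniform distribution
    for _ in range(max_iters):
        # Teleportation term: every node receives (1 - damping) / N.
        new_rank = [(1.0 - damping) / num_nodes] * num_nodes
        dangling_mass = 0.0
        for u in range(num_nodes):
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
            else:
                # Nodes with no out-links: collect their rank to spread uniformly.
                dangling_mass += damping * rank[u]
        new_rank = [r + dangling_mass / num_nodes for r in new_rank]
        diff = sum(abs(a - b) for a, b in zip(new_rank, rank))  # L1 change
        rank = new_rank
        if diff < tol:
            break
    return rank


if __name__ == "__main__":
    print(pagerank(3, [(1, 0), (2, 0)]))
```

Because the teleportation and dangling terms together account for all rank mass, the scores stay a probability distribution summing to 1 after every iteration.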

Big graph mining

U. Kang, Christos Faloutsos
2013 SIGKDD Explorations  
Our findings include anomalous spikes in the connected component size distribution, the 7 degrees of separation in a Web graph, and anomalous adult advertisers in the who-follows-whom Twitter social network  ...  How do we find patterns and anomalies in very large graphs with billions of nodes and edges? How to mine such big graphs efficiently?  ...  The views and conclusions are those of the authors and should not be interpreted as representing the official policies, of the U.S.  ... 
doi:10.1145/2481244.2481249 fatcat:fzidqzmctndj3nxh2qw55txyuu

HADI: Mining Radii of Large Graphs

U. Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec
2011 ACM Transactions on Knowledge Discovery from Data  
...  graphs, that runs on the top of the HADOOP/MAPREDUCE system, with excellent scale-up on the number of available machines (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed.  ...  for the web graph and access to the M45, and Adriano A. Paterlini for feedback. The opinions expressed are those of the authors and do not necessarily reflect the views of the funding agencies.  ... 
doi:10.1145/1921632.1921634 fatcat:lari6gn7p5gpbeag4lfl6a4kxm
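HADI computes radius plots, that is, the distribution of node radii over a graph. The exact breadth-first-search sketch below makes one assumption: a node's effective radius is the smallest hop count h such that at least 90% of the nodes reachable from it (itself included) lie within h hops. The 90% threshold is exposed as the `fraction` parameter. HADI itself approximates these neighborhood counts with Flajolet-Martin sketches on MapReduce, and this small in-memory version does not attempt that. The function names are hypothetical.

```python
from collections import Counter, deque


def effective_radius(adj, source, fraction=0.9):
    """Smallest h such that >= fraction of the nodes reachable from source are within h hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # plain BFS to get exact hop distances
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    hop_counts = Counter(dist.values())  # number of nodes at each exact distance
    total = len(dist)
    covered = 0
    for h in range(max(hop_counts) + 1):
        covered += hop_counts[h]
        if covered >= fraction * total:
            return h
    return max(hop_counts)  # only reached if fraction > 1


def radius_distribution(adj, fraction=0.9):
    """Radius plot data: effective radius -> number of nodes with that radius."""
    return Counter(effective_radius(adj, v, fraction) for v in adj)


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(sorted(radius_distribution(path).items()))
```

Running one BFS per node costs O(N·(N+E)) time, which is why billion-edge graphs call for approximate, parallel methods like HADI's.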

Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations [chapter]

U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec
2010 Proceedings of the 2010 SIAM International Conference on Data Mining  
runs on the top of the HADOOP/MAPREDUCE system, with excellent scale-up on the number of available machines (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed.  ...  Acknowledgments This work was partially funded by the National Science  ... 
doi:10.1137/1.9781611972801.48 dblp:conf/sdm/KangTAFL10 fatcat:iysiqlff7fekzj5w63czqxvs4m

Polonium: Tera-Scale Graph Mining and Inference for Malware Detection [chapter]

Duen Horng "Polo" Chau, Carey Nachenberg, Jeffrey Wilhelm, Adam Wright, Christos Faloutsos
2011 Proceedings of the 2011 SIAM International Conference on Data Mining  
We evaluated Polonium with a billion-node graph constructed from the largest file submissions dataset ever published (60 terabytes).  ...  We present Polonium, a novel Symantec technology that detects malware through large-scale graph inference.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding  ... 
doi:10.1137/1.9781611972818.12 dblp:conf/sdm/ChauNWWF11 fatcat:t7tqk7oe3zddbg6gwrh6n44noe

Scalable graph analysis tools for the connectomics community [article]

Jordan K. Matelsky, Erik C. Johnson, Brock Wester, William Gray-Roncal
2022 bioRxiv   pre-print
Existing community tools can perform such queries and analysis on smaller scale datasets, which can fit locally in memory, but the path to scaling remains unclear.  ...  As dataset size and tissue diversity have grown, there is increasing interest in conducting comparative connectomics research, including rapidly querying and searching for recurring patterns of connectivity  ...  ACKNOWLEDGEMENTS The authors thank the creators of the connectome datasets discussed here.  ... 
doi:10.1101/2022.06.01.494307 fatcat:366mxffddra6ji46v6qrudp67m

A Performance Prediction Framework for Data Intensive Applications on Large Scale Parallel Machines [chapter]

Mustafa Uysal, Tahsin M. Kurc, Alan Sussman, Joel Saltz
1998 Lecture Notes in Computer Science  
Application emulators provide a parameterized model of data access and computation patterns of the applications and enable changing of critical application components (input data partitioning, data declustering  ...  Our suite of simulators model the I/O and communication subsystems with good accuracy and execute quickly on a high-performance workstation to allow performance prediction of large scale parallel machine  ...  Acknowledgements We would like to thank Jeff Hollingsworth and Hyeonsang Eom for their invaluable discussions about performance prediction on large scale machines.  ... 
doi:10.1007/3-540-49530-4_18 fatcat:fzfwnlmvg5esrbsyfy3k6uufhu

Planetary-Scale Views on an Instant-Messaging Network [article]

Jure Leskovec, Eric Horvitz
2008 arXiv   pre-print
We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal.  ...  We investigate on a planetary-scale the oft-cited report that people are separated by "six degrees of separation" and find that the average path length among Messenger users is 6.6.  ...  Figure 19: (a) Clustering coefficient; (b) distribution of connected components. 99.9% of the nodes belong to the largest connected component.  ... 
arXiv:0803.0939v1 fatcat:afr5fj7vmnexbdbeza5f7xex2y
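Two statistics are quoted in this entry: the average path length, and the fraction of nodes in the largest connected component. On a small graph, both can be computed exactly with one breadth-first search per node. This is only a sketch for small graphs. At Messenger scale, exact all-pairs BFS would be far too expensive, and the snippet does not show how the authors obtained their figures. The names are hypothetical, and the adjacency dict is assumed to be undirected with every node present as a key.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_stats(adj):
    """(average shortest-path length over reachable ordered pairs, largest-component fraction)."""
    total_length, pair_count = 0, 0
    seen, largest = set(), 0
    for s in adj:
        dist = bfs_distances(adj, s)
        total_length += sum(dist.values())  # distance to s itself contributes 0
        pair_count += len(dist) - 1  # exclude the (s, s) pair
        if s not in seen:  # first BFS into this component gives its size
            seen.update(dist)
            largest = max(largest, len(dist))
    avg = total_length / pair_count if pair_count else 0.0
    return avg, largest / len(adj)


if __name__ == "__main__":
    adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
    print(path_stats(adj))
```

Averaging only over reachable pairs avoids infinite distances between components. That choice matters little when, as reported here, nearly all nodes sit in one giant component.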

Mining large graphs: Algorithms, inference, and discoveries

U Kang, Duen Horng Chau, Christos Faloutsos
2011 2011 IEEE 27th International Conference on Data Engineering  
How do we find patterns and anomalies, on graphs with billions of nodes and edges, which do not fit in memory? How to use parallelism for such terabyte-scale graphs?  ...  scales up well with the number of edges, as well as with the number of machines; and (c) experimental results on two private, as well as two of the largest publicly available graphs, the Web Graphs from  ...  ACKNOWLEDGMENT The authors would like to thank YAHOO! for providing us with the web graph and access to the M45, and Brendan Meeder in CMU for providing Twitter data.  ... 
doi:10.1109/icde.2011.5767883 dblp:conf/icde/KangCF11 fatcat:5fkqg3g43fgunl2vh2w5f45yoi

Big graph mining for the web and social media

U. Kang, Leman Akoglu, Duen Horng Chau
2014 Proceedings of the 7th ACM international conference on Web search and data mining - WSDM '14  
Then we describe how to scale up these techniques to massive graphs with billions of nodes.  ...  What are the patterns and anomalies in such massive graphs? How to design scalable algorithms to find them? What visual analytics techniques to use to make sense of such massive graphs?  ...  Her research interests are in data mining, machine learning, and applied statistics with a focus on pattern mining, and anomaly and event detection in large dynamic data using graph mining and compression  ... 
doi:10.1145/2556195.2556198 dblp:conf/wsdm/KangAC14 fatcat:fbe7ciirlzd3xm42h3y67bw77i

Partitioning Strategy Selection for In-Memory Graph Pattern Matching on Multiprocessor Systems [chapter]

Alexander Krause, Thomas Kissinger, Dirk Habich, Hannes Voigt, Wolfgang Lehner
2017 Lecture Notes in Computer Science  
Pattern matching on large graphs is the foundation for a variety of application domains.  ...  The continuously increasing size of the underlying graphs requires highly parallel in-memory graph processing engines that need to consider non-uniform memory access (NUMA) and concurrency issues to scale  ...  Acknowledgments This work is partly funded within the Collaborative Research Center SFB 912 (HAEC).  ... 
doi:10.1007/978-3-319-64203-1_11 fatcat:5wgamlxqinanlphktayjn5ooh4

The parallelism motifs of genomic data analysis

Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi (+2 others)
2020 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns  ...  Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory  ...  A depth-first traversal starting from arbitrary k-mers compute the connected components of the graph which are linear sequences called contigs.  ... 
doi:10.1098/rsta.2019.0394 pmid:31955674 fatcat:kzujmq5u2refvhoovtb2ap5vha
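The entry describes contigs as the linear connected components of a k-mer graph, found by a depth-first traversal started from arbitrary k-mers. Below is a simplified sketch of that idea under several loudly stated assumptions:

- a single DNA strand (no reverse complements);
- error-free k-mers;
- edges between k-mers that overlap by k-1 bases;
- every component is a simple chain (the code raises an error otherwise).

The function `build_contigs` is hypothetical and is not taken from the paper's distributed implementation.

```python
def build_contigs(kmers):
    """Spell one contig per connected component of a (k-1)-overlap k-mer graph."""
    kmer_set = set(kmers)

    def successors(kmer):
        # k-mers whose first k-1 bases equal this k-mer's last k-1 bases.
        return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmer_set]

    def predecessors(kmer):
        return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmer_set]

    visited, contigs = set(), []
    for seed in sorted(kmer_set):  # sorted only to make the output order deterministic
        if seed in visited:
            continue
        # Depth-first traversal, ignoring edge direction, collects the component.
        component, stack = [], [seed]
        visited.add(seed)
        while stack:
            u = stack.pop()
            component.append(u)
            for v in successors(u) + predecessors(u):
                if v not in visited:
                    visited.add(v)
                    stack.append(v)
        # Assumption check: a simple chain has one head and no branching.
        heads = [u for u in component if not predecessors(u)]
        branching = any(len(successors(u)) > 1 or len(predecessors(u)) > 1 for u in component)
        if len(heads) != 1 or branching:
            raise ValueError("component is not a simple linear chain")
        # Spell the contig: the head k-mer, then the last base of each successor.
        node = contig = heads[0]
        while successors(node):
            node = successors(node)[0]
            contig += node[-1]
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    print(build_contigs(["TGCA", "ACGT", "GTTG", "CGTT", "TTGC", "GGGA"]))
```

Real assemblers must also handle branches caused by repeats and sequencing errors, as well as reverse-complement k-mers. That is why this sketch stops with an error instead of guessing.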
Showing results 1-15 out of 3,084 results