BD-CATS

Md. Mostofa Ali Patwary, Pradeep Dubey, Suren Byna, Nadathur Rajagopalan Satish, Narayanan Sundaram, Zarija Lukić, Vadim Roytershteyn, Michael J. Anderson, Yushu Yao, Prabhat
2015 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15  
Modern cosmology and plasma physics codes are now capable of simulating trillions of particles on petascale systems. Each timestep output from such simulations is on the order of tens of TBs. Summarizing and analyzing raw particle data is challenging, and scientists often focus on density structures, whether in real 3D space or in a high-dimensional phase space. In this work, we develop a highly scalable version of the clustering algorithm DBSCAN and apply it to the largest datasets produced by state-of-the-art codes. Our system, called BD-CATS, is the first one capable of performing end-to-end analysis at trillion particle scale (including loading the data, geometric partitioning, computing kd-trees, performing clustering analysis, and storing the results). We show analysis of 1.4 trillion particles from a plasma physics simulation and of a 10,240³-particle cosmological simulation, utilizing ∼100,000 cores in 30 minutes. BD-CATS is helping infer mechanisms behind particle acceleration in plasma physics and holds promise for qualitatively superior clustering in cosmology. Both of these results were previously intractable at the trillion particle scale.
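For readers unfamiliar with DBSCAN, the algorithm that BD-CATS scales up, the following is a minimal sketch of the classic sequential version on small in-memory data. It is not BD-CATS's distributed implementation. The names `dbscan`, `region_query`, `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size for a core point) are illustrative choices. Neighbor search here is brute force, whereas the abstract states that BD-CATS builds kd-trees for this step.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i.
    Brute force O(n) per query; a kd-tree would answer this faster."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable non-core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return labels  # every entry has been assigned by now


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
           (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
           (20.0, 20.0)]
    print(dbscan(pts, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The two dense groups become clusters 0 and 1, and the isolated point is labeled noise. Because every point issues one neighborhood query, this version costs O(n²) distance evaluations. That cost is why a spatial index such as a kd-tree, together with geometric partitioning across processes, matters at the particle counts reported above.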
doi:10.1145/2807591.2807616 dblp:conf/sc/PatwaryBSSLRAYP15 fatcat:xyxhtzg22bex5nch6aa5fbbo7u