
Perspectives in astrophysical databases

Marco Frailis, Alessandro De Angelis, Vito Roberto
2004 Physica A: Statistical Mechanics and its Applications  
This asks for an approach to data management emphasizing the efficiency and simplicity of data access; efficiency is obtained using multidimensional access methods and simplicity is achieved by properly  ...  Astrophysics has become a domain extremely rich of scientific data. Data mining tools are needed for information extraction from such large datasets.  ...  An important issue, in large datasets, is the efficiency and scalability of the clustering algorithms with respect to the dataset size.  ... 
doi:10.1016/j.physa.2004.02.024 fatcat:yldcq4xirrhr3auwkf3tin5ofy

Data Management and Mining in Astrophysical Databases

M. Frailis, A. De Angelis, V. Roberto
2003 arXiv   pre-print
An essential role in the astrophysical research will be assumed by automatic tools for information extraction from large datasets, i.e. data mining techniques, such as clustering and classification algorithms  ...  Clustering and classification techniques, on large datasets, pose additional requirements: computational and memory scalability with respect to the data size, interpretability and objectivity of clustering  ...  An important issue, in large datasets, is the efficiency and scalability of the clustering algorithms with respect to the dataset size.  ... 
arXiv:cs/0307032v2 fatcat:aau2mrliqjgovbkcfqzw77vxwi


Bin Fu, Kai Ren, Julio López, Eugene Fink, Garth Gibson
2010 Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10  
DiscFinder is a scalable approach for identifying large-scale astronomical structures, such as galaxy clusters, in massive observation and simulation astrophysics datasets.  ...  It is designed to operate on datasets with tens of billions of astronomical objects, even in the case when the dataset is much larger than the aggregate memory of compute cluster used for the processing  ...  Frameworks, such as Hadoop and Dryad, are commonly used for data-intensive applications on Internet-scale data, such as text analysis and indexing of web pages.  ... 
doi:10.1145/1851476.1851527 dblp:conf/hpdc/FuRLFG10 fatcat:ubwac4vekbb6lea4tth2gy3j2y

Massive Science with VO and Grids [article]

Robert Nichol, Christopher Miller, Brent Bryan, Alexander Gray, Jeff Schneider, Andrew Moore
2005 arXiv   pre-print
There is a growing need for massive computational resources for the analysis of new astronomical datasets.  ...  We also discuss other applications including the determination of the XMM Cluster Survey selection function and the construction of new WMAP maps.  ...  GS thanks the VOTech and University of Edinburgh for his funding (see for details).  ... 
arXiv:astro-ph/0510844v1 fatcat:hkhrl4v6dbaqdbkysbfwfwbxwi

Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey [article]

Alexander S. Szalay, Peter Kunszt, Ani Thakar, Jim Gray
1999 arXiv   pre-print
Hashing techniques allow efficient clustering and pair-wise comparison algorithms that parallelize nicely. Randomly sampled subsets allow debugging otherwise large queries at the desktop.  ...  The archive will enable astronomers to explore the data interactively. Data access will be aided by a multidimensional spatial index and other indices. The data will be partitioned in many ways.  ...  Acknowledgements We would like to acknowledge support from the Astrophysical Research Consortium, the HSF, NASA and Intel's Technology for Education 2000 program, in particular George Bourianoff (Intel  ... 
arXiv:cs/9907009v1 fatcat:wor6w277angadcr32jkx4iz2bm

Data-Mining a Large Digital Sky Survey: From the Challenges to the Scientific Results [article]

S. G. Djorgovski, R. R. Gal (Palomar Observatory, Observatorio Nacional CNPq, Rio de Janeiro)
1997 arXiv   pre-print
We describe some of the specific scientific problems posed by the data, including searches for distant quasars and clusters of galaxies, and the data-mining techniques we are exploring in addressing them  ...  In addition to the searches for known types of objects in this data base, these techniques may also offer the possibility of discovering previously unknown, rare types of astronomical objects.  ...  Weir and U. Fayyad made important initial contributions to this project. We also thank J. Kennefick, J. Darling, and V. Desai for their contributions to the quasar search project.  ... 
arXiv:astro-ph/9708218v1 fatcat:yriagjorx5cxthtcyd3zozc2va

The Sloan Digital Sky Survey and its Archive

Alexander S. Szalay, Peter Kunszt, Anirudha Thakar, Jim Gray, Don Slutz
1999 arXiv   pre-print
Hashing techniques allow efficient clustering and pairwise comparison algorithms. Randomly sampled subsets allow debugging otherwise large queries at the desktop.  ...  Central servers will operate a data pump that supports sweeping searches that touch most of the data.  ...  We would like to acknowledge support from the Astrophysical Research Consortium, the HSF, NASA and Intel's Technology for Education 2000 program, in particular George Bourianoff (Intel).  ... 
arXiv:astro-ph/9912382v1 fatcat:sowjqxp4dfamzhrpwzehqbpp3a

ASTROIDE: A Unified Astronomical Big Data Processing Engine over Spark

Mariem Brahem, Karine Zeitouni, Laurent Yeh
2018 IEEE Transactions on Big Data  
The next decade promises to be an exciting time for astronomers. Large volumes of astronomical data are continuously collected from highly productive space missions.  ...  Recognizing the need to better handle astronomical datasets, we designed ASTROIDE, a distributed data server for astronomical data.  ...  This work has made use of data from the European Space Agency (ESA) mission GAIA (https://www.cosmos., processed by the GAIA Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa  ... 
doi:10.1109/tbdata.2018.2873749 fatcat:vda3gtes45g7bgm53kc33kzm3m

Scalable Clustering Algorithm for N-Body Simulations in a Shared-Nothing Cluster [chapter]

YongChul Kwon, Dylan Nunley, Jeffrey P. Gardner, Magdalena Balazinska, Bill Howe, Sarah Loebman
2010 Lecture Notes in Computer Science  
In this paper, we address the above two challenges by describing dFoF, an algorithm for scalable, parallel clustering of N-body simulation results in astrophysics.  ...  Clustering algorithms in particular have been difficult to adapt to these shared nothing parallel data processing frameworks, for two reasons.  ...  We are also grateful to the Dryad and DryadLINQ teams, MSR-SV, and Microsoft External Research for providing us with an alpha release of Dryad and DryadLINQ and for their support in installing and running  ... 
doi:10.1007/978-3-642-13818-8_11 fatcat:2akf3lx5qffupirf4hkul7ol7a

Clustered Hierarchical Entropy-Scaling Search of Astronomical and Biological Data [article]

Najib Ishaq, George Student, Noah M. Daniels
2019 arXiv   pre-print
We present a hierarchical search algorithm for such data sets that takes advantage of particular geometric properties apparent in both astronomical and biological data sets, namely the metric entropy and  ...  CHESS also allows for implicit data compression, which we demonstrate on the APOGEE data set. We also discuss an extension allowing for efficient k-nearest neighbors search.  ...  William Yu for suggesting the extension of entropy-scaling search to a hierarchical paradigm, and Tom Howard and Matthew Daily for helpful discussions.  ... 
arXiv:1908.08551v2 fatcat:rxatdd263jc6nhxfxttz7cd4y4

Searching the sky with CONFIGR-STARS

Gail A. Carpenter, Arun K. Ravindran
2011 Neural Networks  
Further studies would test CONFIGR-STARS and algorithm variations applied to very large starmaps and to other technologies that may employ geometric signatures.  ...  Open-source code, data, and demos are available from .  ...  by CELEST, an NSF Science of Learning Center (SBE-0354378).  ... 
doi:10.1016/j.neunet.2010.10.007 pmid:21094022 fatcat:cfrymfmilraphpuk4t7s5v6ufy

Pattern Recognition in Time Series [chapter]

2012 Advances in Machine Learning and Data Mining for Astronomy  
Most classic data mining algorithms do not perform or scale well on time series data.  ...  , pose challenges that render classic data mining algorithms ineffective and inefficient for time series.  ...  It plays an important role in indexing and similarity search, since it's the essential key to guarantee no false dismissal of results.  ... 
doi:10.1201/b11822-36 fatcat:yfcgyog5cvg6dcopfv2sq6jsvy

Survey of Object-Based Data Reduction Techniques in Observational Astronomy

Szymon Łukasik, André Moitinho, Piotr A. Kowalski, António Falcão, Rita A. Ribeiro, Piotr Kulczycki
2016 Open Physics  
The main goal of this article is to describe existing datasets on which algorithms are frequently tested, to characterize and classify available data reduction algorithms and identify promising solutions  ...  Dealing with astronomical observations represents one of the most challenging areas of big data analytics.  ...  Volume corresponds to both large number of instances and characteristics (features), velocity is related to dynamics of the data flow, and finally, variety stands for the broad range of data types and  ... 
doi:10.1515/phys-2016-0064 fatcat:sara5rtxgvbznhnxsynpqkst3m

Execution Analysis of Spatial Data Storage Indexing on Cloud Environment

Karthi S, Prabu S
2018 Scalable Computing : Practice and Experience  
Global index decreases the number of data accesses for range queries and thus improves efficiency.  ...  Bloom filter R-tree index in the Map-reduce for providing more efficiency than the existing approaches.  ...  The Map-Reduce technology has proved very effective for large scale structured, semi-structured and unstructured data, for information processing and retrieval.  ... 
doi:10.12694/scpe.v19i4.1421 fatcat:ewe52drvqjgldoqu3vhipujzta

A Survey on Trajectory Big Data Processing

Amina Belhassena
2018 International Journal of Performability Engineering  
Therefore, large-scale trajectory data has received increasing attention in research fields as well as in industry.  ...  MapReduce, Hadoop, and Spark. Furthermore, this paper reviews an extensive collection of existing applications of movement objects, including trajectory data mining and frequent trajectory.  ...  of Heilongjiang Province LC2016026 and MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology.  ... 
doi:10.23940/ijpe.18.02.p13.320333 fatcat:m74w3cfajrbzpamzpghfyrm6am