Approximate ad-hoc query engine for simulation data

Ghaleb Abdulla, Chuck Baldwin, Terence Critchlow, Roy Kamimura, Ida Lozares, Ron Musick, Nu Ai Tang, Byung S. Lee, Robert Snapp
2001 Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries - JCDL '01  
To support queries over the spatial-temporal mesh structured data we are in the process of defining and implementing a grammar for MeshSQL  ...  The goal of this project is to reduce data storage requirements and access times while permitting ad-hoc queries using statistical and mathematical models of the data.  ...  Results imply that we can create a system that will support approximate ad-hoc queries over large data sets.  ... 
doi:10.1145/379437.379673 dblp:conf/jcdl/AbdullaBCKLMTLS01 fatcat:7e2ef4g3irckvhwnbv5rklvvje

BigSur: A System For the Management of Earth Science Data

Paul Brown, Michael Stonebraker
1995 Very Large Data Bases Conference  
Our prototype, called "BigSur", is shown in the context of its use by two geographically distributed scientific groups with demanding data storage and processing requirements.  ...  In this paper we present a prototype system for the management of earth science data which is novel in that it takes a DBMS-centric view of the task.  ...  We intend to continue with the incremental development of BigSur, adding a number of other user groups, and revisiting the vexatious topic of schema design.  ... 
dblp:conf/vldb/BrownS95 fatcat:7tjiwrsfbnhp5ocxrmbqwbqqt4

Automatic Data Standardization for the Global Cryosphere Watch Data Portal

Mathias Bavay, Joel Fiddes, Øystein Godøy
2020 Data Science Journal  
The data portal web front end harvests the metadata necessary for its search engine through an OPeNDAP server so no manual editing of the metadata is necessary.  ...  A processing engine converts raw data provided by the data producers into NetCDF-CF standard files with NetCDF Attribute Convention for Dataset Discovery (ACDD) metadata.  ...  Acknowledgements The authors would like to thank Dr. Charles Fierz and Ms. Rodica Nitu for their very valuable and long standing support.  ... 
doi:10.5334/dsj-2020-006 fatcat:36c5swbicncvxnerijm5mty52i

Organic Data Publishing: A Novel Approach to Scientific Data Sharing

Yolanda Gil, Varun Ratnakar, Paul C. Hanson
2012 International Semantic Web Conference  
reduces the burden of data sharing by enabling any scientist to contribute metadata, and 3) tracks and exposes credit for all contributors.  ...  Many scientists do not share their data due to the cost and lack of incentives of traditional approaches to data sharing.  ...  This research was supported in part by a grant from the National Science Foundation through award number IIS-1117281.  ... 
dblp:conf/semweb/GilRH12 fatcat:n2bsa7r4e5h3df256umiovvwsq

Toward data lakes as central building blocks for data management and analysis

Philipp Wieder, Hendrik Nolte
2022 Frontiers in Big Data  
Storing such massive amounts of raw data, however, has its very own challenges, spanning from the general data modeling, and indexing for concise querying to the integration of suitable and scalable compute  ...  To achieve this, contributions to data lake architectures, metadata models, data provenance, workflow support, and FAIR principles are investigated.  ...  Although the idea to build up a data lake in a post-hoc manner seems very promising for any larger institution, it is connected with large challenges.  ... 
doi:10.3389/fdata.2022.945720 pmid:36072823 pmcid:PMC9442782 fatcat:zt2c37h5jnct7hjxc4lepeo3ga

Managing scientific data

Anastasia Ailamaki
2011 Proceedings of the 2011 international conference on Management of data - SIGMOD '11  
Managing the enormous amount of scientific data being collected is the key to scientific progress. Though technology allows for the extreme collection rates of scientific data, processing is still performed  ...  with stale techniques developed for small data sets; efficient processing is necessary to be able to exploit the value of huge scientific data collections.  ...  Attempts to support exploratory ad hoc OLAP queries on large data sets, including wavelets, promise to enable fast, powerful analysis of scientific data. A frequently used type of scientific data  ... 
doi:10.1145/1989323.1989433 dblp:conf/sigmod/Ailamaki11 fatcat:3cxviarugnfoldnqz3csgjczqu

Managing scientific data

Anastasia Ailamaki, Verena Kantere, Debabrata Dash
2010 Communications of the ACM  
Managing the enormous amount of scientific data being collected is the key to scientific progress. Though technology allows for the extreme collection rates of scientific data, processing is still performed  ...  with stale techniques developed for small data sets; efficient processing is necessary to be able to exploit the value of huge scientific data collections.  ...  Attempts to support exploratory ad hoc OLAP queries on large data sets, including wavelets, promise to enable fast, powerful analysis of scientific data. A frequently used type of scientific data  ... 
doi:10.1145/1743546.1743568 fatcat:vw57d23aorchtntng6jlrccs6y

Data Vault: providing simple web access to NRAO data archives

Ron DuPlain, John Benson, Eric Sessoms
2008 Advanced Software and Control for Astronomy II  
This application supports plug-ins for linking data to additional web tools and services, including Google Sky.  ...  In late 2007, the National Radio Astronomy Observatory (NRAO) launched Data Vault, a feature-rich web application for simplified access to NRAO data archives.  ...  Remijan at NRAO for scientific support.  ... 
doi:10.1117/12.789402 fatcat:jugjzfxmubh6hozuhre2a5xvr4

An extensible information model for shared scientific data collections

Amarnath Gupta, Chaitanya Baru
1999 Future generations computer systems  
An information model is defined to support sharing of composite-media scientific data. The model consists of data objects and links.  ...  Data objects are associated with descriptors which contain all the metadata related to the object.  ...  ad hoc querying capability.  ... 
doi:10.1016/s0167-739x(99)00031-x fatcat:7ni22pf7v5d6dlldagwcwyuqde

ProvDB: Provenance-enabled Lifecycle Management of Collaborative Data Analysis Workflows

Hui Miao, Amol Deshpande
2018 IEEE Data Engineering Bulletin  
novel querying and analysis capabilities for simplifying bookkeeping and debugging tasks for data analysts; and enables a rich new set of capabilities like identifying flaws in the data science process  ...  Current data science systems mainly focus on specific steps in the process such as training machine learning models, scaling to large data volumes, or serving the data or the models, while the issues of  ...  This however is very challenging, because of the fundamentally ad hoc nature of collaborative data science.  ... 
dblp:journals/debu/0001D18 fatcat:ybx7j6hvnjanbjnmz7wyrmb2te

Data Profiling

Ziawasch Abedjan, Lukasz Golab, Felix Naumann
2017 Proceedings of the 2017 ACM International Conference on Management of Data - SIGMOD '17  
Profiling activities range from ad-hoc approaches, such as eye-balling random subsets of the data or formulating aggregation queries, to systematic inference of structural information and statistics of  ...  One of the crucial requirements before consuming datasets for any application is to understand the dataset at hand and its metadata. The process of metadata discovery is known as data profiling.  ...  The profiling tasks applied here are usually of linear complexity to cope with the very large volumes in typical data lakes.  ... 
doi:10.1145/3035918.3054772 dblp:conf/sigmod/AbedjanGN17 fatcat:dwqqb6w6pzfu7l5nkz3m67oxsq

ERMrest: an entity-relationship data storage service for web-based, data-oriented collaboration [article]

Karl Czajkowski, Carl Kesselman, Robert Schuler, Hongsuda Tangmunarunkit
2016 arXiv   pre-print
Scientific discovery is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data.  ...  Common systems for managing file or asset metadata hide their inherent relational structures, while traditional relational database systems do not extend to the distributed collaborative environment often  ...  ACKNOWLEDGMENT The authors would like to thank Serban Voinea for his contributions to ERMrest and IOBox development and Anoop Kumar and Alejendro Bugacov for their work on the GPCR project.  ... 
arXiv:1610.06044v1 fatcat:oipknmkxivbk3pif5i3wofhspu

Realising Data-Centric Scientific Workflows with Provenance-Capturing on Data Lakes

Hendrik Nolte, Philipp Wieder
2022 Data Intelligence  
Although the necessity for a logical and a physical organisation of data lakes in order to meet those requirements is widely recognized, no concrete guidelines are yet provided.  ...  This paper discusses how FAIR Digital Objects can be used in a novel approach to organize a data lake based on data types instead of zones, how they can be used to abstract the physical implementation,  ...  ACKNOWLEDGEMENTS We acknowledge funding by the "Niedersächsisches Vorab" funding line of the Volkswagen Foundation.  ... 
doi:10.1162/dint_a_00141 fatcat:t2w5qaoobfh6rnnofzr65lng4a

Principles of Distributed Data Management in 2020? [chapter]

Patrick Valduriez
2011 Lecture Notes in Computer Science  
Although they do well in terms of consistency/flexibility/performance trade-offs for specific applications, they seem to be ad-hoc and might hurt data interoperability.  ...  Today, to support the requirements of important data-intensive applications (e.g. social networks, web data analytics, scientific applications, etc.), new distributed data management techniques and systems  ... 
doi:10.1007/978-3-642-23088-2_1 fatcat:vam6u36rozduhj7y6kuj6ephyu

A Mobile Data Management Architecture for Interoperability of Resource and Context Data

Andreas Brodt, Oliver Schiller, Sailesh Sathish, Bernhard Mitschang
2011 2011 IEEE 12th International Conference on Mobile Data Management  
Due to its robustness, we prefer Random Replication for medium and large ad-hoc smart spaces.  ...  Our evaluation showed that some of the modifications for our deep integration approach do create some extra overhead for queries selecting very large amounts of spatial features, but we observed excellent  ... 
doi:10.1109/mdm.2011.81 dblp:conf/mdm/BrodtSSM11 fatcat:l3z2apgqxffbbmmcsuztqozm54