Real-time collaborative analysis with (almost) pure SQL

Daniel Halperin, Francois Ribalet, Konstantin Weitz, Mak A. Saito, Bill Howe, E. Virginia Armbrust
2013 Proceedings of the 25th International Conference on Scientific and Statistical Database Management - SSDBM  
We consider a case study using SQL-as-a-Service to support "instant analysis" of weakly structured relational data at a multi-investigator science retreat.  ...  As a result, new science emerged from a meeting that was originally just a planning meeting.  ...  We would also like to thank David Meier for his helpful comments, as well as the anonymous referees.  ... 
doi:10.1145/2484838.2484880 dblp:conf/ssdbm/HalperinRWSHA13 fatcat:thyk4fjqjnhehfaleeabikmjea

Qualitative Analysis of the SQLShare Workload for Session Segmentation

Verónika Peralta, Willeme Verdeaux, Yann Raimont, Patrick Marcel
2019 International Workshop on Data Warehousing and OLAP  
SQLShare is a database-as-a-service platform targeting scientists and data scientists with minimal database experience, whose workload was made available to the research community.  ...  We ran a few tests over various query workloads to validate our approach empirically.  ...  The SQLShare workload is the result of a Multi-Year SQL-as-a-Service Experiment [9], allowing any user with minimal database experience to upload their datasets on-line and manipulate them via SQL queries  ... 
dblp:conf/dolap/PeraltaVRM19 fatcat:hf2tuprzcbecdkri3y3ch4xt3a

Learning Analysis Behavior in SQL Workloads

Clement Moreau, Verónika Peralta
2021 International Workshop on Data Warehousing and OLAP  
SQLShare is a database-as-a-service platform targeting scientists and data scientists with minimal database experience, whose workload was made available to the research community.  ...  This paper presents a set of analyses aiming at better understanding the SQLShare workload [13] and learning users' analysis behavior.  ...  SQLShare The SQLShare workload is the result of a Multi-Year SQL-as-a-Service Experiment [13], allowing any user with minimal database experience to upload their datasets on-line and manipulate them  ... 
dblp:conf/dolap/MoreauP21 fatcat:yejv6wtlungo5njeuwqmym364a

Facilitating SQL Query Composition and Analysis [article]

Zainab Zolaktaf, Mostafa Milani, Rachel Pottinger
2020 arXiv   pre-print
Empirical results show that the neural network models are more accurate in predicting the query error class, achieving a higher F-measure on classes with fewer samples as well as performing better on other  ...  These results are encouraging and confirm that SQL query workloads and data-driven machine learning methods can be leveraged to facilitate query composition and analysis.  ...  SQLShare Workload The SQLShare query workload [23] is the result of a multiyear deployment of a database-as-a-service platform, where users upload their data, write queries, and share their results.  ... 
arXiv:2002.09091v1 fatcat:bvvvrugpbvh2tmmohz32anf47m

ERMrest: an entity-relationship data storage service for web-based, data-oriented collaboration [article]

Karl Czajkowski, Carl Kesselman, Robert Schuler, Hongsuda Tangmunarunkit
2016 arXiv   pre-print
We present the design criteria, architecture, and service implementation, as well as describe an ecosystem of tools and services that we have created to integrate metadata into an end-to-end scientific  ...  To address these issues, we introduce ERMrest, a collaborative data management service which allows general entity-relationship modeling of metadata manipulated by RESTful access methods.  ...  It is perhaps more accurate to think of each data URL as representing a "query" resource, the results of which can be Columns projected from the joined tables "Experiment" AS "E" JOIN "Sample" AS "S" ON  ... 
arXiv:1610.06044v1 fatcat:oipknmkxivbk3pif5i3wofhspu

Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics [article]

Shrainik Jain, Bill Howe, Jiaqi Yan, Thierry Cruanes
2018 arXiv   pre-print
We find that these general approaches, when trained on a large corpus of SQL queries, provide a robust foundation for a variety of workload analysis tasks and database features, without requiring application-specific  ...  For index recommendation, we cluster the vector representations to compress large workloads with no loss in performance from the recommended index.  ...  We would also like to thank Louis M Burger and Doug Brown from Teradata for their feedback and helpful discussion on the topics covered in this paper.  ... 
arXiv:1801.05613v2 fatcat:ozdcnfrgpbci5fgjz42rqbhvt4

Building an Urban Data Science Summer Program at the University of Washington eScience Institute

Ariel Rokem, Cecilia Aragon, Anthony Arendt, Brittany Fiore-Gartland, Bryna Hazelton, Joseph Hellerstein, Bernease Herman, Bill Howe, Ed Lazowska, Micaela Parker, Valentina Staneva, Sarah Stone (+2 others)
2015 Zenodo  
In addition, we included six high school students who joined us from a separate program designed to expose young people to research activities and an undergraduate student who had already started working  ...  The teams worked in a shared studio space designed in part for this purpose, and participated in tutorials on relevant tools an [...]  ...  Students also learned about SQLShare: a Database-as-a-Service environment developed at UW aiming to increase uptake of relational database technology in the sciences.  ... 
doi:10.5281/zenodo.3934842 fatcat:jczru74d35djzoirxn3bvsbxqy

Demonstration of the Myria big data management service

Daniel Halperin, Andrew Whitaker, Shengliang Xu, Magdalena Balazinska, Bill Howe, Dan Suciu, Victor Teixeira de Almeida, Lee Lee Choo, Shumo Chu, Paraschos Koutris, Dominik Moritz, Jennifer Ortiz (+2 others)
2014 Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14  
From a web browser, Myria users can upload data, author efficient queries to process and explore the data, and debug correctness and performance issues.  ...  In this demonstration, we will showcase Myria, our novel cloud service for big data management and analytics designed to improve productivity.  ...  Based on our experience with SQLShare [3], we believe that science users can write data analysis tasks in SQL.  ... 
doi:10.1145/2588555.2594530 dblp:conf/sigmod/HalperinACCKMORWWXBHS14 fatcat:goxwwe6yrnajpeaxpon33wwwau

Science in the cloud

Dennis Gannon, Dan Fay, Daron Green, Kenji Takeda, Wenming Yi
2014 Proceedings of the 5th ACM workshop on Scientific cloud computing - ScienceCloud '14  
Cloud computing, map reduce, scalable systems, platform as a service, infrastructure as a service, cloud programming models.  ...  Microsoft Research is now in its fourth year of awarding Windows Azure cloud resources to the academic community. As of April 2014, over 200 research projects have started.  ...  Lessons from streaming data to the cloud. Most of the early experience with streaming has been positive, but it is still early and more research and experimentation is needed.  ... 
doi:10.1145/2608029.2608030 dblp:conf/hpdc/GannonFGTY14 fatcat:oqyh3g66njamvdbwltdgf3lxim

HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings [article]

Wolfgang Fischl, Georg Gottlob, Davide Mario Longo, Reinhard Pichler
2020 arXiv   pre-print
Given the increasing interest in using such decomposition methods in practice, a publicly accessible repository of decomposition software, as well as a large set of benchmarks, and a web-accessible workbench  ...  In addition, we describe a number of actual experiments we carried out with this new infrastructure.  ...  Georg Gottlob is a Royal Society Research Professor and acknowledges support by the Royal Society for the present work in the context of the project "RAISON DATA" (Project reference: RP\R1\201074).  ... 
arXiv:2009.01769v1 fatcat:sap3dxbrsfcnblj3amrrl7vyvm

QueryVis: Logic-based diagrams help users understand complicated SQL queries faster [article]

Aristotelis Leventidis, Jiahui Zhang, Cody Dunne, Wolfgang Gatterbauer, H.V. Jagadish, Mirek Riedewald
2020 pre-print
Moreover, we have evidence that our visual diagrams result in participants making fewer errors than with SQL.  ...  Understanding the meaning of existing SQL queries is critical for code maintenance and reuse. Yet SQL can be hard to read, even for expert users or the original creator of a query.  ...  ACKNOWLEDGMENTS This work was supported in part by a Khoury seed grant program, the National Science Foundation (NSF) under award numbers CAREER IIS-1762268 and ACI-1640575, and by U.S.  ... 
doi:10.1145/3318464.3389767 arXiv:2004.11375v1 fatcat:h5kyves6xjadtnql6qqyucfnka

Compilation-assisted performance acceleration for data analytics

Craig Mustard
Cross program memoization (CPM) is a technique to re-use results of prior computations across programs and users.  ...  However the sheer volume of data to be analyzed, demands of a multi-user operating environment, and limitations of general purpose processors make it challenging to perform these operations efficiently  ...  SQLShare [83] ran a long-term study on usage patterns of SQL-as-a-service and found that CPM could reduce total execution time by 37%, but most queries could either significantly benefit from sharing  ... 
doi:10.14288/1.0394560 fatcat:nqwrotqwm5arle7wdzyofu7n6e

Query-driven learning for automating exploratory analytics in large-scale data management systems

Fotis Savva
This dissertation is a first account of how the Query-Driven methodology can be effectively used to expedite the data exploration process focusing solely on extracting knowledge from queries and not from  ...  This work describes how Machine Learning can be used to expedite the data exploration process by (a) estimating the results of aggregate queries (b) explaining data spaces through interpretable Machine  ...  Acknowledgements Doing a PhD has been a long and fun journey, with moments of despair and moments of joy. It would not have been possible to complete this journey without my supervisors Dr.  ... 
doi:10.5525/gla.thesis.81907 fatcat:bykqzmdwp5d3dbiettxnfvndz4