Enabling SQL-based Training Data Debugging for Federated Learning [article]

Yejia Liu, Weiyuan Wu, Lampros Flokas, Jiannan Wang, Eugene Wu
2021 arXiv   pre-print
The SQL-based training data debugging framework has proved effective to fix this kind of issue in a non-federated learning setting.  ...  To overcome these limitations, we redesign our security protocol and propose Frog, a novel SQL-based training data debugging framework tailored for federated learning.  ...  The following summarizes our contributions: • We are the first to study how to enable SQL-based training data debugging for federated learning.  ... 
arXiv:2108.11884v1 fatcat:veq3cxlyajf5zbmgspy2csx6cm

SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle [article]

Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginthoer, Kevin Innerebner, Florijan Klezin, Stefanie Lindstaedt, Arnab Phani, Benjamin Rath, Berthold Reinwald, Shafaq Siddiqi, Sebastian Benjamin Wrede
2020 arXiv   pre-print
training, to debugging and serving.  ...  In this paper, we introduce SystemDS, an open source ML system for the end-to-end data science lifecycle from data integration, cleaning, and preparation, over local, distributed, and federated ML model  ...  Acknowledgements We thank the entire Apache SystemML team for the initial code base of SystemDS, especially Shivakumar Vaithyanathan, Douglas R.  ... 
arXiv:1909.02976v2 fatcat:hdd36ca7jze7figqzaybfgdqra

The next database revolution

Jim Gray
2004 Proceedings of the 2004 ACM SIGMOD international conference on Management of data - SIGMOD '04  
Data cubes and online analytic processing are now baked into most DBMSs. Beyond that, DBMSs have a framework for data mining and machine learning algorithms.  ...  Allowing approximate and probabilistic answers is essential for many applications. Many believe that XML and xQuery will be the main data structure and access pattern.  ...  Then one inserts training data into the table T, and the data mining algorithm builds a decision tree or Bayes net or time series model for the data.  ... 
doi:10.1145/1007568.1007570 dblp:conf/sigmod/Gray04 fatcat:lnn2ffgli5gwdk7gjttq7by67m

Device-centric Federated Analytics At Ease [article]

Li Zhang, Junji Qiu, Shangguang Wang, Mengwei Xu
2022 arXiv   pre-print
In this paper, we propose a data querying system, Deck, that enables flexible device-centric federated analytics.  ...  However, data analysts still lack a uniform way to harness such distributed on-device data.  ...  learning [50] is a special paradigm of device-centric federated analytics, where the devices collaboratively train a machine learning model without sharing the raw training data.  ... 
arXiv:2206.11491v2 fatcat:lbcm5fswgbggpfht2fys4wbdbi

Generative Models for Effective ML on Private, Decentralized Datasets [article]

Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas
2020 arXiv   pre-print
This paper demonstrates that generative models - trained using federated methods and with formal differential privacy guarantees - can be used effectively to debug many commonly occurring data issues even  ...  Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs  ...  AN APPLICATION TO DEBUGGING DURING TRAINING WITH RNNS DP Federated RNNs for Generating Natural Language Data Recurrent Neural Networks (RNNs) are a ubiquitous form of deep network, used to learn sequential  ... 
arXiv:1911.06679v2 fatcat:qdupc7zyh5gwpgu5yj2fim2kdu

The Impedance Imperative Tuples + Objects + Infosets = Too Much Stuff!

Dave Thomas
2003 Journal of Object Technology  
Using these tools it was straightforward for a businessperson with minimal training to develop useful robust applications.  ...  SQL is quite good for simple CRUD applications on normalized tables.  ...  has defined a semantic model for XML based on Infoset [9].  ... 
doi:10.5381/jot.2003.2.5.c1 fatcat:dbzooregnvh6nerqv5whmzfkeu

ECO: Harmonizing Edge and Cloud with ML/DL Orchestration

Nisha Talagala, Swaminathan Sundararaman, Vinay Sridhar, Dulcardo Arteaga, Qianmei Luo, Sriram Subramanian, Sindhu Ghanta, Lior Khermosh, Drew S. Roselli
2018 USENIX Workshop on Hot Topics in Edge Computing  
We present Edge Cloud Orchestrator (ECO), an architecture for enabling realistic ML deployments that leverage both edge and cloud by providing an abstraction to orchestrate, manage, and automate ML pipelines  ...  Edge Computing and Machine Learning are complementary advances: edge devices drive volumes of rich data that benefit ML, and ML drives insights that can justify edge investments and create killer applications  ...  Federated Learning Federated Learning avoids transferring data to the cloud and leverages independent edge models [42] .  ... 
dblp:conf/hotedge/TalagalaSSALSGK18 fatcat:gpstya6d75hzndz73g5rql3ovi

Spark SQL

Michael Armbrust, Ali Ghodsi, Matei Zaharia, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
Using Catalyst, we have built a variety of features (e.g., schema inference for JSON, machine learning types, and query federation to external databases) tailored for the complex needs of modern data analysis  ...  libraries in Spark (e.g., machine learning).  ...  Acknowledgments We would like to thank Cheng Hao, Tayuka Ueshin, Tor Myklebust, Daoyuan Wang, and the rest of the Spark SQL contributors so far.  ... 
doi:10.1145/2723372.2742797 dblp:conf/sigmod/ArmbrustXLHLBMK15 fatcat:hjyxwbr6hfgrrmqa5lpi2evgcm

Federated Learning for Big Data: A Survey on Opportunities, Applications, and Future Directions [article]

Thippa Reddy Gadekallu, Quoc-Viet Pham, Thien Huynh-The, Sweta Bhattacharya, Praveen Kumar Reddy Maddikunta, Madhusanka Liyanage
2021 arXiv   pre-print
To overcome this challenge, federated learning (FL) appeared to be a promising learning technique.  ...  The potential of big data can be realized via analytic and learning techniques, in which the data from various sources is transferred to a central cloud for central storage, processing, and training.  ...  Acknowledgement We acknowledge the authors (Dinh, Fang, Pubudu) for the contribution of our (blockchain -big data) development.  ... 
arXiv:2110.04160v2 fatcat:3y2kmamdbrfmrjdxv3zh47yphu

Preface to the Special Issue on Data Management and Analysis Technique Supporting AI

Lei Chen (Department of Computer Science and Engineering, the Hong Kong University of Science and Technology, Hong Kong 999077, China), Hongzhi Wang, Yongxin Tong, Hong Gao
2021 International Journal of Software and Informatics  
and the training sub-processes are adjusted and iterated for multiple rounds based on data analysis and artificial experience.  ...  promote the development of AI technology based on big data and its wider application.  ...  ., professor of Beihang University, Ph.D. supervisor, senior member of CCF, is mainly engaged in the research on big data, databases, federated learning, spatiotemporal big data computing and crowd intelligence  ... 
doi:10.21655/ijsi.1673-7288.00244 fatcat:kgspofndnvacdafol3dm5wdmhu

Bridging the Gap between Data Integration and ML Systems [article]

Rihan Hai, Yan Kang, Christos Koutras, Andra Ionescu, Asterios Katsifodimos
2022 arXiv   pre-print
The data needed for machine learning (ML) model training and inference, can reside in different separate sites often termed data silos.  ...  In this work, we propose three matrix-based dataset relationship representations, which bridge the classical data integration (DI) techniques with the requirements of modern machine learning.  ...  With our matrix-based representations, we enlighten the new opportunities for linear algebra rewriting in model factorization, and feature engineering and model training in federated learning (Sec.  ... 
arXiv:2205.09681v1 fatcat:w4qc37liszcbhes7wy5ykeed3e

Predicting SPARQL Query Performance and Explaining Linked Data [chapter]

Rakebul Hasan
2014 Lecture Notes in Computer Science  
Moreover, consumers of the Semantic Web data may need explanations for debugging or understanding the reasoning behind producing the data.  ...  As the complexity of the Semantic Web increases, efficient ways to query the Semantic Web data is becoming increasingly important.  ...  In consuming Linked Data, we explain how a given piece of data was derived. Users can use such explanations to understand and debug Linked Data.  ... 
doi:10.1007/978-3-319-07443-6_53 fatcat:h7xxf6wr4bfcjmysyiruee22im

Technical Report on Data Integration and Preparation [article]

El Kindi Rezig, Michael Cafarella, Vijay Gadepally
2021 arXiv   pre-print
suited for the application.  ...  These challenges, often referred to as the three Vs (volume, velocity, variety) of Big Data, require low-level tools for data management, preparation and integration.  ...  The authors wish to thank the following individuals for their support in developing This material is based upon work supported by the National Science Foundation under Grant No. 1636788 and by the United  ... 
arXiv:2103.01986v1 fatcat:3mjxtmcm3nh4pdz7c3nfd4kldq

Magpie: Python at Speed and Scale using Cloud Backends

Alekh Jindal, K. Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas Mueller, Wentao Wu, Hiren Patel
2021 Conference on Innovative Data Systems Research  
Python has become overwhelmingly popular for ad-hoc data analysis, and Pandas dataframes have quickly become the de facto standard API for data science.  ...  Magpie assists the data scientist by automatically selecting the most efficient engine (e.g., SQL DW, SCOPE, Spark) in cloud environments that offer multiple engines atop a data lake.  ...  We would like to thank the following teams and individuals for their invaluable assistance, insight, and support: Milos Sukovic and Brenden Niebruegge for discussions on Arrow based processing, Christopher  ... 
dblp:conf/cidr/JindalEDPHPG0CM21 fatcat:2u57cgl4wfar5bepjispc3on4y

Serverless Architectures Review, Future Trend and the Solutions to Open Problems

Manoj Kumar
2019 American Journal of Software Engineering  
Also provides comparative analysis on available serverless architectures for the most common use cases within cloud provider's environment.  ...  Amazon Athena enables SQL querying for data in S3. Azure Stream Analytics: Real-time streaming data and SQL like language querying Event Hubs: to process, route, and store IOT devices data.  ...  file storage Cloud Storage Analytics warehouse (SQL) BigQuery Personalization Cloud Machine Learning Engine © The Author(s) 2019.  ... 
doi:10.12691/ajse-6-1-1 fatcat:j7ufufymdrf2tbfedjryi2hq24