6,444 Hits in 5.8 sec

Automating model search for large scale machine learning

Evan R. Sparks, Ameet Talwalkar, Daniel Haas, Michael J. Franklin, Michael I. Jordan, Tim Kraska
2015 Proceedings of the Sixth ACM Symposium on Cloud Computing - SoCC '15  
tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation.  ...  In this work, we build upon these recent efforts and propose an architecture for automatic machine learning at scale comprised of a cost-based cluster resource allocation estimator, advanced hyperparameter  ...  Each model family has a number of hyperparameters, such as degree of regularization or learning rate, and each of these must be tuned to an appropriate value.  ... 
doi:10.1145/2806777.2806945 dblp:conf/cloud/SparksTHFJK15 fatcat:y4mzheh2ejf5fmimcauahaif5i

TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries [article]

Evan R. Sparks, Ameet Talwalkar, Michael J. Franklin, Michael I. Jordan, Tim Kraska
2015 arXiv   pre-print
baseline approach, and can scale to models trained on terabytes of data across hundreds of machines.  ...  , and physical optimization via batching.  ...  There are also several proprietary and open-source systems providing machine learning functionality with varying degrees of automation.  ... 
arXiv:1502.00068v2 fatcat:l5ane47jazgq7cm3w7wh5cmho4

A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism [article]

Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
2020 arXiv   pre-print
In this paper, we take a first step towards filling this gap by studying the problem of tuning the degree of parallelism (DOP) via ML techniques in Microsoft SQL Server, a popular commercial RDBMS  ...  There is a large body of recent work applying machine learning (ML) techniques to query optimization and query performance prediction in relational database management systems (RDBMSs).  ...  We thank Carlo Curino and other members of GSL, and members of the SQL Server team for discussions and feedback on this work.  ... 
arXiv:2005.08439v2 fatcat:wyemxj2qj5bdjhhwatlm5sg4p4

Usability and design considerations for an autonomic relational database management system

R. Telford, R. Horman, S. Lightstone, N. Markov, S. O'Connell, G. Lohman
2003 IBM Systems Journal  
resulted in system alerts, and the learning, by the system, of actions taken by the administrator.  ...  This paper examines the ease-of-use ramifications of autonomic computing in the context of relational databases in general, and of the IBM® DB2® Universal Database™ Version 8.1 autonomic computing  ...  This dynamic ability to determine a near-optimal degree of parallelism for query execution makes much of the past literature on load-balancing obsolete. Load utility automatic tuning.  ... 
doi:10.1147/sj.424.0568 fatcat:jbllg6et4rfkxnbegmwn7qzmia

Toward autonomic computing with DB2 universal database

Sam S. Lightstone, Guy Lohman, Danny Zilio
2002 SIGMOD Record  
As the cost of both hardware and software falls due to technological advancements and economies of scale, the cost of ownership for database applications is increasingly dominated by the cost of people  ...  ownership (TCO) of DBMSs and improve system performance.  ...  Automatic query parallelism selection. At run-time, DB2 UDB can automatically determine the most effective degree of query parallelism to use for query performance across SMP CPUs as a maintenance task.  ... 
doi:10.1145/601858.601873 fatcat:3ptnwhni2bepnf7ebhxgc5z3na

Self-Tuning Transactional Data Grids: The Cloud-TM Approach

Diego Didona, Paolo Romano
2014 2014 IEEE 3rd Symposium on Network Cloud Computing and Applications (NCCA 2014)  
From a methodological perspective, this is achieved by relying on the innovative idea of exploiting the diversity of different modelling approaches, including analytical models, machine-learning and simulations  ...  Cloud-TM takes a holistic approach to self-tuning and elastic scaling, treating them as strongly intertwined problems with the ultimate goals of i) achieving optimal efficiency at any scale of the platform  ...  charge of automating the tuning of the platform's scale, degree of replication, and of the choice of its replication protocol. • the AutoPlacer Optimizer, which monitors the quality of the current data  ... 
doi:10.1109/ncca.2014.26 dblp:conf/ncca/DidonaR14 fatcat:t5nlxdkxqnduhietjrm7kxpky4

LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations [article]

Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, Eiko Yoneki
2018 arXiv   pre-print
We demonstrate the utility of LIFT in two case studies in database compound indexing and resource management in stream processing.  ...  However, practical solutions remain elusive due to large training data requirements, algorithmic instability, and lack of standard tools.  ...  DRL algorithms require more configuration and hyper-parameter tuning than other machine learning approaches, as users need to tune neural network hyper-parameters, design of states/actions and rewards,  ... 
arXiv:1808.07903v1 fatcat:wdzhvtufmnggthgzjxcllqxriy

The NLP Cookbook: Modern Recipes for Transformer Based Deep Learning Architectures

Sushant Singh, Ausif Mahmood
2021 IEEE Access  
Consequently, some of the recent NLP architectures have utilized concepts of transfer learning, pruning, quantization, and knowledge distillation to achieve moderate model sizes while keeping nearly similar  ...  retrieval via Natural Language Understanding (NLU), and Natural Language Generation (NLG).  ...  It was trained in an unsupervised manner capable of learning complex tasks including Machine Translation, reading comprehension, and summarization without explicit fine-tuning.  ... 
doi:10.1109/access.2021.3077350 fatcat:gchmms4m2ndvzdowgrvro3w6z4

Distributed Framework for Automating Opinion Discretization from Text Corpora on Facebook

Hiep Xuan Huynh, Vu Tuan Nguyen, Nghia Duong-Trung, Van-Huy Pham, Cang Thuong Phan
2019 IEEE Access  
There are unfortunately few studies that have applied the combination of convolutional neural networks (CNN) and Apache Spark to the task of automating opinion discretization.  ...  It covers all the steps and components that are usually part of a completely practical text mining pipeline: acquiring input data, processing, tokenizing it into a vectorial representation, applying machine  ...  NGHIA DUONG-TRUNG received the Ph.D. degree in machine learning from Information Systems and Machine Learning Lab (ISMLL), Hildesheim University, Germany, in 2017.  ... 
doi:10.1109/access.2019.2922427 fatcat:vwqqknsocbc2bbdqdrns7ykopm

Mining for strong gravitational lenses with self-supervised learning [article]

George Stein, Jacqueline Blaum, Peter Harrington, Tomislav Medan, Zarija Lukic
2021 arXiv   pre-print
We employ self-supervised representation learning to distill information from 76 million galaxy images from the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys' Data Release 9.  ...  Targeting the identification of new strong gravitational lens candidates, we first create a rapid similarity search tool to discover new strong lenses given only a single labelled example.  ...  help on using the DESI Legacy Survey data, Dustin Lang for providing access to the image-cutout service at NERSC, and Md Abul Hayat and Mustafa Mustafa for their pioneering efforts on self-supervised learning  ... 
arXiv:2110.00023v1 fatcat:fyvr7mekgjfqtofclgx4ewsunq

Guest Editors' Introduction to the Special Section on the 26th International Conference on Data Engineering

Shahram Ghandeharizadeh, Jayant R. Haritsa, Gerhard Weikum
2011 IEEE Transactions on Knowledge and Data Engineering  
It presents principled and machine-learning-inspired techniques to address the crucial but largely unexplored problem of assuring data quality right at its very root, when humans enter data via forms.  ...  His work on automatic database tuning received the 2002 VLDB 10-Year Award.  ... 
doi:10.1109/tkde.2011.135 fatcat:76a7aorphvgf7lyjpnqrulnhbq

Lessons Learned from Challenging Data Science Case Studies [chapter]

Kurt Stockinger, Martin Braschler, Thilo Stadelmann
2019 Applied Data Science  
In this chapter, we revisit the conclusions and lessons learned of the chapters presented in Part II of this book and analyze them systematically.  ...  stage of the knowledge discovery process or in a certain data science method or application area.  ...  and the impact of data distribution on the runtime of SQL queries or machine learning algorithms.  ... 
doi:10.1007/978-3-030-11821-1_24 fatcat:6azhc4aon5eofi572joxaas5xq

Auto-tuning Similarity Search Algorithms on Multi-core Architectures

Buğra Gedik
2013 International Journal of Parallel Programming  
In this paper, we present (1) a detailed study of the various tuning knobs and their contributions on increasing the query throughput for parallelized versions of the two most common classes of high-dimensional  ...  We show experimentally that our auto-tuner reaches near-optimal performance and significantly outperforms un-tuned versions of parallel multi-NN algorithms for real video repository data on a variety of  ...  Figure 4b shows the query performance for different degrees of parallelism. The scan algorithm performance improves until about 8 threads, after that it degrades.  ... 
doi:10.1007/s10766-013-0239-8 fatcat:2gpdx6e62zfazl2eeykobwsrhe

Sequence-to-Set Semantic Tagging: End-to-End Multi-label Prediction using Neural Attention for Complex Query Reformulation and Automated Text Categorization [article]

Manirupa Das, Juanxi Li, Eric Fosler-Lussier, Simon Lin, Soheil Moosavinasab, Steve Rust, Yungui Huang, Rajiv Ramnath
2019 arXiv   pre-print
attention for learning document representations that can effect term transfer within the corpus, for semantically tagging a large collection of documents.  ...  Our approach to generate document encodings employing our sequence-to-set models for inference of semantic tags, gives to the best of our knowledge, the state-of-the-art for both, the unsupervised query  ...  over-tuned on these queries.  ... 
arXiv:1911.04427v1 fatcat:gb7fiztkgveuhnsit2x6zh57hy

A Survey on Automatic Parameter Tuning for Big Data Processing Systems

Herodotos Herodotou, Yuxing Chen, Jiaheng Lu
2020 ACM Computing Surveys  
, machine learning, and adaptive tuning.  ...  The use of automated parameter tuning techniques is a promising, yet challenging approach for optimizing system performance.  ...  Partition tuning: The number of tasks (i.e., the degree of parallelism) in a Spark application is determined based on the number of partitions from input RDD.  ... 
doi:10.1145/3381027 fatcat:7aglimtuwze25boptuano4ufdy
Showing results 1–15 out of 6,444 results