
Parameter Re-Initialization through Cyclical Batch Size Schedules [article]

Norman Mu and Zhewei Yao and Amir Gholami and Kurt Keutzer and Michael Mahoney
2018 arXiv   pre-print
We implement this through a cyclical batch size schedule motivated by a Bayesian perspective of neural network training.  ...  Optimal parameter initialization remains a crucial problem for neural network training. A poorly initialized network may take longer to train and/or converge to sub-optimal solutions.  ...  The motivation is that larger batch sizes allow for parallel execution, which can accelerate training. We implement weight re-initialization through cyclical batch size schedules.  ...
arXiv:1812.01216v1 fatcat:mkk5auxhunerdhd6u5zrrgbjmq
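The snippet names the mechanism but not its exact form; below is a minimal sketch of one plausible cyclical batch size schedule, where the batch size ramps from a small to a large value and restarts each cycle (all constants and names are hypothetical, not the paper's).

```python
def cyclical_batch_size(epoch, base_bs=128, peak_bs=1024, cycle_len=10):
    """Batch size for a given epoch under a sawtooth cyclical schedule.

    Ramping toward peak_bs reduces gradient noise late in a cycle; the
    drop back to base_bs re-injects noise, acting as a soft
    re-initialization of the training dynamics.
    """
    phase = (epoch % cycle_len) / max(cycle_len - 1, 1)  # position in [0, 1]
    return int(base_bs + phase * (peak_bs - base_bs))

# First two cycles of the schedule:
print([cyclical_batch_size(e) for e in range(20)])
```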

The effectiveness of intelligent scheduling for multicast video-on-demand

Vaneet Aggarwal, Robert Calderbank, Vijay Gopalakrishnan, Rittwik Jana, K. K. Ramakrishnan, Fang Yu
2009 Proceedings of the seventeenth ACM international conference on Multimedia - MM '09
We show through analysis that this approach is optimal in terms of the data transmitted by the server.  ...  Today's delivery mechanisms, especially unicast, require resources to scale linearly with the number of receivers and library sizes.  ...  Figure 14: Batching effectiveness of cyclic multicast: avg. # of receivers of each individual cyclic stream. Table 1: Parameter values used in experiments  ...
doi:10.1145/1631272.1631330 dblp:conf/mm/AggarwalCGJRY09 fatcat:duklwdmg25gkvl7obkeyv6jjsq
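Figure 14's caption suggests the key quantity is the average number of receivers sharing one cyclic stream. Purely as an illustration (not the paper's scheduler or data), a toy Poisson-arrival simulation in which every request landing in the same cycle window is served by one multicast stream:

```python
import random

def avg_receivers_per_stream(arrival_rate, cycle_secs, horizon_secs, seed=0):
    """Requests arrive as a Poisson process; all requests in the same
    cycle window share a single cyclic multicast stream."""
    rng = random.Random(seed)
    windows, t = {}, rng.expovariate(arrival_rate)
    while t < horizon_secs:
        key = int(t // cycle_secs)            # which cyclic stream serves it
        windows[key] = windows.get(key, 0) + 1
        t += rng.expovariate(arrival_rate)    # next Poisson arrival
    return sum(windows.values()) / max(len(windows), 1)

# e.g. 0.5 requests/sec batched onto 60-second cyclic streams:
print(avg_receivers_per_stream(0.5, 60, 3600))
```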

Hyper-Learning for Gradient-Based Batch Size Adaptation [article]

Calum Robert MacLellan, Feng Dong
2022 arXiv   pre-print
We demonstrate Arbiter's effectiveness in several illustrative experiments: to act as a stand-alone batch size scheduler; to complement fixed batch size schedules with greater flexibility; and to promote  ...  Scheduling the batch size to increase is an effective strategy to control gradient noise when training deep neural networks.  ...  Within the context of batch size scheduling, we should therefore select a 'sufficiently' small initial batch size, and choose a scheduling heuristic for increasing the batch size as training progresses  ... 
arXiv:2205.08231v1 fatcat:egkcy6patjajvmjqny24x3nstq
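Arbiter itself learns the batch size by gradient-based hyper-optimization, which is beyond a short sketch; the snippet's simpler baseline, a heuristic that starts small and grows the batch size as training progresses, might look like this (doubling interval and cap are assumptions):

```python
def increasing_batch_size(epoch, init_bs=32, growth=2, every=5, max_bs=4096):
    """Start with a 'sufficiently small' batch size and enlarge it as
    training progresses, here by doubling every `every` epochs."""
    return min(init_bs * growth ** (epoch // every), max_bs)

# Epochs 0, 5, 10, 15, 20 -> 32, 64, 128, 256, 512:
print([increasing_batch_size(e) for e in range(0, 25, 5)])
```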

Automated learning rate search using batch-level cross-validation

Duygu KABAKÇI, Emre AKBAŞ
2021 Sakarya University Journal of Computer and Information Sciences  
Hyper-parameters of deep neural networks, especially the learning rate and its (decay) schedule, strongly affect the network's final performance.  ...  The advantage of batch-level or micro CV methods is that the gradient computed during training is re-used to evaluate several different learning rates.  ...  Training set T, initial network weights θ, momentum parameter α, initial velocity ϑ, mini-batch size m, training-vs-validation trade-off parameter λ.  ...
doi:10.35377/saucis...935353 fatcat:huyulxlpijeufka5x46vavf4ji
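The snippet's core idea, re-using one mini-batch gradient to score several candidate learning rates on a held-out batch, can be sketched as follows (a hedged PyTorch illustration, not the authors' code; assumes every parameter receives a gradient):

```python
import copy
import torch

def micro_cv_step(model, loss_fn, train_batch, val_batch, candidate_lrs):
    """One batch-level CV step: compute the gradient once, score each
    candidate learning rate on a validation batch, commit the best."""
    x, y = train_batch
    model.zero_grad()
    loss_fn(model(x), y).backward()               # one shared gradient
    grads = [p.grad.clone() for p in model.parameters()]
    best_lr, best_val = candidate_lrs[0], float("inf")
    for lr in candidate_lrs:
        trial = copy.deepcopy(model)              # hypothetical step at lr
        with torch.no_grad():
            for p, g in zip(trial.parameters(), grads):
                p -= lr * g
            val = loss_fn(trial(val_batch[0]), val_batch[1]).item()
        if val < best_val:
            best_lr, best_val = lr, val
    with torch.no_grad():                         # take the winning step
        for p, g in zip(model.parameters(), grads):
            p -= best_lr * g
    return best_lr
```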

Snapshot Ensembles: Train 1, get M for free [article]

Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
2017 arXiv   pre-print
To obtain repeated rapid convergence, we leverage recent work on cyclic learning rate schedules.  ...  We achieve this goal by training a single neural network, converging to several local minima along its optimization path and saving the model parameters.  ...  ACKNOWLEDGEMENTS We thank Ilya Loshchilov and Frank Hutter for their insightful comments on the cyclic cosine-shaped learning rate.  ...
arXiv:1704.00109v1 fatcat:653mnwhqbzfg7hapzygzc6rkqa
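A compact sketch of the recipe the snippet describes: a cyclic cosine-shaped learning rate that restarts several times, with one model snapshot saved per cycle for the ensemble (constants illustrative):

```python
import math

def snapshot_lr(iteration, total_iters, n_cycles, lr_max):
    """Cyclic cosine-shaped LR that restarts n_cycles times; the LR
    collapses toward 0 at each cycle's end, where a snapshot is saved."""
    cycle_len = math.ceil(total_iters / n_cycles)
    t = iteration % cycle_len
    return lr_max / 2 * (math.cos(math.pi * t / cycle_len) + 1)

# Pseudo-usage: save one ensemble member at the end of every cycle.
# for it in range(total_iters):
#     set_lr(optimizer, snapshot_lr(it, total_iters, 5, 0.1))
#     train_one_iteration()
#     if (it + 1) % math.ceil(total_iters / 5) == 0:
#         snapshots.append(copy.deepcopy(model))
```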

Stochastic cyclic scheduling problem in synchronous assembly and production lines

S Karabati, B Tan
1998 Journal of the Operational Research Society  
For the second approximate solution procedure, let us use schedule S=(1, 4, 3, 2) as the initial solution. The expected batch cycle time of this schedule is equal to 36.128.  ...  Opns Res 37: 925-935. Karabati S and Kouvelis P (1996). Cyclic scheduling in flow lines: Modeling observations, effective heuristics and an optimal solution procedure. Naval Res Logist 43: 211-231.  ...
doi:10.1057/palgrave.jors.2600625 fatcat:rznxdux7sjd75npiqhvcvx52xi
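The paper's model and data are not reproduced in the snippet; purely as an illustration, a Monte Carlo estimate of the expected batch cycle time of a job sequence on a synchronous line, under the common assumption that jobs advance in lockstep and each transfer period lasts as long as the slowest occupied machine (processing-time distributions are made up):

```python
import random

def expected_cycle_time(schedule, mean_times, n_machines, samples=5000, seed=0):
    """Average makespan of one batch on a synchronous flow line: at each
    transfer step, all jobs advance together and the step lasts as long
    as the slowest occupied machine."""
    rng, n, total = random.Random(seed), len(schedule), 0.0
    for _ in range(samples):
        # p[i][k]: sampled time of the k-th scheduled job on machine i
        p = [[rng.expovariate(1.0 / mean_times[j]) for j in schedule]
             for _ in range(n_machines)]
        total += sum(
            max(p[i][k - i] for i in range(n_machines) if 0 <= k - i < n)
            for k in range(n + n_machines - 1)
        )
    return total / samples

# Four jobs in some cyclic order on a three-machine line (toy data):
print(expected_cycle_time([0, 3, 2, 1], [8.0, 9.0, 10.0, 7.0], 3))
```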


Faucet

Andrea Lattuada, Frank McSherry, Zaheer Chothia
2016 Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond - BeyondMR '16  
CCS Concepts: Software and its engineering → Scheduling; Data flow architectures.  ...  This document presents Faucet, a modular flow control approach for distributed data-parallel dataflow engines with support for arbitrary (cyclic) topologies.  ...  Controlled memory usage may also enable performance gains through buffer re-use in a broader set of topologies.  ...
doi:10.1145/2926534.2926544 dblp:conf/sigmod/LattuadaMC16 fatcat:72uh4l6hsfhbnpr7mi57jg7orm
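Faucet's actual protocol is not described in the snippet; as a generic illustration of the underlying idea only, flow control via bounded inter-operator channels whose full buffers block producers and thus propagate pressure upstream, even around cycles:

```python
from queue import Queue

class Edge:
    """Bounded channel between two dataflow operators (generic sketch,
    not Faucet's mechanism). A full buffer blocks the producer, which
    bounds memory per edge and propagates backpressure upstream."""
    def __init__(self, capacity=1024):
        self.buf = Queue(maxsize=capacity)

    def send(self, item):
        self.buf.put(item, block=True)   # blocks while downstream is slow

    def recv(self):
        return self.buf.get(block=True)  # blocks while upstream is idle
```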

Online Embedding Compression for Text Classification using Low Rank Matrix Factorization [article]

Anish Acharya, Rahul Goel, Angeliki Metallinou, Inderjit Dhillon
2018 arXiv   pre-print
Our models are trained, compressed and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size.  ...  Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence  ...  Figure 2: Comparison of the CLR and CALR update policies. Algorithm 1: Cyclically Annealed Learning Rate schedule, procedure CALR(iteration, step size, LR_LB, LR_UB). Table 2: Compression and accuracy  ...
arXiv:1811.00641v1 fatcat:7ywish3ymjerhmoforabop6rym
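Algorithm 1's body is elided in the snippet; below is a hedged reconstruction of a cyclically annealed learning rate, i.e. a triangular cyclical schedule whose upper bound is annealed after every cycle (the decay factor is an assumption, and the signature only loosely follows the snippet's CALR(iteration, step size, LR_LB, LR_UB)):

```python
def calr(iteration, step_size, lr_lb, lr_ub, decay=0.5):
    """Triangular cyclical LR between lr_lb and an upper bound that is
    annealed (here geometrically, factor `decay`) after every cycle."""
    cycle = iteration // (2 * step_size)                 # 0, 1, 2, ...
    ub = lr_lb + (lr_ub - lr_lb) * decay ** cycle        # annealed ceiling
    x = abs(iteration / step_size - 2 * cycle - 1)       # 1 -> 0 -> 1
    return lr_lb + (ub - lr_lb) * (1 - x)

# One cycle spans 2 * step_size iterations; the peaks shrink each cycle:
print([round(calr(i, 5, 0.001, 0.1), 4) for i in range(0, 30, 5)])
```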

A Markov Decision Process Approach to Active Meta Learning [article]

Bingjia Wang, Alec Koppel, Vikram Krishnamurthy
2020 arXiv   pre-print
One challenge in meta-learning is how to exploit relationships between tasks and classes, which is overlooked by commonly used random or cyclic passes through data.  ...  We develop scheduling schemes based on Upper Confidence Bound (UCB), Gittins Index and tabular Markov Decision Problems (MDPs) solved with linear programming, where the reward is the scaled statistical  ...  To evaluate the performance, we vary the batch size B ∈ {1, 20, 100}.  ... 
arXiv:2009.04950v1 fatcat:ifofp6moi5h2dhbrbe6qt6c6eu
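Of the three schedulers the snippet lists, UCB is the simplest to sketch: pick the task whose empirical reward plus an exploration bonus is largest (the reward definition below, average observed loss decrease, is an assumption):

```python
import math

def pick_task(counts, mean_rewards, t, c=2.0):
    """UCB task selection: counts[k] pulls and mean_rewards[k] average
    reward (e.g. loss decrease) observed so far for task k at round t."""
    def ucb(k):
        if counts[k] == 0:
            return float("inf")          # sample every task at least once
        return mean_rewards[k] + c * math.sqrt(math.log(t + 1) / counts[k])
    return max(range(len(counts)), key=ucb)

# Round 10: task 1 has the best mean, but under-explored task 2 wins.
print(pick_task([5, 4, 1], [0.2, 0.5, 0.3], t=10))
```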

Hybridizing discrete- and continuous-time models for batch sizing and scheduling problems

Siqun Wang, Monique Guignard
2006 Computers & Operations Research  
This paper proposes a new hybrid technique called "partial parameter uniformization" to hybridize discrete- and continuous-time models for batch sizing and scheduling problems.  ...  Various discrete-time and continuous-time MILP (Mixed Integer Linear Programming) formulations have been built and utilized for capacitated batch sizing and scheduling problems in process industries.  ...  For the batch sizing and scheduling problems, all the values of different processing times are the "parameters" we make uniform.  ...
doi:10.1016/j.cor.2004.11.013 fatcat:werx4dxqibetrbcc24mkn54squ
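For context on the model class being hybridized (not the paper's uniformization technique itself), a toy discrete-time batch sizing MILP in PuLP with made-up data: choose per-period batch sizes and setups to meet demand at minimum setup-plus-holding cost.

```python
import pulp

# Data: 4 periods, per-period demand, capacity, setup and holding costs.
T, demand, cap = 4, [40, 60, 30, 50], 100
setup_cost, hold_cost = 90, 1

m = pulp.LpProblem("batch_sizing", pulp.LpMinimize)
x = [pulp.LpVariable(f"prod_{t}", 0, cap) for t in range(T)]         # batch size
y = [pulp.LpVariable(f"setup_{t}", cat="Binary") for t in range(T)]  # setup?
s = [pulp.LpVariable(f"inv_{t}", 0) for t in range(T)]               # inventory

m += pulp.lpSum(setup_cost * y[t] + hold_cost * s[t] for t in range(T))
for t in range(T):
    prev = s[t - 1] if t > 0 else 0
    m += prev + x[t] - demand[t] == s[t]     # inventory balance
    m += x[t] <= cap * y[t]                  # produce only in setup periods

m.solve(pulp.PULP_CBC_CMD(msg=False))
print([v.value() for v in x])                # optimal batch sizes
```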

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters [article]

Zhengda Bian and Shenggui Li and Wei Wang and Yang You
2021 arXiv   pre-print
It determines the batch size for each job through an online evolutionary search that can continuously optimize the scheduling decisions.  ...  To address the problem, we propose ONES, an ONline Evolutionary Scheduler for elastic batch size orchestration.  ...  Batch size scaling mechanism: batch size scaling re-configures a job with a new batch size.  ...
arXiv:2108.03645v1 fatcat:sftetwudifazrntmsowd42pinq
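The full ONES scheduler coordinates many jobs; for a single job, the online evolutionary idea can be sketched as a (1+1)-style search that mutates the batch size and keeps whichever configuration measures a higher throughput (mutation factors and the fitness function are assumptions):

```python
import random

def evolve_batch_size(measure_throughput, init_bs=256, steps=20, seed=0):
    """Hill-climbing (1+1) evolutionary search over one job's batch
    size, driven by live throughput measurements."""
    rng = random.Random(seed)
    best_bs, best_fit = init_bs, measure_throughput(init_bs)
    for _ in range(steps):
        cand = max(1, int(best_bs * rng.choice([0.5, 0.8, 1.25, 2.0])))
        fit = measure_throughput(cand)        # e.g. samples/sec at this size
        if fit > best_fit:                    # keep the better configuration
            best_bs, best_fit = cand, fit
    return best_bs

# Example with a made-up throughput curve that peaks near 1024:
print(evolve_batch_size(lambda b: b / (1 + (b / 1024) ** 2)))
```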

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates [article]

Leslie N. Smith, Nicholay Topin
2018 arXiv   pre-print
[2017] independently) showed an equivalence between increasing the batch size and decaying the learning rate.  ...  Figure 5: Comparison of super-convergence over a range of batch sizes. These results show that a large batch size is more effective than a small batch size for super-convergence training.  ...
arXiv:1708.07120v3 fatcat:ff5p24boonbzjm7qf4pt4vabqq
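A sketch of the 1cycle-style policy associated with super-convergence: one triangular ramp up to a large maximum learning rate and back, followed by a short annihilation phase well below the base rate (phase fractions and scales are assumptions):

```python
def one_cycle_lr(step, total_steps, lr_max, ramp_frac=0.45, final_scale=1e-2):
    """1cycle-style LR: ramp lr_min -> lr_max -> lr_min over two equal
    phases, then 'annihilate' the LR far below lr_min at the end."""
    lr_min = lr_max / 10
    up = down = int(total_steps * ramp_frac)
    if step < up:                                        # warm-up ramp
        return lr_min + (lr_max - lr_min) * step / up
    if step < up + down:                                 # symmetric decay
        return lr_max - (lr_max - lr_min) * (step - up) / down
    frac = (step - up - down) / max(total_steps - up - down, 1)
    return lr_min * (1 - frac * (1 - final_scale))       # annihilation

print([round(one_cycle_lr(s, 1000, 3.0), 3) for s in (0, 225, 450, 900, 999)])
```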

Cyclical Pruning for Sparse Neural Networks [article]

Suraj Srinivas, Andrey Kuzmin, Markus Nagel, Mart van Baalen, Andrii Skliar, Tijmen Blankevoort
2022 arXiv   pre-print
To enable weight recovery, we propose a simple strategy called cyclical pruning, which requires the pruning schedule to be periodic and allows weights pruned erroneously in one cycle to recover in subsequent  ...  Current methods for pruning neural network weights iteratively apply magnitude-based pruning on the model weights and re-train the resulting model to recover lost accuracy.  ...  completing 75% of the allocated epochs, and use a batch size of 256.  ...
arXiv:2202.01290v1 fatcat:pynpn53jr5bx3nrkb7gghtquu4
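A minimal sketch of the cyclical pruning idea as the snippet states it: the target sparsity ramps up within each cycle and resets at the next, and the magnitude mask is recomputed every time, so a weight pruned in one cycle can grow back later (PyTorch; the ramp shape is an assumption):

```python
import torch

def cyclical_sparsity(step, cycle_len, final_sparsity):
    """Target sparsity ramps from 0 to final_sparsity within each cycle,
    then drops back to 0 at the start of the next cycle."""
    t = (step % cycle_len) / max(cycle_len - 1, 1)
    return final_sparsity * t

def apply_magnitude_mask(weight, sparsity):
    """Zero the smallest-magnitude fraction of entries. The mask is
    recomputed from scratch on every call, which is what lets weights
    pruned erroneously in one cycle recover in the next."""
    k = int(weight.numel() * sparsity)
    if k < 1:
        return weight
    thresh = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > thresh).to(weight.dtype)
```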

Automated Pavement Crack Segmentation Using Fully Convolutional U-Net with a Pretrained ResNet-34 Encoder [article]

Stephen L. H. Lau, Xin Wang, Xu Yang, Edwin K. P. Chong
2020 arXiv   pre-print
We used a "one-cycle" training schedule based on cyclical learning rates to speed up the convergence.  ...  To minimize the dice coefficient loss function, we optimize the parameters in the neural network by using an adaptive moment optimizer called AdamW.  ...  We used a "one-cycle" training schedule based on cyclical learning rates to speed up the convergence.  ... 
arXiv:2001.01912v3 fatcat:cgsmdhiktnesxkopelvn2nbm44
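The snippet names all three ingredients: AdamW, a one-cycle schedule, and a dice loss. A self-contained PyTorch sketch wiring them together (the stand-in model, smoothing constant, and hyper-parameters are assumptions, not the paper's):

```python
import torch

def dice_loss(pred, target, eps=1.0):
    """1 - dice coefficient; pred holds probabilities in [0, 1]."""
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

model = torch.nn.Conv2d(3, 1, 3, padding=1)   # stand-in for the U-Net
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-3, total_steps=1000)
x = torch.rand(2, 3, 64, 64)                  # dummy images and crack masks
y = torch.randint(0, 2, (2, 1, 64, 64)).float()
for _ in range(3):                            # truncated training loop
    opt.zero_grad()
    dice_loss(torch.sigmoid(model(x)), y).backward()
    opt.step()
    sched.step()                              # one-cycle LR update per step
```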
Showing results 1–15 of 4,548 results