Parameter Re-Initialization through Cyclical Batch Size Schedules
[article]
2018
arXiv
pre-print
We implement this through a cyclical batch size schedule motivated by a Bayesian perspective of neural network training. ...
Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may take longer to train and/or converge to sub-optimal solutions. ...
The motivation for this is that larger batch sizes allow for parallel execution which can accelerate training. We implement weight re-initialization through cyclical batch size schedules. ...
arXiv:1812.01216v1
fatcat:mkk5auxhunerdhd6u5zrrgbjmq
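As an illustration of the cyclical batch size idea summarized in the entry above, here is a minimal Python sketch; the cycle length, the 128-2048 range, and the geometric ramp are illustrative assumptions, not the schedule used in the paper.

```python
# Hypothetical cyclical batch size schedule (illustrative values, not the paper's).
def cyclical_batch_size(epoch, cycle_len=30, b_min=128, b_max=2048):
    """Grow the batch size geometrically within each cycle, then reset to b_min.

    Dropping back to the small batch re-injects gradient noise, which the
    entry above frames as a form of parameter re-initialization.
    """
    t = (epoch % cycle_len) / max(cycle_len - 1, 1)  # progress within the cycle, in [0, 1]
    return int(round(b_min * (b_max / b_min) ** t))

# Example: two shortened cycles of the schedule.
print([cyclical_batch_size(e, cycle_len=5) for e in range(10)])
```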
The effectiveness of intelligent scheduling for multicast video-on-demand
2009
Proceedings of the seventeenth ACM international conference on Multimedia - MM '09
We show through analysis that this approach is optimal in terms of the data transmitted by the server. ...
Today's delivery mechanisms, especially unicast, require resources to scale linearly with the number of receivers and library sizes. ...
Figure 14: Batching effectiveness of cyclic multicast: avg. # of receivers of each individual cyclic stream
Table 1: Parameter values used in experiments ...
doi:10.1145/1631272.1631330
dblp:conf/mm/AggarwalCGJRY09
fatcat:duklwdmg25gkvl7obkeyv6jjsq
Hyper-Learning for Gradient-Based Batch Size Adaptation
[article]
2022
arXiv
pre-print
We demonstrate Arbiter's effectiveness in several illustrative experiments: to act as a stand-alone batch size scheduler; to complement fixed batch size schedules with greater flexibility; and to promote ...
Scheduling the batch size to increase is an effective strategy to control gradient noise when training deep neural networks. ...
Within the context of batch size scheduling, we should therefore select a 'sufficiently' small initial batch size, and choose a scheduling heuristic for increasing the batch size as training progresses ...
arXiv:2205.08231v1
fatcat:egkcy6patjajvmjqny24x3nstq
Automated learning rate search using batch-level cross-validation
2021
Sakarya University Journal of Computer and Information Sciences
Hyper-parameters of deep neural networks, especially the learning rate and its (decay) schedule, highly affect the network's final performance. ...
The advantage of batch-level or micro CV methods is that the gradient computed during training is re-used to evaluate several different learning rates. ...
Training set 𝑻, initial network weights 𝜽, momentum parameter 𝜶, initial velocity 𝝑, mini-batch size 𝒎, training-vs-validation trade-off parameter 𝝀. ...
doi:10.35377/saucis...935353
fatcat:huyulxlpijeufka5x46vavf4ji
Snapshot Ensembles: Train 1, get M for free
[article]
2017
arXiv
pre-print
To obtain repeated rapid convergence, we leverage recent work on cyclic learning rate schedules. ...
We achieve this goal by training a single neural network, converging to several local minima along its optimization path and saving the model parameters. ...
ACKNOWLEDGEMENTS We thank Ilya Loshchilov and Frank Hutter for their insightful comments on the cyclic cosine-shaped learning rate. ...
arXiv:1704.00109v1
fatcat:653mnwhqbzfg7hapzygzc6rkqa
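For context on the cyclic cosine-shaped schedule credited above, a minimal sketch of SGDR-style cosine annealing with restarts, as used for snapshot ensembling, follows; the variable names and defaults are ours, not taken from the paper.

```python
import math

def cosine_cycle_lr(t, total_iters, num_cycles, alpha0):
    """Cyclic cosine annealing: the LR decays from alpha0 towards 0 within each
    of `num_cycles` cycles and then restarts; a snapshot is typically saved at
    the end of every cycle, where the LR (and hence gradient noise) is lowest."""
    cycle_len = math.ceil(total_iters / num_cycles)
    return 0.5 * alpha0 * (math.cos(math.pi * (t % cycle_len) / cycle_len) + 1.0)

# Example: learning rate at the start, middle, and end of the first cycle.
print([round(cosine_cycle_lr(t, 300, 3, 0.1), 4) for t in (0, 50, 99)])
```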
Stochastic cyclic scheduling problem in synchronous assembly and production lines
1998
Journal of the Operational Research Society
For the second approximate solution procedure let us use schedule S=(1, 4, 3, 2) as the initial solution. The expected batch cycle time of this schedule is equal to 36.128. ...
Opns Res 37: 925-935.
Karabati S and Kouvelis P (1996). Cyclic scheduling in flow lines: Modeling observations, effective heuristics and an optimal solution procedure. Naval Res Logist 43: 211-231. ...
doi:10.1057/palgrave.jors.2600625
fatcat:rznxdux7sjd75npiqhvcvx52xi
Stochastic cyclic scheduling problem in synchronous assembly and production lines
1998
Journal of the Operational Research Society
For the second approximate solution procedure let us use schedule S=(1, 4, 3, 2) as the initial solution. The expected batch cycle time of this schedule is equal to 36.128. ...
Opns Res 37: 925-935.
Karabati S and Kouvelis P (1996). Cyclic scheduling in flow lines: Modeling observations, effective heuristics and an optimal solution procedure. Naval Res Logist 43: 211-231. ...
doi:10.1038/sj.jors.2600625
fatcat:ariyzvw5dfh3beqyjto4dqid5i
Faucet
2016
Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond - BeyondMR '16
CCS Concepts: • Software and its engineering → Scheduling; Data flow architectures; ...
This document presents Faucet, a modular flow control approach for distributed data-parallel dataflow engines with support for arbitrary (cyclic) topologies. ...
Controlled memory usage may also enable performance gains through buffer re-use in a broader set of topologies. ...
doi:10.1145/2926534.2926544
dblp:conf/sigmod/LattuadaMC16
fatcat:72uh4l6hsfhbnpr7mi57jg7orm
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
[article]
2018
arXiv
pre-print
Our models are trained, compressed and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. ...
Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence ...
Figure 2: Comparison of CLR and CALR update policy. Algorithm 1: Cyclically Annealed Learning Rate Schedule, procedure CALR(Iteration, Step Size, LR_LB, LR_UB) ...
Table 2: Compression and accuracy ...
arXiv:1811.00641v1
fatcat:7ywish3ymjerhmoforabop6rym
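The CALR procedure is only partially visible in the snippet above (its inputs are Iteration, Step Size, LR_LB, LR_UB). As a hedged illustration of a cyclical learning rate whose upper bound is annealed over cycles, here is a sketch; the triangular policy and the per-cycle decay factor are assumptions for illustration, not the paper's definition.

```python
def calr_like_lr(iteration, step_size, lr_lb, lr_ub, decay=0.5):
    """Triangular cyclical LR whose upper bound is annealed each cycle (assumed form).

    One cycle spans 2 * step_size iterations: the LR rises from lr_lb to the
    current (annealed) upper bound and falls back to lr_lb.
    """
    cycle = iteration // (2 * step_size)
    ub = lr_lb + (lr_ub - lr_lb) * (decay ** cycle)   # annealed upper bound (assumption)
    x = abs(iteration / step_size - 2 * cycle - 1)    # position within the cycle, in [0, 1]
    return lr_lb + (ub - lr_lb) * max(0.0, 1.0 - x)
```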
A Markov Decision Process Approach to Active Meta Learning
[article]
2020
arXiv
pre-print
One challenge in meta-learning is how to exploit relationships between tasks and classes, which is overlooked by commonly used random or cyclic passes through data. ...
We develop scheduling schemes based on Upper Confidence Bound (UCB), Gittins Index and tabular Markov Decision Problems (MDPs) solved with linear programming, where the reward is the scaled statistical ...
To evaluate the performance, we vary the batch size B ∈ {1, 20, 100}. ...
arXiv:2009.04950v1
fatcat:ifofp6moi5h2dhbrbe6qt6c6eu
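As a rough illustration of the UCB-based scheduling mentioned above: at each round the scheduler picks the task whose empirical reward plus an exploration bonus is largest. The reward used in the paper (a scaled statistical quantity) is not reproduced here; the bonus constant and bookkeeping are assumptions.

```python
import math

def ucb_pick_task(counts, mean_rewards, total_rounds, c=2.0):
    """Pick the index of the task maximizing mean reward + UCB exploration bonus.

    Tasks never tried (count == 0) are selected first.
    """
    scores = [
        m + math.sqrt(c * math.log(max(total_rounds, 1)) / n) if n > 0 else float("inf")
        for n, m in zip(counts, mean_rewards)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```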
Hybridizing discrete- and continuous-time models for batch sizing and scheduling problems
2006
Computers & Operations Research
This paper proposes a new hybrid technique called "partial parameter uniformization" to hybridize discrete- and continuous-time models for batch sizing and scheduling problems. ...
Various discrete-time and continuous-time MILP (Mixed Integer Linear Programming) formulations have been built and utilized for capacitated batch sizing and scheduling problems in process industries. ...
For the batch sizing and scheduling problems, all the values of different processing times are the "parameters" we make uniform. ...
doi:10.1016/j.cor.2004.11.013
fatcat:werx4dxqibetrbcc24mkn54squ
Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
[article]
2021
arXiv
pre-print
It determines the batch size for each job through an online evolutionary search that can continuously optimize the scheduling decisions. ...
To address the problem, we propose ONES, an ONline Evolutionary Scheduler for elastic batch size orchestration. ...
Batch Size Scaling Mechanism. The batch size scaling re-configures a job with a new batch size. ...
arXiv:2108.03645v1
fatcat:sftetwudifazrntmsowd42pinq
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
[article]
2018
arXiv
pre-print
[2017] independently) showed an equivalence of increasing batch sizes instead of a decreasing learning rate schedule. ...
Figure 5: Comparisons of super-convergence to ... over a range of batch sizes. These results show that a large batch size is more effective than a small batch size for super-convergence training. ...
arXiv:1708.07120v3
fatcat:ff5p24boonbzjm7qf4pt4vabqq
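The equivalence referenced in the first snippet above can be made concrete through the SGD noise scale, approximately g ~ lr * N / B for batch size B much smaller than dataset size N (the argument behind "increase the batch size instead of decaying the learning rate"). The sketch below uses illustrative numbers, not values from the paper.

```python
# Halving the learning rate has roughly the same effect on the SGD noise scale
# as doubling the batch size, which is the equivalence noted in the entry above.
def noise_scale(lr, batch_size, dataset_size):
    return lr * dataset_size / batch_size  # g ~ lr * N / B, valid for B << N

N = 50_000  # illustrative dataset size
assert noise_scale(0.05, 256, N) == noise_scale(0.1, 512, N)
```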
Cyclical Pruning for Sparse Neural Networks
[article]
2022
arXiv
pre-print
To enable weight recovery, we propose a simple strategy called cyclical pruning which requires the pruning schedule to be periodic and allows for weights pruned erroneously in one cycle to recover in subsequent ...
Current methods for pruning neural network weights iteratively apply magnitude-based pruning on the model weights and re-train the resulting model to recover lost accuracy. ...
completing 75% of the allocated epochs, and use a batch size of 256. ...
arXiv:2202.01290v1
fatcat:pynpn53jr5bx3nrkb7gghtquu4
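A minimal sketch of a periodic (cyclical) sparsity schedule along the lines described above: within each cycle the target sparsity ramps back up to the final target, so weights pruned in an earlier cycle get a chance to recover. The linear ramp, cycle length, and magnitude-based masking here are assumptions for illustration, not the paper's exact schedule.

```python
import numpy as np

def cyclical_sparsity(step, cycle_len, final_sparsity, start_sparsity=0.0):
    """Target sparsity ramps up within each cycle and drops back at the cycle boundary."""
    frac = (step % cycle_len) / max(cycle_len - 1, 1)
    return start_sparsity + (final_sparsity - start_sparsity) * frac

def magnitude_mask(weights, sparsity):
    """Keep-mask that zeros out the smallest-magnitude fraction of weights."""
    k = int(sparsity * weights.size)
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold
```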
Automated Pavement Crack Segmentation Using Fully Convolutional U-Net with a Pretrained ResNet-34 Encoder
[article]
2020
arXiv
pre-print
We used a "one-cycle" training schedule based on cyclical learning rates to speed up the convergence. ...
To minimize the dice coefficient loss function, we optimize the parameters in the neural network by using an adaptive moment optimizer called AdamW. ...
arXiv:2001.01912v3
fatcat:cgsmdhiktnesxkopelvn2nbm44
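For reference, a minimal sketch of a "one-cycle" learning rate schedule of the kind mentioned in the snippets above; the linear ramps, 30% warm-up fraction, and divisor factors are common defaults assumed here, not values reported in the paper.

```python
def one_cycle_lr(step, total_steps, lr_max, div=25.0, final_div=1e4, warmup_frac=0.3):
    """Ramp the LR from lr_max/div up to lr_max, then anneal it down to lr_max/final_div."""
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        t = step / max(warmup_steps, 1)
        return lr_max / div + t * (lr_max - lr_max / div)
    t = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return lr_max + t * (lr_max / final_div - lr_max)
```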
Showing results 1 — 15 out of 4,548 results