Joint Multi-Dimension Pruning via Numerical Gradient Update
[article]
2021
arXiv
pre-print
Then we optimize the pruning vector with gradient updates, modeling joint pruning as a numerical gradient optimization process. ...
We present joint multi-dimension pruning (abbreviated as JointPruning), an effective method of pruning a network on three crucial aspects: spatial, depth and channel simultaneously. ...
Fig. 1: The overall framework of the proposed joint multi-dimensional pruning. ...
arXiv:2005.08931v2
fatcat:yvnze4n7kzh43hnpj7emq5ye6u
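The snippet above frames joint pruning as optimizing a pruning vector with numerically estimated gradients. A minimal sketch of that idea, assuming a hypothetical black-box `evaluate` function (pruning vector in, validation loss out) and a plain finite-difference estimator rather than whatever estimator the paper actually uses:

```python
import numpy as np

def numerical_pruning_step(v, evaluate, eps=0.01, lr=0.1):
    """One numerical-gradient update of a pruning vector v.

    v        -- pruning ratios per dimension (e.g. spatial, depth, channel)
    evaluate -- hypothetical black-box: pruning vector -> validation loss
    """
    grad = np.zeros_like(v)
    base = evaluate(v)
    for i in range(len(v)):              # forward differences keep it cheap;
        vp = v.copy(); vp[i] += eps      # central differences are also common
        grad[i] = (evaluate(vp) - base) / eps
    return np.clip(v - lr * grad, 0.0, 1.0)  # keep the ratios in [0, 1]
```

Each coordinate of `v` would correspond to one pruning dimension, so a single update moves spatial, depth, and channel pruning jointly.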
UMEC: Unified model and embedding compression for efficient recommendation systems
2021
International Conference on Learning Representations
... $\sum_l \|s^{(l)}\|_2 + z\,(R_{\mathrm{Flops}}(s) - R_{\mathrm{budget}})$, and we can adopt the gradient ascent method as the update rule. Update s: the optimization on s relies on both the sparsity and the resource loss. ...
For a pruned layer $l$, the input and output dimensions are restricted by the number of pruned neurons, annotated as $s^{(l)}$ and $s^{(l+1)}$. ...
dblp:conf/iclr/ShenWGTWL21
fatcat:bjhf7ynftnahlfoj6qrcqfkpla
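The quoted penalty suggests a primal-dual scheme: descend on the pruning variables s against the sparsity and resource losses, and ascend on the multiplier z, per the snippet's gradient-ascent update rule. A rough sketch under those assumptions; `grad_sparsity`, `r_flops`, and `grad_r_flops` are placeholder callables, not the paper's code:

```python
def primal_dual_step(s, z, grad_sparsity, r_flops, grad_r_flops,
                     r_budget, lr_s=1e-2, lr_z=1e-2):
    """One update of pruning variables s (descent) and multiplier z (ascent).

    grad_sparsity(s) -- gradient of the group-sparsity term sum_l ||s^(l)||_2
    r_flops(s)       -- differentiable FLOPs estimate (placeholder)
    """
    s = s - lr_s * (grad_sparsity(s) + z * grad_r_flops(s))  # descend on s
    z = max(0.0, z + lr_z * (r_flops(s) - r_budget))         # ascend on z
    return s, z
```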
Unified Visual Transformer Compression
[article]
2022
arXiv
pre-print
However, the computational overhead of ViTs remains prohibitive due to the stacked multi-head self-attention modules, among other components. ...
This paper proposes a unified ViT compression framework that seamlessly assembles three effective techniques: pruning, layer skipping, and knowledge distillation. ...
to the updating policy of $g_t$, the gradient terms w.r.t. $s$ and $r$ are $\nabla_s\, z\,(R_{\mathrm{Flops}}(s, r, g_t) - R_{\mathrm{budget}})$ and $\nabla_r\, z\,(R_{\mathrm{Flops}}(s, r, g_t) - R_{\mathrm{budget}})$, respectively. ...
arXiv:2203.08243v1
fatcat:5rrj5vn53zdahejoxtfaoda6me
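Reading the quoted gradient terms back into one objective, a consistent reconstruction (my notation, not the paper's verbatim equation) is a FLOPs-penalized Lagrangian shared by the pruning ratios s, the skipping variables r, and the gating variables g_t:

```latex
\mathcal{L}(s, r, g_t) = \mathcal{L}_{\text{task}}
  + z\,\bigl(R_{\mathrm{Flops}}(s, r, g_t) - R_{\mathrm{budget}}\bigr),
\qquad
\nabla_s \mathcal{L} = \nabla_s \mathcal{L}_{\text{task}} + z\,\nabla_s R_{\mathrm{Flops}},
\quad
\nabla_r \mathcal{L} = \nabla_r \mathcal{L}_{\text{task}} + z\,\nabla_r R_{\mathrm{Flops}}.
```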
Only Train Once: A One-Shot Neural Network Training And Pruning Framework
[article]
2021
arXiv
pre-print
a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms the standard proximal methods on ...
Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. ...
... the tensors fed into the fully connected layer, and project them onto a 2-dimensional space via PCA [40]. ...
arXiv:2107.07467v2
fatcat:cbsetynjo5cu3ojulf7azddlz4
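A loose reconstruction of the half-space idea behind HSPG, not the authors' exact algorithm: after a trial gradient step, a parameter group is projected to exactly zero when the trial point leaves a half-space anchored at the current group direction.

```python
import numpy as np

def half_space_step(groups, grads, lr=0.1, eps=0.0):
    """Sketch of one half-space projected-gradient step (HSPG-style).

    groups -- list of parameter groups (1-D arrays), one per prunable structure
    grads  -- matching list of stochastic gradients
    A group is zeroed when its trial point falls outside the half-space
    {y : <y, x> >= eps * ||x||^2}, i.e. when it stops pointing along the
    current group direction (HSPG uses a small positive eps).
    """
    out = []
    for x, g in zip(groups, grads):
        trial = x - lr * g
        if np.dot(trial, x) < eps * np.dot(x, x):  # left the half-space
            trial = np.zeros_like(x)               # prune the whole group
        out.append(trial)
    return out
```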
A Fast Harmonic Mean Linear Discriminant Analysis for Dimensionality Reduction
2022
International Journal of Intelligent Engineering and Systems
In addition, a first-order approximation of the inverse eigenvector matrix and the complete matrix of eigenvectors are updated at every iteration. ...
Dimensionality reduction is a central step in artificial intelligence and data science because massive amounts of high-dimensional information are involved. ...
The gradient of Eq. (29) is: ... Determine the Stiefel manifold gradient using Eq. (11); update 𝒢 using Eq. (12); retract 𝒢 back onto the manifold via the joint diagonalization; execute Algorithm 2; ...
doi:10.22266/ijies2022.0831.20
fatcat:tmoo7mw5ibblre6xazi3v556im
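The quoted steps (manifold gradient, update, retraction) follow the standard Riemannian gradient-descent template on the Stiefel manifold. A generic sketch, with `G` playing the role of 𝒢 and a QR retraction standing in for the paper's joint-diagonalization step:

```python
import numpy as np

def stiefel_step(G, euclid_grad, lr=1e-2):
    """One generic Riemannian gradient step on the Stiefel manifold
    (a textbook recipe, not the paper's exact update).

    G           -- n x p matrix with orthonormal columns (G^T G = I)
    euclid_grad -- Euclidean gradient of the objective at G
    """
    # Project the Euclidean gradient onto the tangent space at G.
    sym = (G.T @ euclid_grad + euclid_grad.T @ G) / 2.0
    riem_grad = euclid_grad - G @ sym
    # Gradient step followed by QR retraction back onto the manifold.
    Q, R = np.linalg.qr(G - lr * riem_grad)
    return Q * np.sign(np.diag(R))  # fix column signs for a canonical Q
```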
Towards Structured Dynamic Sparse Pre-Training of BERT
[article]
2021
arXiv
pre-print
In this work, we develop and study a straightforward, dynamic, always-sparse pre-training approach for the BERT language-modeling task, which leverages periodic compression steps based on magnitude pruning ...
The dark horizontal blocks in the RigL updates indicate a collapse due to outliers along the input dimension, which shows that the effect arises from the activation part of the dense gradient update ...
In Figure 11, we show that for gradient-based re-allocation, the dense gradient is dominated by outliers in the activation, e.g., along the input dimension of each layer, which imposes a strong bias ...
arXiv:2108.06277v1
fatcat:sa7tnrabcrbmjhl2ewm25rqgny
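For context on the RigL updates the snippet analyzes, here is the generic (unstructured) prune-and-grow step: drop the weakest active weights by magnitude and regrow where the dense gradient is largest, which is exactly the signal the snippet says gets dominated by activation outliers.

```python
import numpy as np

def rigl_update(w, grad, mask, k):
    """Sketch of one RigL-style prune/grow step (generic form, not the
    paper's structured variant).

    w, grad -- weight and dense-gradient arrays of equal shape
    mask    -- boolean array marking currently active weights
    k       -- number of connections to drop and regrow
    """
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(~mask)
    # Drop the k active weights with smallest magnitude ...
    drop = active[np.argsort(np.abs(w.flat[active]))[:k]]
    # ... and grow the k inactive positions with largest gradient magnitude.
    grow = inactive[np.argsort(-np.abs(grad.flat[inactive]))[:k]]
    new_mask = mask.copy()
    new_mask.flat[drop] = False
    new_mask.flat[grow] = True
    return new_mask
```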
Coarse-to-Fine Searching for Efficient Generative Adversarial Networks
[article]
2021
arXiv
pre-print
In addition, a fair supernet training approach is utilized to ensure that all sub-networks can be updated fairly and stably. ...
We first discover an intact search space of generator networks spanning three dimensions, i.e., path, operator, and channel, to fully exploit the network's performance. ...
parameters and weights via gradient descent. ...
arXiv:2104.09223v1
fatcat:u3t62uvt7nhopipjeo2vfu24ty
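One uniform-sampling reading of the "fair supernet training" mentioned above: each step trains a sub-network drawn uniformly over the three search dimensions, so every choice receives updates at a comparable rate. `forward_subnet` and the space keys are hypothetical stand-ins, not the paper's API:

```python
import random

def fair_supernet_step(supernet, batch, optimizer, space):
    """Sketch of fair single-path supernet training; the paper's exact
    fairness scheme may differ from plain uniform sampling.
    """
    config = {
        "path": random.choice(space["paths"]),          # which generator path
        "operator": random.choice(space["operators"]),  # which op per block
        "channels": random.choice(space["channels"]),   # channel width
    }
    optimizer.zero_grad()
    loss = supernet.forward_subnet(config, batch)  # hypothetical API
    loss.backward()
    optimizer.step()
    return config, loss.item()
```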
Differentiable Neural Input Search for Recommender Systems
[article]
2020
arXiv
pre-print
For efficiency reasons, these methods typically choose embedding dimensions from a restricted set of candidate dimensions. ...
Existing works have proposed heuristic or reinforcement learning-based methods to search for mixed feature embedding dimensions. ...
(b) The joint distribution plot of feature embedding dimensions and feature frequencies after dimension pruning. (c) Comparison of DNIS and network pruning performance over different pruning rates. ...
arXiv:2006.04466v2
fatcat:7kieg735j5enjlxtv255mdcyfu
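A minimal sketch of the differentiable relaxation that "differentiable neural input search" suggests: a learnable soft mask scales each candidate embedding dimension and is trained by ordinary gradient descent, after which near-zero mask entries are pruned. This is one plausible reading, not the authors' implementation:

```python
import torch

class SoftDimEmbedding(torch.nn.Module):
    """Embedding with a learnable per-feature soft mask over dimensions."""

    def __init__(self, num_features, max_dim):
        super().__init__()
        self.emb = torch.nn.Embedding(num_features, max_dim)
        self.alpha = torch.nn.Parameter(torch.ones(num_features, max_dim))

    def forward(self, ids):
        # Clip the mask to [0, 1]; an entry driven to zero prunes that
        # embedding dimension for that feature.
        mask = self.alpha[ids].clamp(0.0, 1.0)
        return self.emb(ids) * mask
```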
Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets
2021
The Journal of Artificial Intelligence Research
via finding a minimum-norm point in the convex hull of the set of multiple policy gradients when the impact of one objective on others is unknown a priori. ...
In particular, we first propose a new PAOLS algorithm that integrates pruning and approximate optimistic linear support algorithm to efficiently discover the weight-vector sets of multiple gradients that ...
Formally, the model's parameters θ are updated via the single-objective gradient update $\theta_i = \theta - \alpha\,\nabla_\theta \mathcal{L}_{T_i}(f_\theta)$, where α is the step size. ...
doi:10.1613/jair.1.12270
fatcat:jrnf3b5ujbevnbysaln2jo626u
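The minimum-norm point mentioned above has a well-known closed form in the two-gradient case, which makes the construction concrete (the paper handles many objectives via its iteratively discovered weight-vector sets):

```python
import numpy as np

def min_norm_two(g1, g2):
    """Minimum-norm point in the convex hull of two policy gradients
    (the classic two-objective MGDA closed form)."""
    diff = g1 - g2
    denom = np.dot(diff, diff)
    if denom == 0.0:
        return g1                       # gradients coincide
    # Solve min_a || a*g1 + (1-a)*g2 ||^2 for a in [0, 1].
    a = np.clip(np.dot(g2 - g1, g2) / denom, 0.0, 1.0)
    return a * g1 + (1.0 - a) * g2
```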
Motion Planning for a Humanoid Mobile Manipulator System
[article]
2018
arXiv
pre-print
Fourth, a via-point-based multi-objective genetic algorithm for the end effectors (EEs) is proposed to design the "human-like" via-poses by optimizing four objective functions. ...
In detail, an efficient direct-connect bidirectional RRT with gradient descent is proposed to greatly reduce the number of sampled nodes, and a geometric optimization method is proposed for path pruning ...
Objective functions: due to the high redundancy, numerous joint combinations exist for the EEs' desired positions and orientations $X_{EE}$, and there is always a preference. ...
arXiv:1806.07349v1
fatcat:rr73lp3ymnbs5ksyemxqq4zld4
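A common geometric path-pruning pass of the kind the snippet alludes to (the paper's exact method may differ): greedily shortcut each waypoint to the farthest waypoint reachable by a collision-free straight segment. `collision_free` is a hypothetical predicate:

```python
def shortcut_prune(path, collision_free):
    """Greedy shortcut pruning of a sampled path.

    path           -- list of waypoints from the planner
    collision_free -- hypothetical predicate: (p, q) -> bool, True when the
                      straight segment p-q is obstacle-free
    """
    pruned = [path[0]]
    i = 0
    while i < len(path) - 1:
        # Connect to the farthest waypoint reachable in a straight line;
        # consecutive planner nodes are assumed connectable.
        j = len(path) - 1
        while j > i + 1 and not collision_free(path[i], path[j]):
            j -= 1
        pruned.append(path[j])
        i = j
    return pruned
```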
Codebook Training for Trellis-Based Hierarchical Grassmannian Classification
2021
IEEE Wireless Communications Letters
Exploiting the similarity of the proposed trellis classifier with a neural network, we propose stochastic gradient-based training techniques. ...
We consider classification of points on a complex-valued Grassmann manifold of m-dimensional subspaces within the n-dimensional complex Euclidean space. ...
To train layer $r$, we update the corresponding codebook entry $Q^{(r)}_{j^*_r}$ so as to increase the quantization metric $\|U^H \hat{U}\|^2$ via a stochastic gradient step. ...
doi:10.1109/lwc.2021.3139166
fatcat:pddkobmdy5hxde4ifbxd6vhjzu
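A generic sketch of the quoted codebook update: gradient ascent on the quantization metric ‖U^H Q‖_F² followed by re-orthonormalization, so the entry remains a valid m-dimensional subspace. The QR retraction is my choice; the paper's trellis-specific step may differ:

```python
import numpy as np

def codebook_ascent_step(Q, U, lr=0.1):
    """One stochastic-gradient codebook update on the Grassmann manifold.

    Q -- n x m codebook entry with orthonormal columns (quantized subspace)
    U -- n x m training sample with orthonormal columns
    """
    grad = U @ (U.conj().T @ Q)          # Euclidean gradient of ||U^H Q||_F^2
    Qn, _ = np.linalg.qr(Q + lr * grad)  # retraction: re-orthonormalize
    return Qn
```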
Machine Learning for Microcontroller-Class Hardware – A Review
[article]
2022
arXiv
pre-print
We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. ...
T: Gradient norm for sample selection via uncertainty and diversity. ...
Models operating on intrinsic dimensions of the data are computationally tractable and mitigate the curse of dimensionality. ...
arXiv:2205.14550v3
fatcat:y272riitirhwfgfiotlwv5i7nu
Communication-Efficient Edge AI: Algorithms and Systems
[article]
2020
arXiv
pre-print
Based on over-the-air computation, Amiri and Gunduz [85] proposed a gradient sparsification and random linear projection method to reduce the dimension of gradients due to limited channel bandwidth. ...
methods for d-dimensional convex optimization problems. ...
arXiv:2002.09668v1
fatcat:nhasdzb7t5dt5brs2r7ocdzrnm
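Generic versions of the two ideas quoted above, composed for illustration only (the cited scheme's details differ): top-k magnitude sparsification of the gradient followed by a seeded random linear projection to a lower dimension:

```python
import numpy as np

def compress_gradient(g, k, proj_dim, seed=0):
    """Sparsify then project a gradient before transmission.

    g -- flattened gradient vector; the seed would be shared so the
    receiver can rebuild the same projection matrix.
    """
    sparse = np.zeros_like(g)
    idx = np.argsort(-np.abs(g))[:k]     # keep the k largest-magnitude entries
    sparse[idx] = g[idx]
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((proj_dim, g.size)) / np.sqrt(proj_dim)
    return P @ sparse                    # low-dimensional message to send
```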
Effectively Subsampled Quadratures For Least Squares Polynomial Approximations
[article]
2017
arXiv
pre-print
We conclude with numerical experiments on an analytical function and a model piston problem that show the efficacy of our approach compared with randomized subsampling. ...
For polynomial approximation, we use a column pruning heuristic that removes columns based on the highest total orders and then solves the tall least squares problem. ...
Further pruning of the polynomial subspace is performed via heuristics. ...
arXiv:1601.05470v4
fatcat:ggr52udoizfhll5tq4s4f23wja
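A minimal sketch of the column-pruning heuristic as described: drop basis columns whose total order exceeds a threshold, then solve the remaining tall least-squares problem. The matrix layout and the `total_orders` bookkeeping are assumptions for illustration:

```python
import numpy as np

def pruned_least_squares(A, b, total_orders, max_order):
    """Column-pruned polynomial least squares.

    A            -- m x n weighted Vandermonde-type matrix (m >= n)
    total_orders -- length-n array, total order of each basis column
    """
    keep = np.flatnonzero(np.asarray(total_orders) <= max_order)
    coeffs, *_ = np.linalg.lstsq(A[:, keep], b, rcond=None)
    return keep, coeffs
```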
Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent
[article]
2020
arXiv
pre-print
Our fast algorithm allows us to reduce the computational cost of splitting to the same level of typical back-propagation updates and enables efficient implementation on GPU. ...
network architectures; 2) we substantially speed up the splitting process of Liu et al. (2019), which requires expensive eigen-decomposition, by proposing a highly scalable Rayleigh-quotient stochastic gradient ...
$\min_v R_S(v) := \frac{v^\top S v}{v^\top v}$, with $v_{\min} \propto \arg\min_v R_S(v)$ (Eq. 7), which can be solved using gradient descent or other numerical methods. ...
arXiv:1910.03103v3
fatcat:qlo7iewv7zc43apspvy2opd5by
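Eq. (7) can be checked with plain gradient descent on the Rayleigh quotient; the minimizer is proportional to the eigenvector of S with the smallest eigenvalue. The paper's contribution is a much faster stochastic variant; this sketch only illustrates the objective:

```python
import numpy as np

def rayleigh_descent(S, steps=500, lr=0.1, seed=0):
    """Minimize R_S(v) = (v^T S v) / (v^T v) by gradient descent.

    A small fixed step is used for illustration; the result approximates
    the smallest-eigenvalue eigenvector of the symmetric matrix S.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(S.shape[0])
    for _ in range(steps):
        vv = v @ v
        r = (v @ S @ v) / vv                 # current quotient value
        grad = 2.0 * (S @ v - r * v) / vv    # gradient of R_S at v
        v = v - lr * grad
        v /= np.linalg.norm(v)               # R_S is scale-free; fix the norm
    return v
```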
Showing results 1–15 of 2,336.