15,540 Hits

Tensorized Embedding Layers for Efficient Model Compression [article]

Oleksii Hrinchuk, Valentin Khrulkov, Leyla Mirvakhabova, Elena Orlova, Ivan Oseledets
2020 arXiv   pre-print
We introduce a novel way of parametrizing embedding layers based on the Tensor Train (TT) decomposition, which allows compressing the model significantly at the cost of a negligible drop or even a slight  ...  The embedding layers transforming input words into real vectors are the key components of deep neural networks used in natural language processing.  ...  There is a plethora of prior work on compressing the embedding layers used in NLP models.  ... 
arXiv:1901.10787v2 fatcat:4hdlktk4xrcsji2eoanvcdluly
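As a rough illustration of the savings such a TT parametrization can give, the sketch below counts parameters for a TT-factorized embedding (the factor shapes and rank are made-up assumptions, not the paper's configuration):

```python
# Sketch of TT-embedding parameter counts (illustrative shapes, not the
# paper's configuration). A V x D embedding matrix with V = prod(v_i) and
# D = prod(d_i) is stored as k TT-cores of shape (r_{i-1}, v_i, d_i, r_i),
# with boundary ranks r_0 = r_k = 1.

def tt_embedding_params(vocab_factors, dim_factors, rank):
    ranks = [1] + [rank] * (len(vocab_factors) - 1) + [1]
    return sum(ranks[i] * v * d * ranks[i + 1]
               for i, (v, d) in enumerate(zip(vocab_factors, dim_factors)))

dense = 32768 * 512                                   # full embedding matrix
tt = tt_embedding_params([32, 32, 32], [8, 8, 8], rank=16)
print(dense, tt, dense / tt)                          # ratio well above 200x
```

The ranks control the trade-off: a larger `rank` recovers more of the dense matrix's expressiveness at the cost of a smaller compression ratio.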

MARS: Masked Automatic Ranks Selection in Tensor Decompositions [article]

Maxim Kodryan, Dmitry Kropotov, Dmitry Vetrov
2021 arXiv   pre-print
Tensor decomposition methods are known to be efficient for compressing and accelerating neural networks.  ...  In this paper, we introduce MARS -- a new efficient method for the automatic selection of ranks in general tensor decompositions.  ...  SENTIMENT ANALYSIS WITH TT-EMBEDDINGS A recent work of Khrulkov et al. [2019] leverages Tensor Train decomposition for compressing embedding layers in various NLP models.  ...
arXiv:2006.10859v2 fatcat:4utssbbxz5bmvd3wjepic2fn6i

Exploring Extreme Parameter Compression for Pre-trained Language Models [article]

Yuxin Ren, Benyou Wang, Lifeng Shang, Xin Jiang, Qun Liu
2022 arXiv   pre-print
In this work, we aim to explore larger compression ratios for PLMs, among which tensor decomposition is a potential but under-investigated one.  ...  (less than 2M parameters excluding the embedding layer) and 2.7× faster on inference.  ...  Table 12: Parameter compression ratios in various models. We exclude the embedding layer from compression, as did  ...
arXiv:2205.10036v1 fatcat:627jbvuf6rhldkkmtxvr6cohay

Neural Networks Compression for Language Modeling [chapter]

Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko
2017 Lecture Notes in Computer Science  
In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs).  ...  Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference  ...  Savchenko is supported by the Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics.  ...
doi:10.1007/978-3-319-69900-4_44 fatcat:lrrb2wzjdfem7lhx4xjndboedy
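Of the techniques compared above, low-rank factorization is the simplest to sketch; the following is a generic truncated-SVD factorization of a weight matrix (the shapes and rank are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Generic low-rank factorization W ~= U @ V via truncated SVD, one of the
# compression techniques compared above (illustrative shapes only).
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))

r = 64
U_full, S, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :r] * S[:r]      # 1024 x 64, singular values folded in
V = Vt[:r]                     # 64 x 1024

ratio = (U.size + V.size) / W.size
print(ratio)                   # 0.125 -> an 8x parameter reduction
```

At inference the dense multiply `x @ W` is replaced by two thin multiplies `x @ U @ V`, which is also cheaper when `r` is much smaller than the matrix dimensions.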

CPGAN : An Efficient Architecture Designing for Text-to-Image Generative Adversarial Networks Based on Canonical Polyadic Decomposition

Ruixin Ma, Junying Lou, Liang Zou
2021 Scientific Programming  
To solve this problem, we propose CPGAN, an efficient architecture for text-to-image generative adversarial networks (GANs) based on canonical polyadic decomposition (CPD).  ...  Many interesting and meaningful text-to-image synthesis models have been put forward.  ...  It is necessary to compress text-to-image GANs. Canonical polyadic decomposition (CPD) is an easy and efficient tensor decomposition method for compressing and accelerating models.  ...
doi:10.1155/2021/5573751 fatcat:wl6kqkn3cvbbrbq3fryq4lwdpy

Low-Rank Embedding of Kernels in Convolutional Neural Networks under Random Shuffling [article]

Chao Li, Zhun Sun, Jinshi Yu, Ming Hou, Qibin Zhao
2018 arXiv   pre-print
In previous studies, tensor decomposition (TD) has achieved promising compression performance by embedding the kernel of a convolutional layer into a low-rank subspace.  ...  We demonstrate this by compressing the convolutional layers via randomly-shuffled tensor decomposition (RsTD) for a standard classification task using CIFAR-10.  ...  Using this model, we then propose the randomly-shuffled tensor decomposition (RsTD) based convolutional layer, which is used for CNN compression in Section 2.  ...
arXiv:1810.13098v1 fatcat:2f6zaqq6znfjjaahqzzrx3dofy

Efficient On-Device Session-Based Recommendation [article]

Xin Xia, Junliang Yu, Qinyong Wang, Chaoqun Yang, Quoc Viet Hung Nguyen, Hongzhi Yin
2022 arXiv   pre-print
...  decomposing the embedding table into smaller tensors, showing great potential in compressing recommendation models.  ...  However, these model compression techniques significantly increase the local inference time due to the complex process of generating index lists and a series of tensor multiplications to form item embeddings.  ...  TT-Rec [48] also applies tensor-train decomposition to the embedding layer for model compression and further improves model performance with a sampled Gaussian distribution for the weight initialization.  ...
arXiv:2209.13422v2 fatcat:vliymtk7nvcxfi5c72gfbjl23y
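The inference overhead mentioned in the abstract can be seen in how a single item embedding is assembled from TT-cores; below is a minimal sketch (the core shapes and index scheme are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

# Hypothetical sketch of a TT-compressed embedding lookup: instead of one
# memory read per item, the item id is split into mixed-radix digits and
# one core slice per factor is multiplied in -- the chain of tensor
# multiplications that slows down local inference.

def tt_lookup(cores, vocab_factors, index):
    # cores[i] has shape (r_prev, v_i, d_i, r_next), with r_0 = r_k = 1
    row = np.ones((1, 1))
    for core, v in zip(cores, vocab_factors):
        index, digit = divmod(index, v)      # next digit of the item id
        r_prev, _, d, r_next = core.shape
        row = row @ core[:, digit].reshape(r_prev, d * r_next)
        row = row.reshape(-1, r_next)
    return row.ravel()                       # length prod(d_i)

rng = np.random.default_rng(0)
shapes = [(1, 32, 8, 16), (16, 32, 8, 16), (16, 32, 8, 1)]
cores = [rng.standard_normal(s) * 0.1 for s in shapes]
vec = tt_lookup(cores, [32, 32, 32], index=12345)
print(vec.shape)                             # (512,)
```

Here three cores represent a 32768 × 512 table; each lookup performs three small matrix products rather than a single row fetch.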

Iterative Low-Rank Approximation for CNN Compression [article]

Maksym Kholiavchenko
2019 arXiv   pre-print
Since classification and object detection are the most favored tasks for embedded devices, we demonstrate the effectiveness of our approach by compressing AlexNet, VGG-16, YOLOv2, and Tiny YOLO networks  ...  Deep convolutional neural networks contain tens of millions of parameters, making it impossible for them to run efficiently on embedded devices.  ...  But such networks contain tens of millions of parameters and cannot be efficiently deployed on embedded systems and mobile devices due to their computational and power limitations.  ...
arXiv:1803.08995v2 fatcat:lstgp7l4gngbhgrfg547pec4qu

Compression of recurrent neural networks for efficient language modeling

Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko
2019 Applied Soft Computing  
We propose a general pipeline for applying the most suitable methods to compress recurrent neural networks for language modeling.  ...  In this paper we consider several compression techniques for recurrent neural networks, including Long Short-Term Memory models.  ...  The authors would like to thank Dmitriy Polubotko for his valuable help with the experiments on mobile devices.  ...
doi:10.1016/j.asoc.2019.03.057 fatcat:msw6p77rlfamxc2xwvbl7eyk6q

On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation [article]

Xin Xia, Hongzhi Yin, Junliang Yu, Qinyong Wang, Guandong Xu, Nguyen Quoc Viet Hung
2022 arXiv   pre-print
In this paper, we explore ultra-compact models for next-item recommendation by loosening the constraint of dimensionality consistency in tensor decomposition.  ...  Previous research mostly adopts tensor decomposition techniques to compress the regular recommendation model with limited compression ratios so as to avoid drastic performance degradation.  ...  efficiency [37], we introduce the semi-tensor product (STP) operation [5] to tensor-train decomposition for the extreme compression of the embedding table.  ...
arXiv:2204.11091v1 fatcat:56nykfz6pjdg5nh4ltpzxjovpe

Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition [article]

Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md Akmal Haidar, Mehdi Rezagholizadeh
2020 arXiv   pre-print
We conduct extensive experiments with various compression rates on machine translation and language modeling, using different data sets with a shared word-embedding matrix for both embedding and vocabulary  ...  Embedding matrices typically contain most of the parameters of language models and about a third for machine translation systems.  ...  Oseledets (2011) introduced Tensor Train (TT), an efficient algorithm for multilinear SVD of tensors.  ...
arXiv:1910.06720v2 fatcat:7za6klgsrjdknbtc6da4hjoqtq

Evaluation of Deep Neural Network Compression Methods for Edge Devices Using Weighted Score-Based Ranking Scheme

Olutosin Ajibola Ademola, Mairo Leier, Eduard Petlenkov
2021 Sensors  
There exist several model compression methods; however, determining the most efficient method is of major concern.  ...  Our proposed method is extendable and can be used as a framework for the selection of suitable model compression methods for edge devices in different applications.  ...  Research on the design of efficient and lightweight CNN models has increased as a result of the exponential growth in the demand for real-time, efficient, and power-consumption-aware embedded computer  ...
doi:10.3390/s21227529 pmid:34833610 pmcid:PMC8622199 fatcat:7fghss2knnfo5nu26d5o4khvcy

Deep Compressed Pneumonia Detection for Low-Power Embedded Devices [article]

Hongjia Li, Sheng Lin, Ning Liu, Caiwen Ding, Yanzhi Wang
2019 arXiv   pre-print
Experiments show that we can achieve up to a 36x compression ratio compared to the original 106-layer model, with no accuracy degradation.  ...  We evaluate the proposed methods on an embedded low-power device, Jetson TX2, and achieve low power usage and high energy efficiency.  ...  In order to deploy DNNs on these embedded devices, DNN model compression techniques, such as weight pruning, have been proposed for storage reduction and computation acceleration.  ...
arXiv:1911.02007v1 fatcat:7vw4ubquvva3zny7qj5ly5i37y

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models [article]

Chunxing Yin and Bilge Acun and Xing Liu and Carole-Jean Wu
2021 arXiv   pre-print
TT-Rec achieves 117× and 112× model size compression for Kaggle and Terabyte, respectively.  ...  The memory capacity of embedding tables in deep learning recommendation models (DLRMs) is increasing dramatically, from tens of GBs to TBs across the industry.  ...  While prior works have demonstrated tensor-train compression techniques for embedding layers in language models (Hrinchuk et al., 2020), this paper is the first to explore and customize tensor-train  ...
arXiv:2101.11714v1 fatcat:5h24xjmm4ffvfiph3j4uvek5tq

Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination [article]

Cole Hawkins, Xing Liu, Zheng Zhang
2021 arXiv   pre-print
We first develop a flexible Bayesian model that can handle various low-rank tensor formats (e.g., CP, Tucker, tensor train and tensor-train matrix) that compress neural network parameters in training.  ...  However, directly training a low-rank tensorized neural network is a very challenging task because it is hard to determine a proper tensor rank a priori, which controls the model complexity and compression  ...  In [37] a preliminary FPGA acceleration of our method demonstrates 123× gains in energy efficiency and 59× speedup on a simple two-layer neural network over non-tensorized training on embedded device  ... 
arXiv:2010.08689v3 fatcat:ofz3fiqckrd65e6mja4u3r2rdm
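As a much simpler, hedged illustration of the rank-determination problem this paper automates (plain spectral-energy truncation of a single matrix, not the paper's Bayesian model):

```python
import numpy as np

# Simplest automatic rank selection: truncate an SVD at the smallest rank
# that retains a target fraction of spectral energy. (Illustration only;
# the paper instead infers ranks with a Bayesian model during training.)
def auto_rank(W, energy=0.95):
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy) + 1)

W = np.diag([10.0, 1.0, 0.1])   # singular values 10, 1, 0.1
print(auto_rank(W, 0.95))       # 1: the top value carries ~99% of the energy
print(auto_rank(W, 0.999))      # 2
```

The hard part the paper addresses is that, unlike this post-hoc matrix recipe, tensor ranks must be chosen jointly across modes and before or during training, when no converged weights are available to decompose.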