5,252 Hits in 5.8 sec

Dramatically Reducing Training Data Size Through Vocabulary Saturation

William Lewis, Sauleh Eetemadi
2013 Conference on Machine Translation  
We have developed a very simple n-gram counting method that reduces the size of data sets dramatically, as much as 90%, and is applicable independent of specific dev and test data. ... At the same time it reduces model sizes, improves training times, and, because it attempts to preserve contexts for all n-grams in a corpus, the cost in quality is minimal (as measured by BLEU). ... One might say that the vocabulary of the phrase mappings derived from model training "saturates" as data size increases, since less and less novel information can be derived from each succeeding sentence ...
dblp:conf/wmt/LewisE13 fatcat:oiopc5tslrelron4hydj2bfnjy
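The snippet above describes filtering a corpus by n-gram counts until the vocabulary "saturates". Below is a minimal sketch of that general idea, not the authors' exact algorithm; the n-gram order and the count threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def ngrams(tokens: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    """Yield all n-grams of a token list."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def saturation_filter(corpus: Iterable[List[str]], n: int = 2, threshold: int = 10) -> List[List[str]]:
    """Keep a sentence only if it still contributes an n-gram seen fewer than
    `threshold` times; once every n-gram in a sentence is 'saturated', the
    sentence adds little novel context and is dropped (illustrative sketch)."""
    counts: Counter = Counter()
    kept = []
    for sentence in corpus:
        grams = list(ngrams(sentence, n))
        if any(counts[g] < threshold for g in grams):
            kept.append(sentence)
            counts.update(grams)
    return kept
```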

Scaling Word2Vec on Big Corpus

Bofang Li, Aleksandr Drozd, Yuhe Guo, Tao Liu, Satoshi Matsuoka, Xiaoyong Du
2019 Data Science and Engineering  
To do this, one main challenge is reducing dependencies inside a large training batch. ... During batch training, we "freeze" the context part and update only the non-dependent part to reduce conflicts. ... Availability of Data and Materials: The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. ...
doi:10.1007/s41019-019-0096-6 fatcat:lpdp6byjmnfaxc2ag7p3tcrssm
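The "freeze the context part" idea can be illustrated with a toy batched skip-gram-with-negative-sampling update in which the context (output) vectors are held fixed for the whole batch, so updates to the input vectors do not conflict. This is a hypothetical sketch of the general principle, not the paper's actual kernel.

```python
import numpy as np

def sgns_batch_step(W_in, W_out, centers, contexts, labels, lr=0.025):
    """One batched SGNS step with the context vectors treated as frozen for
    the batch, so rows of W_in can be updated without write conflicts.
    W_in, W_out: (V, d) embedding matrices; centers, contexts: (B,) index
    arrays; labels: (B,) array of 1 (positive) / 0 (negative) pairs."""
    v = W_in[centers]                                       # (B, d) center vectors
    u = W_out[contexts]                                     # (B, d) frozen context vectors
    scores = 1.0 / (1.0 + np.exp(-np.sum(v * u, axis=1)))   # sigmoid(v . u)
    grad = (labels - scores)[:, None] * u                   # gradient w.r.t. v
    np.add.at(W_in, centers, lr * grad)                     # scatter-add; W_out untouched
    return W_in
```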

NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs [article]

Mikhail Galkin, Etienne Denis, Jiapeng Wu, William L. Hamilton
2022 arXiv   pre-print
Given such a fixed-size vocabulary, it is possible to bootstrap an encoding and embedding for any entity, including those unseen during training. ... To this end, we propose NodePiece, an anchor-based approach to learn a fixed-size entity vocabulary. ... The reported results (Table 15) demonstrate a competitive performance of NodePiece-based models with reduced vocabulary sizes, bringing more than 97% Hits@10 across graphs of different sizes. ...
arXiv:2106.12144v2 fatcat:qdvxud32kvf5xiolyl7qs7xy2a
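As a rough illustration of the anchor-based vocabulary described above, the sketch below tokenizes an entity as its nearest anchor nodes plus its incident relation types. The graph library, distance metric, number of anchors, and the "relation" edge attribute are assumptions for illustration, not NodePiece's exact procedure.

```python
import networkx as nx

def nodepiece_tokens(graph: nx.Graph, entity, anchors, k: int = 3):
    """Represent `entity` by its k closest anchor nodes (shortest-path
    distance) plus the relation types on its incident edges, so any node,
    including one unseen at training time, maps onto a fixed vocabulary
    of anchors + relations. Edges are assumed to carry a 'relation' attribute."""
    lengths = nx.single_source_shortest_path_length(graph, entity)
    nearest = sorted((a for a in anchors if a in lengths), key=lambda a: lengths[a])[:k]
    relations = {data.get("relation") for _, _, data in graph.edges(entity, data=True)}
    return nearest, sorted(r for r in relations if r is not None)
```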

The devil is in the details: an evaluation of recent feature encoding methods

Ken Chatfield, Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman
2011 Procedings of the British Machine Vision Conference 2011  
While several authors have reported very good results on the challenging PASCAL VOC classification data by means of these new techniques, differences in the feature computation and learning algorithms,  ...  the vocabulary size even further.  ...  Vocabulary size. The PASCAL experiments clearly demonstrate that larger vocabularies lead to higher accuracy.  ... 
doi:10.5244/c.25.76 dblp:conf/bmvc/ChatfieldLVZ11 fatcat:p57ppfrztvab5ps7nxewtu6bhy

Simple Open-Vocabulary Object Detection with Vision Transformers [article]

Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai (+2 others)
2022 arXiv   pre-print
For object detection, pre-training and scaling approaches are less well established, especially in the long-tailed and open-vocabulary setting, where training data is relatively scarce.  ...  Our analysis of the scaling properties of this setup shows that increasing image-level pre-training and model size yield consistent improvements on the downstream detection task.  ...  We would like to thank Sunayana Rane and Rianne van den Berg for help with the DETR implementation, Lucas Beyer for the data deduplication code, and Yi Tay for useful advice.  ... 
arXiv:2205.06230v2 fatcat:3vvmsbpiwng3lbk5kl3se2uv3q

N-gram Language Modeling using Recurrent Neural Network Estimation [article]

Ciprian Chelba, Mohammad Norouzi, Samy Bengio
2017 arXiv   pre-print
Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the n-gram state when compared  ...  Using multinomial distributions as targets in training instead of the usual one-hot target is only slightly beneficial for low n-gram orders.  ...  We have evaluated the impact of reducing the segment length dramatically, e.g. 4 instead of 35.  ... 
arXiv:1703.10724v2 fatcat:xfe4updqb5bx3bov4fxefq6nyq
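The "multinomial distributions as targets" idea amounts to training against a full next-word distribution (for example, a smoothed n-gram distribution) rather than a one-hot label. A minimal sketch of such a loss, assuming NumPy arrays of logits and target distributions:

```python
import numpy as np

def soft_target_cross_entropy(logits: np.ndarray, target_dist: np.ndarray) -> float:
    """Cross-entropy of the model's softmax against a full multinomial target
    distribution instead of the usual one-hot next word.
    logits, target_dist: (batch, vocab) arrays; target rows sum to 1."""
    logits = logits - logits.max(axis=-1, keepdims=True)                 # stabilise softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-(target_dist * log_probs).sum(axis=-1).mean())
```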

Improved topic-dependent language modeling using information retrieval techniques

M. Mahajan, D. Beeferman, X.D. Huang
1999 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)  
The proposed method can reduce the perplexity of the baseline language model by 37%, indicating the predictive power of the topic-dependent language model. ... This results in an increase in the size of the model, the data required to train it, and the number of states the search algorithm must maintain while using the language model in the search process. ... [13] are robust and can be automatically trained from a large text corpus. However, the number of parameters in an N-gram language model increases dramatically with increasing N. ...
doi:10.1109/icassp.1999.758182 dblp:conf/icassp/MahajanBH99 fatcat:yhjsv7d7bzhilfsprv3gujkbgu
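The remark that the parameter count grows dramatically with N can be made concrete: in the worst case an N-gram model over a vocabulary of size V has on the order of V**N distinct contexts. A quick illustration, using an assumed vocabulary size of 20,000:

```python
V = 20_000                         # illustrative vocabulary size, not from the paper
for n in (1, 2, 3, 4):
    print(n, f"{V ** n:.3e}")      # worst-case number of distinct n-gram parameters, V**n
```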

Simplifying long short-term memory acoustic models for fast training and decoding

Yajie Miao, Jinyu Li, Yongqiang Wang, Shi-Xiong Zhang, Yifan Gong
2016 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In the experiments, model simplifications reduce the size of LSTM models by 26%, resulting in a simpler model structure. ... To accelerate decoding of LSTMs, we propose to apply frame skipping during training, and frame skipping and posterior copying (FSPC) during decoding. ... For each gate, we compute two statistics: the fraction of the data points on which this gate is right-saturated and the fraction on which it is left-saturated. ...
doi:10.1109/icassp.2016.7472084 dblp:conf/icassp/MiaoLWZG16 fatcat:oewazbu2bna2fms7vtc7txvybu
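The gate-saturation statistic mentioned in the snippet can be computed directly from collected gate activations. The 0.1/0.9 thresholds below are illustrative assumptions, not necessarily the ones used in the paper.

```python
import numpy as np

def gate_saturation(activations: np.ndarray, low: float = 0.1, high: float = 0.9):
    """Given sigmoid gate activations collected over a dataset (any shape),
    return the fraction of values that are right-saturated (> high) and the
    fraction that are left-saturated (< low)."""
    a = activations.ravel()
    right = float((a > high).mean())
    left = float((a < low).mean())
    return right, left
```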

High-precision phrase-based document classification on a modern scale

Ron Bekkerman, Matan Gavish
2011 Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11  
to grow linearly with the size of the vocabulary. ... We deployed the PBC system on the task of job title classification, as a part of LinkedIn's data standardization effort. ... The test set of the LinkedIn data (see Section 5) holds the near-sufficiency property, in the sense that I(T;C)/I(D;C) > 0.9 for a labeled data set of size 19000, controlled vocabulary of size 1485, and ...
doi:10.1145/2020408.2020449 dblp:conf/kdd/BekkermanG11 fatcat:ngcypvqvyrbl3goz3stxcdjbiu
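The near-sufficiency criterion I(T;C)/I(D;C) > 0.9 compares the mutual information that the reduced representation T and the raw documents D carry about the class C. A sketch using empirical mutual information via scikit-learn's mutual_info_score; the variable names are hypothetical.

```python
from sklearn.metrics import mutual_info_score

def near_sufficiency(doc_ids, cluster_ids, class_labels, threshold: float = 0.9) -> bool:
    """Check I(T;C) / I(D;C) > threshold, where T is the reduced (phrase/cluster)
    representation of each document, D the document identity, and C the class label.
    All arguments are equal-length sequences of discrete labels."""
    i_tc = mutual_info_score(cluster_ids, class_labels)   # I(T;C)
    i_dc = mutual_info_score(doc_ids, class_labels)       # I(D;C)
    return i_tc / i_dc > threshold
```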

ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
2022 Transactions of the Association for Computational Linguistics  
As part of our contribution, we release a new set of pre-trained byte-level Transformer models based on the T5 architecture, as well as all code and data used in our experiments. ... Most widely used pre-trained language models operate on sequences of tokens corresponding to word or subword units. ... On the most realistic in-language setting, where some gold training data is available in all languages, ByT5 surpasses the previous state-of-the-art mT5 on all tasks and model sizes. ...
doi:10.1162/tacl_a_00461 fatcat:ywt5xoc7hjclfheaw2kbi5hhre
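In a token-free, byte-level model of the kind described above, the "tokenizer" is essentially the UTF-8 encoding itself. A trivial sketch; the real ByT5 additionally reserves a few special-token ids, which are omitted here.

```python
def byte_tokenize(text: str) -> list:
    """Token-free 'tokenization': the UTF-8 bytes of the text (values 0-255)
    serve directly as token ids, the kind of input a byte-level model consumes."""
    return list(text.encode("utf-8"))

print(byte_tokenize("Grüße"))  # non-ASCII characters expand to multiple byte tokens
```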

Read and Attend: Temporal Localisation in Sign Language Videos

Gul Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
2021 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
We show that through this training it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation. ... ; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from ... Vocabulary size: We systematically change the training vocabulary of stems by taking subsets of the full subtitle vocabulary. ...
doi:10.1109/cvpr46437.2021.01658 fatcat:mvq3rgdw4zgklmxe7c7b4ozazi

Transfer Learning for Digital Heritage Collections: Comparing Neural Machine Translation at the Subword-level and Character-level

Nikolay Banar, Karine Lasaracina, Walter Daelemans, Mike Kestemont
2020 Proceedings of the 12th International Conference on Agents and Artificial Intelligence  
Transfer learning via pre-training has become an important strategy for the efficient application of NLP methods in domains where only limited training data is available. ... Because unseen vocabulary is a significant issue in domain adaptation, BPE seems a better fit for transfer learning across text varieties. ... Hence, the length of a BPE token ranges from a single character to several characters, depending on the vocabulary size. ...
doi:10.5220/0009167205220529 dblp:conf/icaart/BanarLDK20 fatcat:67o7febh7bdrpfyx6oyx2orgoi
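The point that BPE tokens range from one character to several, depending on the vocabulary size, is easy to see by training a small BPE model. The sketch below uses the Hugging Face tokenizers library; corpus.txt is a placeholder path, not a file from the paper, and the vocabulary size is an illustrative choice.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Train a small BPE vocabulary; with a larger vocab_size the learned merges
# get longer, so individual tokens span more characters.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=500, special_tokens=["[UNK]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # placeholder corpus file

print(tokenizer.encode("heritage collections").tokens)  # mix of 1-char and multi-char tokens
```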

MusCaps: Generating Captions for Music Audio

Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas
2021 2021 International Joint Conference on Neural Networks (IJCNN)  
Through an ablation study, we unveil that this performance boost can be mainly attributed to pre-training of the audio encoder, while other design choices (modality fusion, decoding strategy and the use ... Our method combines convolutional and recurrent neural network architectures to jointly process audio-text inputs through a multimodal encoder and leverages pre-training on audio data to obtain representations ... Text Embedding: The text input is tokenized and encoded through an embedding matrix of dimensions V × d, where V is the vocabulary size and d is the word embedding dimension. ...
doi:10.1109/ijcnn52387.2021.9533461 fatcat:vitpjpukovchvlmpmg7mz4ecry
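The V × d text embedding mentioned in the snippet is simply a lookup table indexed by token id. A minimal illustration; the sizes and token ids below are arbitrary placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10_000, 256                               # vocabulary size and embedding dimension (illustrative)
embedding = rng.normal(scale=0.02, size=(V, d))  # the V x d embedding matrix

token_ids = np.array([12, 857, 3])               # hypothetical tokenized caption
word_vectors = embedding[token_ids]              # shape (3, d): one row per token
```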

ByT5: Towards a token-free future with pre-trained byte-to-byte models [article]

Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
2022 arXiv   pre-print
As part of our contribution, we release a new set of pre-trained byte-level Transformer models based on the T5 architecture, as well as all code and data used in our experiments. ... Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units. ... On the most realistic in-language setting, where some gold training data is available in all languages, ByT5 surpasses the previous state-of-the-art mT5 on all tasks and model sizes. ...
arXiv:2105.13626v3 fatcat:y3qx7yefxnglvaiputlsex7afi

Read and Attend: Temporal Localisation in Sign Language Videos [article]

Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman
2021 arXiv   pre-print
We show that through this training it acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation. ... ; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from ... Vocabulary size: We systematically change the training vocabulary of stems by taking subsets of the full subtitle vocabulary. ...
arXiv:2103.16481v1 fatcat:g2rzo4sc5faozaxrvc7i2p7y2q
Showing results 1-15 out of 5,252 results