Model-based Word Embeddings from Decompositions of Count Matrices
2015
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
This work develops a new statistical understanding of word embeddings induced from transformed count data. ...
Using the class of hidden Markov models (HMMs) underlying Brown clustering as a generative model, we demonstrate how canonical correlation analysis (CCA) and certain count transformations permit efficient ...
This work was made possible by a research grant from Bloomberg's Knowledge Engineering team. ...
doi:10.3115/v1/p15-1124
dblp:conf/acl/StratosCH15
fatcat:mah6l5q4wnhs7jsmaaon33ijci
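As a rough illustration of the CCA-plus-count-transformation recipe this entry describes, the sketch below builds word vectors from a toy word-context count matrix: a square-root transform, scaling by the row and column marginals, then a truncated SVD. The toy matrix, the sizes, and the final scaling of the singular vectors are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: CCA-style word embeddings from a word-context count matrix.
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(1000, 500)).astype(float)  # toy word-context counts

transformed = np.sqrt(counts)                       # variance-stabilizing transform
row = transformed.sum(axis=1, keepdims=True)        # word marginals
col = transformed.sum(axis=0, keepdims=True)        # context marginals
scaled = transformed / np.sqrt(row) / np.sqrt(col)  # CCA-style normalization

U, s, Vt = svds(scaled, k=50)        # rank-50 truncated SVD
word_vectors = U * np.sqrt(s)        # hypothetical scaling of the left factor
```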
Continuous Word Embedding Fusion via Spectral Decomposition
2018
Proceedings of the 22nd Conference on Computational Natural Language Learning
In this paper, we present an efficient method for incorporating new words from a specialized corpus into pre-trained generic word embeddings. ...
We build on the established view of word embeddings as matrix factorizations to present a spectral algorithm for this task. ...
The approach is based on a small corpus (containing the new words) and the pre-trained word embedding vectors. ...
doi:10.18653/v1/k18-1002
dblp:conf/conll/FuZM18
fatcat:llxnmhhtmbcnpc64j64wdamr4i
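Under the matrix-factorization view of embeddings this entry invokes, one hedged way to fuse new words is to place each new word so that its inner products with the pretrained vectors reproduce association scores estimated from the small corpus, then re-orthogonalize everything with an SVD. The function below is a sketch under those assumptions; `assoc_new_known` and all shapes are hypothetical, and the paper's actual spectral algorithm may differ.

```python
import numpy as np

def fuse(pretrained, assoc_new_known, dim=None):
    """pretrained: (n_known, d) vectors; assoc_new_known: (n_new, n_known)
    PMI-like association scores estimated from the small corpus."""
    # Least-squares placement: each new word's dot products with the known
    # vectors should reproduce its observed associations.
    coords, *_ = np.linalg.lstsq(pretrained, assoc_new_known.T, rcond=None)
    stacked = np.vstack([pretrained, coords.T])      # known + new words
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    d = dim or pretrained.shape[1]
    return U[:, :d] * s[:d]          # re-orthogonalized fused embeddings
```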
Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition
[article]
2020
arXiv
pre-print
In this paper, we propose Distilled Embedding, an (input/output) embedding compression method based on low-rank matrix decomposition and knowledge distillation. ...
Embedding matrices typically contain most of the parameters for language models and about a third for machine translation systems. ...
Step 2) Initializing the Weights of the Funneling Decomposition Layer: We extract the trained embedding matrix E from Step 1 and train our decomposed matrices U and V on the reconstruction loss defined in Equation ...
arXiv:1910.06720v2
fatcat:7za6klgsrjdknbtc6da4hjoqtq
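A minimal sketch of the low-rank decomposition step ("Step 2") quoted above: initialize the factors U and V from a truncated SVD of the trained embedding matrix E and refine them by gradient descent on a Frobenius reconstruction loss. The knowledge-distillation component is omitted, and the sizes, rank, and learning rate are illustrative.

```python
import numpy as np

E = np.random.randn(5000, 256)    # stand-in for the trained embedding matrix
rank, lr = 32, 1e-3               # illustrative rank and learning rate

U_full, s, Vt = np.linalg.svd(E, full_matrices=False)
U = U_full[:, :rank] * np.sqrt(s[:rank])     # (vocab, rank) factor
V = np.sqrt(s[:rank])[:, None] * Vt[:rank]   # (rank, dim) factor

for _ in range(100):                         # refine on reconstruction loss
    R = U @ V - E                            # residual of E ≈ U V
    gU, gV = R @ V.T, U.T @ R                # grads of 0.5 * ||U V - E||_F^2
    U -= lr * gU
    V -= lr * gV
# storage: vocab*dim -> vocab*rank + rank*dim  (5000*256 -> 5000*32 + 32*256)
```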
Learning Cross-lingual Word Embeddings via Matrix Co-factorization
2015
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matrices. ...
The cross-lingual constraints can be derived from parallel corpora, with or without word alignments. ...
Acknowledgments This research is supported by the 973 Program (No. 2014CB340501) and the National Natural Science Foundation of China (NSFC No. 61133012, 61170196 & 61202140). ...
doi:10.3115/v1/p15-2093
dblp:conf/acl/ShiLLS15
fatcat:av5lk72avvex3ked7wsy6eivtq
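A hedged sketch of matrix co-factorization in the spirit of this entry: factorize two monolingual association matrices simultaneously while a quadratic penalty pulls the vectors of aligned word pairs together. The toy matrices, the alignment list `pairs`, and the hyperparameters are stand-ins, not the paper's objective.

```python
import numpy as np

rng = np.random.default_rng(0)
X1, X2 = rng.random((300, 300)), rng.random((400, 400))  # toy association matrices
pairs = [(0, 0), (1, 3), (2, 7)]     # hypothetical aligned word pairs (i, j)
d, lam, lr = 50, 0.1, 1e-2           # illustrative rank, penalty, step size

W1, C1 = rng.normal(0, 0.1, (300, d)), rng.normal(0, 0.1, (300, d))
W2, C2 = rng.normal(0, 0.1, (400, d)), rng.normal(0, 0.1, (400, d))

for _ in range(200):
    R1, R2 = W1 @ C1.T - X1, W2 @ C2.T - X2   # monolingual residuals
    gW1, gW2 = R1 @ C1, R2 @ C2
    for i, j in pairs:                        # cross-lingual constraint
        gW1[i] += lam * (W1[i] - W2[j])
        gW2[j] += lam * (W2[j] - W1[i])
    C1 -= lr * (R1.T @ W1)
    C2 -= lr * (R2.T @ W2)
    W1 -= lr * gW1
    W2 -= lr * gW2
```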
Rotations and Interpretability of Word Embeddings: The Case of the Russian Language
[chapter]
2017
Lecture Notes in Computer Science
Consider a continuous word embedding model. Usually, the cosines between word vectors are used as a measure of similarity of words. ...
These cosines do not change under orthogonal transformations of the embedding space. ...
Singular value decomposition (SVD) is the core of count-based models. To our knowledge, the only paper in which SVD was applied to predict-based word embedding matrices is [24]. ...
doi:10.1007/978-3-319-73013-4_11
fatcat:tjccfdyw6zh5des2onsn4x7quu
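The invariance claim in this entry is easy to verify directly: cosine similarities are unchanged by any orthogonal transformation Q of the embedding space, which is what licenses rotating embeddings in search of interpretable axes. A minimal check:

```python
# Cosine similarity is invariant under an orthogonal rotation Q:
# cos(x, y) == cos(Qx, Qy).
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(100, 50))                   # toy embedding matrix
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # random orthogonal matrix

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

x, y = E[0], E[1]
assert np.isclose(cosine(x, y), cosine(Q @ x, Q @ y))
```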
Translation Invariant Word Embeddings
2015
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
This work focuses on the task of finding latent vector representations of the words in a corpus. In particular, we address the issue of what to do when there are multiple languages in the corpus. ...
... the number of languages being embedded). ...
... IIS-1247489, IIS-1247632, and a gift from Google. This work was also supported in part by a fellowship to Kejun Huang from the University of Minnesota Informatics Institute. ...
doi:10.18653/v1/d15-1127
dblp:conf/emnlp/HuangGPFSMTF15
fatcat:nui7oghbwbhgjl2s7ac2idhuki
Semi-Supervised Multi-aspect Detection of Misinformation using Hierarchical Joint Decomposition
[article]
2021
arXiv
pre-print
(2) We introduce a principled tensor-based embedding framework that combines all those aspects effectively. ...
We propose HiJoD, a 2-level decomposition pipeline which not only outperforms state-of-the-art methods with F1-scores of 74% and 81% on the Twitter and Politifact datasets, respectively, but also is an order ...
In this approach, a bag-of-words embedding is used to model content-based information, while in this work we leverage a tensor model, i.e., TTA, which not only enables us to model textual information but ...
arXiv:2005.04310v2
fatcat:bhll6g5zznhqxjsdjs2kez7s5i
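As a rough illustration of deriving article embeddings from a tensor model of text, the sketch below unfolds a toy third-order tensor along its article mode and takes a truncated SVD. This stands in for the general idea only; HiJoD's actual two-level joint decomposition (and the TTA model) is more involved.

```python
import numpy as np
from scipy.sparse.linalg import svds

T = np.random.rand(200, 80, 80)     # toy (article x word x word) tensor
unfolded = T.reshape(200, -1)       # mode-1 matricization: (200, 6400)
U, s, _ = svds(unfolded, k=20)      # rank-20 truncated SVD of the unfolding
article_embeddings = U * s          # one 20-dim embedding per article
```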
Low-Rank Tensors for Verbs in Compositional Distributional Semantics
2015
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Several compositional distributional semantic methods use tensors to model multi-way interactions between vectors. ...
In this paper, we investigate whether we can match the performance of full tensors with low-rank approximations that use a fraction of the original number of parameters. ...
... , 2014) has used prediction-based vectors for words in a tensor-based CDS model, ours uses prediction-based vectors for both words and phrases to train a tensor regression model. ...
doi:10.3115/v1/p15-2120
dblp:conf/acl/FriedPC15
fatcat:p2x3ygjg6vdeleqk3evwwkctli
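The parameter saving this entry targets can be made concrete: a full d × d × d verb tensor stores d³ numbers, while a rank-R CP approximation T ≈ Σ_r u_r ⊗ v_r ⊗ w_r stores only 3Rd. A small illustration with assumed sizes:

```python
import numpy as np

d, R = 100, 20                       # assumed tensor size and CP rank
U, V, W = (np.random.randn(R, d) for _ in range(3))

# Reconstruct the full tensor from the CP factors (sum over the rank index).
T_lowrank = np.einsum("ri,rj,rk->ijk", U, V, W)

full_params, cp_params = d ** 3, 3 * R * d
print(full_params, cp_params)        # 1,000,000 full vs 6,000 low-rank
```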
GMEmbeddings: An R Package to Apply Embedding Techniques to Microbiome Data
2022
Frontiers in Bioinformatics
In this study, we address these limitations by applying word embedding algorithms (GloVe) and PCA transformation to ASV data from the American Gut Project and generating translation matrices that can be ...
The GMEmbeddings R package contains GloVe and PCA embedding transformation matrices at 50, 100 and 250 dimensions, each learned using ∼15,000 samples from the American Gut Project. ...
F1 score = 2 × (precision × recall) / (precision + recall)
RESULTS: From the sequence counts of the American Gut Project (AGP), we created GloVe- and PCA-based embedding transformation matrices at 50, 100, ...
doi:10.3389/fbinf.2022.828703
fatcat:fbv5cda6mbda7ennjuloi2qqsi
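A hedged sketch of the embedding step this entry describes: project a sample-by-ASV count table through a precomputed transformation matrix to obtain low-dimensional sample embeddings, with the F1 definition above alongside. The transformation matrix here is a random stand-in; the GMEmbeddings R package ships the actual GloVe/PCA matrices.

```python
import numpy as np

counts = np.random.poisson(2.0, size=(30, 5000)).astype(float)  # samples x ASVs
T50 = np.random.randn(5000, 50)   # stand-in for a 50-dim transformation matrix

rel_abundance = counts / counts.sum(axis=1, keepdims=True)
sample_embeddings = rel_abundance @ T50   # (30, 50) embedded samples

def f1(precision, recall):        # the evaluation metric defined above
    return 2 * precision * recall / (precision + recall)
```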
Synonym Discovery with Etymology-based Word Embeddings
[article]
2017
arXiv
pre-print
We propose a novel approach to learn word embeddings based on an extended version of the distributional hypothesis. ...
... its dimensionality, (3) using columns/rows of the resulting matrices as embedding vectors. ...
Learning word embeddings: To obtain the word embeddings from the graphs, truncated Singular Value Decomposition (SVD) was applied to their biadjacency matrices [17]. ...
arXiv:1709.10445v2
fatcat:csqiasmqljhhtmyngyoparwlpu
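The snippet's final step is concrete enough to sketch: a truncated SVD of a bipartite graph's biadjacency matrix, with rows and columns of the singular factors used as embedding vectors. The random sparse graph below is a stand-in for the paper's word-etymology graph.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

B = sparse_random(2000, 800, density=0.01, format="csr",
                  random_state=0)   # toy words x etymological-ancestors graph
U, s, Vt = svds(B, k=100)           # truncated SVD of the biadjacency matrix
word_vectors = U * np.sqrt(s)       # rows of the scaled left factor
ancestor_vectors = Vt.T * np.sqrt(s)
```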
Linear functional organization of the omic embedding space
2021
Bioinformatics
To decipher this new information, we introduce algorithms based on network embeddings. ...
Results: We generate the embeddings by decomposing these matrices with Non-Negative Matrix Tri-Factorization. ...
... order based on their cosine distance from the average embedding of cancer drivers. ...
doi:10.1093/bioinformatics/btab487
pmid:34213534
pmcid:PMC8570782
fatcat:ywbibdbmi5aj3ait3khnx6f5ce
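The ranking step mentioned in the last snippet can be sketched directly: order genes by cosine distance from the centroid embedding of known cancer drivers. The non-negative random embeddings below are stand-ins for the NMTF factors the paper derives.

```python
import numpy as np

gene_emb = np.abs(np.random.randn(1000, 64))  # non-negative, as NMTF factors are
drivers = [3, 17, 42]                         # hypothetical known cancer drivers

centroid = gene_emb[drivers].mean(axis=0)
cos = gene_emb @ centroid / (
    np.linalg.norm(gene_emb, axis=1) * np.linalg.norm(centroid))
ranking = np.argsort(1 - cos)   # smallest cosine distance (most driver-like) first
```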
Explaining and Generalizing Skip-Gram through Exponential Family Principal Component Analysis
2017
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
The popular skip-gram model induces word embeddings by exploiting the signal from word-context co-occurrence. ...
We offer a new interpretation of skip-gram based on exponential family PCA-a form of matrix factorization. ...
The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA or the U.S. Government. ...
doi:10.18653/v1/e17-2028
dblp:conf/eacl/EisnerDCP17
fatcat:2iw6hdaplvfffip7nk54fr23ye
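A toy instance of the model family the paper uses to reinterpret skip-gram: exponential family PCA with a Poisson link, where counts are modeled as X_ij ~ Poisson(exp(w_i · c_j)) and the factors are fit by gradient ascent on the log-likelihood. This is a sketch of the family, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 200)).astype(float)  # toy co-occurrence counts
d, lr = 10, 1e-3
W, C = rng.normal(0, 0.1, (200, d)), rng.normal(0, 0.1, (200, d))

for _ in range(300):
    M = np.exp(W @ C.T)   # expected counts under the Poisson model
    G = X - M             # gradient of the log-likelihood w.r.t. W @ C.T
    W += lr * (G @ C)
    C += lr * (G.T @ W)
```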
Learning Semantic Composition to Detect Non-compositionality of Multiword Expressions
2015
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Non-compositionality of multiword expressions is an intriguing problem that can be the source of error in a variety of NLP tasks such as language generation, machine translation and word sense disambiguation ...
We explore a range of distributional vector-space models for semantic composition, empirically evaluate these models, and propose additional methods which improve results further. ...
All of the models mentioned so far are based on the conventional, or count-based, vector space representation of the words. ...
doi:10.18653/v1/d15-1201
dblp:conf/emnlp/YazdaniFH15
fatcat:3yo5mml7rvfytee33mx5tdtwt4
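A minimal sketch of the compositionality test this line of work builds on: compose the constituents' vectors (additively here, the simplest baseline) and compare the result with the phrase's own distributional vector; a low cosine flags a likely non-compositional expression. The vectors and the threshold are illustrative, and the paper learns richer composition functions than addition.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
vec = {w: rng.normal(size=100) for w in ("kick", "the", "bucket", "kick_the_bucket")}

composed = vec["kick"] + vec["the"] + vec["bucket"]   # additive composition
if cosine(composed, vec["kick_the_bucket"]) < 0.4:    # hypothetical threshold
    print("likely non-compositional")
```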
Domain and Speaker Adaptation for Cortana Speech Recognition
2018
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
First, we present anchor-based speaker adaptation by extracting the speaker information, i-vector or d-vector embeddings, from the anchor segments of 'Hey Cortana'. ...
The anchor embeddings are mapped to layer-wise parameters to control the transformations of both weight matrices and biases of multiple layers. ...
Table 2 shows the list of top 10 words ranked by word count changes from the SI model to the model with layer L8 updated (ρ = 1) on the Hey Cortana desktop test set. ...
doi:10.1109/icassp.2018.8461553
dblp:conf/icassp/ZhaoLZCG18
fatcat:mirgz2h47fgolmgsyox3eoioju
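A hedged PyTorch-style sketch of the adaptation idea in this entry: a small network maps the anchor-derived speaker embedding to per-layer scales and biases that modulate a hidden layer. The single-layer setup, the sigmoid (LHUC-style) gating, and all dimensions are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AnchorAdaptedLayer(nn.Module):
    def __init__(self, hidden=512, emb=100):
        super().__init__()
        self.base = nn.Linear(hidden, hidden)
        self.to_scale = nn.Linear(emb, hidden)  # speaker embedding -> scales
        self.to_bias = nn.Linear(emb, hidden)   # speaker embedding -> biases

    def forward(self, x, speaker_emb):
        scale = 2 * torch.sigmoid(self.to_scale(speaker_emb))  # gate in (0, 2)
        return scale * self.base(x) + self.to_bias(speaker_emb)

layer = AnchorAdaptedLayer()
out = layer(torch.randn(8, 512), torch.randn(8, 100))  # batch of 8 frames
```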
Compression of recurrent neural networks for efficient language modeling
2019
Applied Soft Computing
We focus on effective compression methods in the context of their exploitation on devices: pruning, quantization, and matrix decomposition approaches (low-rank factorization and tensor train decomposition ...
An experimental study on the Penn Treebank (PTB) dataset has shown that the most efficient results, in terms of speed and the compression-perplexity balance, are obtained by matrix decomposition ...
The work of A.V. Savchenko and D.I. ...
doi:10.1016/j.asoc.2019.03.057
fatcat:msw6p77rlfamxc2xwvbl7eyk6q
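A sketch of the low-rank factorization approach this entry evaluates: replace a trained weight matrix W with two thinner factors from a truncated SVD, trading one dense product for two skinny ones. Sizes and rank are illustrative.

```python
import numpy as np

W = np.random.randn(1024, 1024)   # stand-in for a trained RNN weight matrix
rank = 64

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]        # (1024, rank)
B = Vt[:rank]                     # (rank, 1024)

# At inference, W @ x becomes A @ (B @ x): 2*rank*1024 multiplies vs 1024**2.
ratio = W.size / (A.size + B.size)
print(f"compression: {ratio:.1f}x")   # 8.0x for these sizes
```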