19,663 Hits in 4.9 sec

Model-based Word Embeddings from Decompositions of Count Matrices

Karl Stratos, Michael Collins, Daniel Hsu
2015 Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)  
This work develops a new statistical understanding of word embeddings induced from transformed count data.  ...  Using the class of hidden Markov models (HMMs) underlying Brown clustering as a generative model, we demonstrate how canonical correlation analysis (CCA) and certain count transformations permit efficient  ...  This work was made possible by a research grant from Bloomberg's Knowledge Engineering team.  ... 
doi:10.3115/v1/p15-1124 dblp:conf/acl/StratosCH15 fatcat:mah6l5q4wnhs7jsmaaon33ijci
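A minimal sketch of the kind of pipeline this abstract describes (not the authors' released code): apply an element-wise count transformation to a word-context co-occurrence matrix, rescale it in a CCA-style fashion by its row and column marginals, and take a truncated SVD whose left factor supplies the embeddings. The toy matrix and dimensions are illustrative assumptions.

```python
# Sketch: count transformation + CCA-style scaling + truncated SVD
import numpy as np

counts = np.random.poisson(2.0, size=(1000, 1000)).astype(float)  # toy word x context counts

transformed = np.sqrt(counts)                       # element-wise count transformation
row = transformed.sum(axis=1, keepdims=True)        # word marginals
col = transformed.sum(axis=0, keepdims=True)        # context marginals
scaled = transformed / np.sqrt(row) / np.sqrt(col)  # CCA-style rescaling

U, S, Vt = np.linalg.svd(scaled, full_matrices=False)
dim = 100
embeddings = U[:, :dim] * np.sqrt(S[:dim])          # one row per word
```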

Continuous Word Embedding Fusion via Spectral Decomposition

Tianfan Fu, Cheng Zhang, Stephan Mandt
2018 Proceedings of the 22nd Conference on Computational Natural Language Learning  
In this paper, we present an efficient method for incorporating new words from a specialized corpus into pre-trained generic word embeddings.  ...  We build on the established view of word embeddings as matrix factorizations to present a spectral algorithm for this task.  ...  The approach is based on a small corpus (containing the new words) and the pre-trained word embedding vectors.  ... 
doi:10.18653/v1/k18-1002 dblp:conf/conll/FuZM18 fatcat:llxnmhhtmbcnpc64j64wdamr4i
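A rough sketch of one way to realize the matrix-factorization view mentioned in the snippet, not the paper's exact spectral algorithm: place new words into the pre-trained space by asking that their inner products with known-word vectors reproduce PMI scores estimated from the small specialized corpus, solved by least squares. All names and sizes below are toy assumptions.

```python
# Sketch: fit vectors for new words against a fixed pre-trained space
import numpy as np

d = 300
known_vecs = np.random.randn(5000, d)      # pre-trained vectors for known words (toy stand-in)
pmi_new_known = np.random.randn(20, 5000)  # PMI(new word, known word) from the small corpus (toy)

# Solve  new_vecs @ known_vecs.T ~= pmi_new_known  for new_vecs.
new_vecs, *_ = np.linalg.lstsq(known_vecs, pmi_new_known.T, rcond=None)
new_vecs = new_vecs.T                      # one row per new word, in the pre-trained space
```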

Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition [article]

Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md Akmal Haidar, Mehdi Rezagholizadeh
2020 arXiv   pre-print
In this paper, we propose Distilled Embedding, an (input/output) embedding compression method based on low-rank matrix decomposition and knowledge distillation.  ...  Embedding matrices typically contain most of the parameters of language models and about a third of those of machine translation systems.  ...  Step 2) Initializing the Weights of the Funneling Decomposition Layer: We extract the trained embedding matrix E from Step 1 and train our decomposed matrices U and V on the reconstruction loss defined in Equation  ... 
arXiv:1910.06720v2 fatcat:7za6klgsrjdknbtc6da4hjoqtq
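A minimal sketch of the low-rank step described in the snippet, not the released implementation: factor the trained embedding matrix E (vocab x dim) into U (vocab x r) and V (r x dim) via truncated SVD and compare parameter counts; the paper additionally fine-tunes the factors with a distillation objective, which is omitted here.

```python
# Sketch: low-rank factorization of an embedding matrix for compression
import numpy as np

vocab, dim, r = 32000, 512, 64
E = np.random.randn(vocab, dim)            # stands in for the trained embedding matrix

Uf, S, Vt = np.linalg.svd(E, full_matrices=False)
U = Uf[:, :r] * S[:r]                      # vocab x r
V = Vt[:r, :]                              # r x dim

original = vocab * dim
compressed = vocab * r + r * dim
print(f"parameters: {original:,} -> {compressed:,} ({original / compressed:.1f}x smaller)")
```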

Learning Cross-lingual Word Embeddings via Matrix Co-factorization

Tianze Shi, Zhiyuan Liu, Yang Liu, Maosong Sun
2015 Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)  
We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matrices.  ...  The cross-lingual constraints can be derived from parallel corpora, with or without word alignments.  ...  Acknowledgments This research is supported by the 973 Program (No. 2014CB340501) and the National Natural Science Foundation of China (NSFC No. 61133012, 61170196 & 61202140).  ... 
doi:10.3115/v1/p15-2093 dblp:conf/acl/ShiLLS15 fatcat:av5lk72avvex3ked7wsy6eivtq
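A hedged toy sketch of matrix co-factorization with a cross-lingual constraint, using plain gradient descent rather than the authors' optimizer: factorize two monolingual co-occurrence matrices while pulling the embeddings of aligned word pairs toward each other. Matrix contents, sizes, and the alignment list are illustrative assumptions.

```python
# Sketch: joint factorization of two monolingual matrices with tied word pairs
import numpy as np

rng = np.random.default_rng(0)
d, V1, V2 = 50, 400, 500
X1 = rng.random((V1, V1))                  # monolingual co-occurrence matrix, language 1 (toy)
X2 = rng.random((V2, V2))                  # monolingual co-occurrence matrix, language 2 (toy)
pairs = [(i, i) for i in range(100)]       # aligned word pairs from a parallel corpus (toy)

W1, C1 = rng.normal(0, 0.01, (V1, d)), rng.normal(0, 0.01, (V1, d))
W2, C2 = rng.normal(0, 0.01, (V2, d)), rng.normal(0, 0.01, (V2, d))
lam, lr = 1.0, 1e-3

for _ in range(200):
    R1 = W1 @ C1.T - X1                    # monolingual reconstruction residuals
    R2 = W2 @ C2.T - X2
    gW1, gC1 = R1 @ C1, R1.T @ W1
    gW2, gC2 = R2 @ C2, R2.T @ W2
    for i, j in pairs:                     # cross-lingual constraint: W1[i] ~ W2[j]
        diff = W1[i] - W2[j]
        gW1[i] += lam * diff
        gW2[j] -= lam * diff
    W1 -= lr * gW1; C1 -= lr * gC1
    W2 -= lr * gW2; C2 -= lr * gC2
```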

Rotations and Interpretability of Word Embeddings: The Case of the Russian Language [chapter]

Alexey Zobnin
2017 Lecture Notes in Computer Science  
Consider a continuous word embedding model. The cosines between word vectors are usually used as a measure of word similarity.  ...  These cosines do not change under orthogonal transformations of the embedding space.  ...  Singular value decomposition is the core of count-based models. To our knowledge, the only paper where SVD was applied to predict-based word embedding matrices is [24].  ... 
doi:10.1007/978-3-319-73013-4_11 fatcat:tjccfdyw6zh5des2onsn4x7quu
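A small numerical check of the claim in the abstract: cosine similarities between word vectors are unchanged by an orthogonal transformation (rotation or reflection) of the embedding space. The vectors below are random toy data.

```python
# Sketch: cosine similarities are invariant under orthogonal transforms
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 300))                  # toy embedding matrix
Q, _ = np.linalg.qr(rng.normal(size=(300, 300)))  # random orthogonal matrix
E_rot = E @ Q

def cosines(M):
    n = M / np.linalg.norm(M, axis=1, keepdims=True)
    return n @ n.T

print(np.allclose(cosines(E), cosines(E_rot)))    # True (up to float error)
```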

Translation Invariant Word Embeddings

Kejun Huang, Matt Gardner, Evangelos Papalexakis, Christos Faloutsos, Nikos Sidiropoulos, Tom Mitchell, Partha P. Talukdar, Xiao Fu
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
the number of languages being embedded).  ...  This work focuses on the task of finding latent vector representations of the words in a corpus. In particular, we address the issue of what to do when there are multiple languages in the corpus.  ...  IIS-1247489, IIS-1247632, and a gift from Google. This work was also supported in part by a fellowship to Kejun Huang from the University of Minnesota Informatics Institute.  ... 
doi:10.18653/v1/d15-1127 dblp:conf/emnlp/HuangGPFSMTF15 fatcat:nui7oghbwbhgjl2s7ac2idhuki

Semi-Supervised Multi-aspect Detection of Misinformation using Hierarchical Joint Decomposition [article]

Sara Abdali, Neil Shah, Evangelos E. Papalexakis
2021 arXiv   pre-print
2) We introduce a principled tensor-based embedding framework that combines all those aspects effectively.  ...  We propose HiJoD, a 2-level decomposition pipeline which not only outperforms state-of-the-art methods with F1-scores of 74% and 81% on the Twitter and Politifact datasets respectively, but also is an order  ...  In this approach, a bag-of-words embedding is used to model content-based information, while in this work we leverage a tensor model, i.e., TTA, which not only enables us to model textual information, but  ... 
arXiv:2005.04310v2 fatcat:bhll6g5zznhqxjsdjs2kez7s5i

Low-Rank Tensors for Verbs in Compositional Distributional Semantics

Daniel Fried, Tamara Polajnar, Stephen Clark
2015 Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)  
Several compositional distributional semantic methods use tensors to model multi-way interactions between vectors.  ...  In this paper, we investigate whether we can match the performance of full tensors with low-rank approximations that use a fraction of the original number of parameters.  ...  ., 2014) has used prediction-based vectors for words in a tensor-based CDS model, ours uses prediction-based vectors for both words and phrases to train a tensor regression model.  ... 
doi:10.3115/v1/p15-2120 dblp:conf/acl/FriedPC15 fatcat:p2x3ygjg6vdeleqk3evwwkctli
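A hedged sketch of the low-rank idea in this abstract: replace a full d x d x d verb tensor with a rank-R CP approximation (a sum of R outer products), so that applying the verb to subject and object vectors needs far fewer parameters and never materializes the full tensor. The tensor layout and composition order are illustrative assumptions.

```python
# Sketch: rank-R CP verb tensor applied to subject/object vectors
import numpy as np

d, R = 100, 10
rng = np.random.default_rng(0)
subj, obj = rng.normal(size=d), rng.normal(size=d)

# Rank-R CP factors: verb tensor T[k, i, j] ~= sum_r A[k, r] * B[i, r] * C[j, r]
A, B, C = rng.normal(size=(d, R)), rng.normal(size=(d, R)), rng.normal(size=(d, R))

# Composition without materializing the full tensor:
sentence = A @ ((B.T @ subj) * (C.T @ obj))   # shape (d,)

print("full tensor params:", d ** 3)          # 1,000,000
print("rank-%d params:" % R, 3 * d * R)       # 3,000
```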

GMEmbeddings: An R Package to Apply Embedding Techniques to Microbiome Data

Christine Tataru, Austin Eaton, Maude M. David
2022 Frontiers in Bioinformatics  
In this study, we address these limitations by applying word embedding algorithms (GloVe) and PCA transformation to ASV data from the American Gut Project and generating translation matrices that can be  ...  The GMEmbeddings R package contains GloVe and PCA embedding transformation matrices at 50, 100 and 250 dimensions, each learned using ∼15,000 samples from the American Gut Project.  ...  F1 Score = 2 × (recall) × (precision) / ((recall) + (precision)). RESULTS: From the sequence counts from the American Gut Project (AGP), we created GloVe and PCA based embedding transformation matrices at 50, 100,  ... 
doi:10.3389/fbinf.2022.828703 fatcat:fbv5cda6mbda7ennjuloi2qqsi
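An illustrative sketch (in Python with hypothetical names, not the R package's API) of the operation the abstract describes: embed a microbiome sample by multiplying its ASV abundance vector with a pre-learned GloVe/PCA transformation matrix. The F1 definition matches the formula quoted in the snippet.

```python
# Sketch: projecting ASV abundances through a learned transformation matrix
import numpy as np

n_asvs, dim = 26000, 100
transform = np.random.randn(n_asvs, dim)          # stands in for a learned translation matrix
sample_counts = np.random.poisson(1.0, n_asvs)    # toy ASV counts for one sample

rel_abundance = sample_counts / sample_counts.sum()
embedded_sample = rel_abundance @ transform       # dim-dimensional sample representation

def f1(precision, recall):
    # F1 Score = 2 * precision * recall / (precision + recall)
    return 2 * precision * recall / (precision + recall)
```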

Synonym Discovery with Etymology-based Word Embeddings [article]

Seunghyun Yoon, Pablo Estrada, Kyomin Jung
2017 arXiv   pre-print
We propose a novel approach to learn word embeddings based on an extended version of the distributional hypothesis.  ...  its dimensionality, (3) using columns/rows of the resulting matrices as embedding vectors.  ...  Learning word embeddings: To obtain the word embeddings from the graphs, truncated Singular Value Decomposition (SVD) was applied to their biadjacency matrices [17].  ... 
arXiv:1709.10445v2 fatcat:csqiasmqljhhtmyngyoparwlpu
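A minimal sketch of the step quoted in the snippet: truncated SVD of a sparse biadjacency matrix (e.g., words versus etymological roots), with rows of the left factor used as embedding vectors. The matrix here is random toy data.

```python
# Sketch: truncated SVD of a sparse biadjacency matrix
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

n_words, n_roots, dim = 5000, 2000, 100
B = sparse_random(n_words, n_roots, density=0.001, format="csr", random_state=0)

U, S, Vt = svds(B, k=dim)            # truncated SVD, k largest singular values
word_embeddings = U * S              # one row per word
```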

Linear functional organization of the omic embedding space

A Xenos, N Malod-Dognin, S Milinković, N Pržulj, Jonathan Wren
2021 Bioinformatics  
To decipher this new information, we introduce algorithms based on network embeddings.  ...  Results: We generate the embeddings by decomposing these matrices with Non-Negative Matrix Tri-Factorization.  ...  order based on their cosine distance from the average embedding of cancer drivers.  ... 
doi:10.1093/bioinformatics/btab487 pmid:34213534 pmcid:PMC8570782 fatcat:ywbibdbmi5aj3ait3khnx6f5ce
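A hedged sketch of the ranking step mentioned in the snippet: order genes by cosine distance from the average embedding of known cancer driver genes. The embeddings would come from the NMTF decomposition; here they are toy data, and the variable names are assumptions.

```python
# Sketch: rank genes by cosine distance from the driver-gene centroid
import numpy as np

rng = np.random.default_rng(0)
gene_emb = rng.normal(size=(20000, 64))                  # one embedding per gene (toy)
driver_idx = rng.choice(20000, size=200, replace=False)  # known cancer drivers (toy)

centroid = gene_emb[driver_idx].mean(axis=0)
cos = gene_emb @ centroid / (
    np.linalg.norm(gene_emb, axis=1) * np.linalg.norm(centroid)
)
ranking = np.argsort(1 - cos)                            # closest to the driver centroid first
```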

Explaining and Generalizing Skip-Gram through Exponential Family Principal Component Analysis

Ryan Cotterell, Adam Poliak, Benjamin Van Durme, Jason Eisner
2017 Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers  
The popular skip-gram model induces word embeddings by exploiting the signal from word-context co-occurrence.  ...  We offer a new interpretation of skip-gram based on exponential family PCA, a form of matrix factorization.  ...  The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA or the U.S. Government.  ... 
doi:10.18653/v1/e17-2028 dblp:conf/eacl/EisnerDCP17 fatcat:2iw6hdaplvfffip7nk54fr23ye
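A hedged toy illustration of the exponential-family-PCA view, not the paper's exact objective: treat word-context co-occurrence indicators as Bernoulli draws whose natural parameter is the inner product of word and context vectors, and fit the two factor matrices by gradient ascent on the log-likelihood.

```python
# Sketch: Bernoulli (logistic-link) matrix factorization of co-occurrence data
import numpy as np

rng = np.random.default_rng(0)
V, d = 500, 50
X = (rng.random((V, V)) < 0.05).astype(float)   # toy 0/1 word-context co-occurrence matrix

W = rng.normal(0, 0.1, (V, d))                  # word vectors
C = rng.normal(0, 0.1, (V, d))                  # context vectors
lr = 0.1

for _ in range(100):
    P = 1.0 / (1.0 + np.exp(-(W @ C.T)))        # Bernoulli means under the logistic link
    G = X - P                                   # gradient of log-likelihood wrt natural params
    W += lr * (G @ C) / V
    C += lr * (G.T @ W) / V
```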

Learning Semantic Composition to Detect Non-compositionality of Multiword Expressions

Majid Yazdani, Meghdad Farahmand, James Henderson
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
Non-compositionality of multiword expressions is an intriguing problem that can be the source of error in a variety of NLP tasks such as language generation, machine translation and word sense disambiguation  ...  We explore a range of distributional vector-space models for semantic composition, empirically evaluate these models, and propose additional methods which improve results further.  ...  All of the models mentioned so far are based on conventional or count-based vector-space representations of the words.  ... 
doi:10.18653/v1/d15-1201 dblp:conf/emnlp/YazdaniFH15 fatcat:3yo5mml7rvfytee33mx5tdtwt4
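An illustrative sketch of the general detection idea, using toy vectors and only the simplest additive composition (the paper evaluates richer learned composition models): score a multiword expression as non-compositional when the vector composed from its parts is far, by cosine, from the distributional vector observed for the expression as a whole.

```python
# Sketch: non-compositionality as distance between composed and observed phrase vectors
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
v_word1, v_word2 = rng.normal(size=300), rng.normal(size=300)
v_phrase = rng.normal(size=300)              # vector learned for the expression itself

composed = v_word1 + v_word2                 # simplest composition function
non_compositionality = 1 - cosine(composed, v_phrase)   # higher => more idiomatic
```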

Domain and Speaker Adaptation for Cortana Speech Recognition

Yong Zhao, Jinyu Li, Shixiong Zhang, Liping Chen, Yifan Gong
2018 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
First, we present anchor-based speaker adaptation by extracting the speaker information, i-vector or d-vector embeddings, from the anchor segments of 'Hey Cortana'.  ...  The anchor embeddings are mapped to layer-wise parameters to control the transformations of both weight matrices and biases of multiple layers.  ...  Table 2 shows the list of top 10 words ranked by word count changes from the SI model to the model with layer L8 updated (ρ = 1) on the Hey Cortana desktop test set.  ... 
doi:10.1109/icassp.2018.8461553 dblp:conf/icassp/ZhaoLZCG18 fatcat:mirgz2h47fgolmgsyox3eoioju

Compression of recurrent neural networks for efficient language modeling

Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko
2019 Applied Soft Computing  
We focus on effective compression methods in the context of their exploitation on devices: pruning, quantization, and matrix decomposition approaches (low-rank factorization and tensor train decomposition).  ...  It has been shown in the experimental study with the Penn Treebank (PTB) dataset that the most efficient results in terms of speed and compression-perplexity balance are obtained by matrix decomposition  ...  The work of A.V. Savchenko and D.I.  ... 
doi:10.1016/j.asoc.2019.03.057 fatcat:msw6p77rlfamxc2xwvbl7eyk6q
Showing results 1 — 15 out of 19,663 results