2,930 Hits in 3.0 sec

Parameter-free Sentence Embedding via Orthogonal Basis [article]

Ziyi Yang, Chenguang Zhu, Weizhu Chen
2019 arXiv   pre-print
Following this motivation, we develop an innovative method based on an orthogonal basis to combine pre-trained word embeddings into sentence representations.  ...  Inspired by the Gram-Schmidt process in geometric theory, we build an orthogonal basis of the subspace spanned by a word and its surrounding context in a sentence.  ...  Our sentence embedding evolves from the new orthogonal basis vector brought in by each word, which represents its novel semantic meaning.  ...
arXiv:1810.00438v2 fatcat:ufed7mjvvvhzvouakbhlxq6czu
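The Gram-Schmidt construction this abstract describes (for this record and the EMNLP version of the same paper below) is easy to sketch. A minimal numpy illustration, assuming a plain unweighted Gram-Schmidt pass over a context window; the authors' full GEM method adds weighting schemes not shown here:

```python
import numpy as np

def novel_component(word_vec, context_vecs):
    """Component of `word_vec` orthogonal to the subspace spanned by its
    context, obtained by Gram-Schmidt orthogonalization."""
    basis = []
    for v in context_vecs:
        for b in basis:                      # subtract projections onto
            v = v - np.dot(v, b) * b         # the basis built so far
        norm = np.linalg.norm(v)
        if norm > 1e-8:                      # skip near-dependent vectors
            basis.append(v / norm)
    residual = word_vec.copy()
    for b in basis:                          # remove the part of the word
        residual = residual - np.dot(residual, b) * b   # lying in the context
    return residual                          # the "novel semantic meaning"

rng = np.random.default_rng(0)
context = [rng.standard_normal(50) for _ in range(4)]
word = rng.standard_normal(50)
r = novel_component(word, context)
print(max(abs(np.dot(r, c)) for c in context))   # ~0: orthogonal to context
```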

Parameter-free Sentence Embedding via Orthogonal Basis

Ziyi Yang, Chenguang Zhu, Weizhu Chen
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
Following this motivation, we develop an innovative method based on an orthogonal basis to combine pre-trained word embeddings into sentence representations.  ...  Inspired by the Gram-Schmidt process in geometric theory, we build an orthogonal basis of the subspace spanned by a word and its surrounding context in a sentence.  ...  Our sentence embedding evolves from the new orthogonal basis vector brought in by each word, which represents its novel semantic meaning.  ...
doi:10.18653/v1/d19-1059 dblp:conf/emnlp/YangZC19 fatcat:r7qq2t3w2bajhcmp7jr76hiipa

Convolution Aware Initialization [article]

Armen Aghajanyan
2017 arXiv   pre-print
The initialization scheme devised by He et al. allowed convolution activations to carry a constrained mean, which allowed deep networks to be trained effectively (He et al., 2015a).  ...  Orthogonal initializations, and more generally orthogonal matrices in standard recurrent networks, have been proven to eradicate the vanishing and exploding gradient problem (Pascanu et al., 2012).  ...  Instead of forming an arbitrary basis, we focus on forming an orthogonal basis using F(f_{i,j}).  ...
arXiv:1702.06295v3 fatcat:3bhheda7rjh2npv2olamtnw4aa
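The orthogonal-initialization property mentioned in this snippet can be illustrated with the standard QR-based recipe. This is a generic sketch (Saxe-style orthogonal init), not Aghajanyan's convolution-aware, Fourier-basis construction:

```python
import numpy as np

def orthogonal_init(shape, rng=None):
    """Orthogonal weight initialization: QR-decompose a Gaussian matrix
    and keep Q, whose singular values are all exactly 1."""
    if rng is None:
        rng = np.random.default_rng(0)
    q, r = np.linalg.qr(rng.standard_normal(shape))
    return q * np.sign(np.diag(r))   # sign fix for a uniform distribution

W = orthogonal_init((128, 128))
print(np.allclose(W.T @ W, np.eye(128)))   # True: norms preserved exactly
```

Because an orthogonal recurrent matrix preserves vector norms, repeated multiplication neither shrinks nor blows up gradients, which is the vanishing/exploding-gradient point the snippet attributes to Pascanu et al.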

Learning Efficient Task-Specific Meta-Embeddings with Word Prisms [article]

Jingyi He, KC Tsiolis, Kian Kenyon-Dean, Jackie Chi Kit Cheung
2020 arXiv   pre-print
Word prisms learn orthogonal transformations to linearly combine the input source embeddings, which allows them to be very efficient at inference time.  ...  Word embeddings are trained to predict word co-occurrence statistics, which leads them to possess different lexical properties (syntactic, semantic, etc.) depending on the notion of context defined at training  ...  Melamud et al. (2016) combine embeddings trained with different notions of context via concatenation, as well as via SVD and CCA, leading to improved performance in multiple downstream tasks.  ...
arXiv:2011.02944v1 fatcat:hzceh676wzhrzan7bkbfcdhlmu
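A rough sketch of the concatenation-plus-SVD combination credited to Melamud et al. in the snippet; the dimensionalities and the centering step are illustrative choices, not details from either paper:

```python
import numpy as np

def svd_meta_embed(sources, dim):
    """Concatenate source embedding matrices (each vocab x d_i), then
    compress to `dim` dimensions with a truncated SVD."""
    X = np.concatenate(sources, axis=1)    # (vocab, d_1 + ... + d_k)
    X = X - X.mean(axis=0)                 # center before factorizing
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim] * S[:dim]            # (vocab, dim) meta-embeddings

rng = np.random.default_rng(0)
src_a = rng.standard_normal((1000, 300))   # e.g. a word2vec-like source
src_b = rng.standard_normal((1000, 100))   # e.g. a dependency-context source
print(svd_meta_embed([src_a, src_b], dim=200).shape)   # (1000, 200)
```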

A Survey on Word Meta-Embedding Learning [article]

Danushka Bollegala, James O'Neill
2022 arXiv   pre-print
We classify ME learning methods according to multiple factors, such as whether they (a) operate on static or contextualised embeddings, (b) are trained in an unsupervised manner, or (c) are fine-tuned for a particular  ...  Meta-embedding (ME) learning is an emerging approach that attempts to learn more accurate word embeddings given existing (source) word embeddings as the sole input.  ...  Coates and Bollegala [2018] showed that when word embeddings in each source are approximately orthogonal, a condition that they empirically validate for pre-trained word embeddings, averaging can approximate  ...
arXiv:2204.11660v1 fatcat:6qcysv7ubve3pmj5lqy7pfqmbe
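The Coates and Bollegala observation quoted above can be checked numerically: when the cross-source terms are near zero (approximate orthogonality), the dot product of averaged meta-embeddings tracks, up to a constant factor, the dot product of concatenated ones. A toy check, not their proof:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300
# Word A in two sources; high-dimensional random directions are
# approximately orthogonal, matching the surveyed condition.
u1 = rng.standard_normal(d); u1 /= np.linalg.norm(u1)
u2 = rng.standard_normal(d); u2 /= np.linalg.norm(u2)
# Word B, correlated with word A within each source.
v1 = 0.9 * u1 + 0.05 * rng.standard_normal(d)
v2 = 0.4 * u2 + 0.05 * rng.standard_normal(d)

concat = np.dot(np.concatenate([u1, u2]), np.concatenate([v1, v2]))
avg = np.dot((u1 + u2) / 2.0, (v1 + v2) / 2.0)
# Cross terms u1.v2 and u2.v1 are near zero, so 4 * avg tracks concat.
print(round(concat, 3), round(4 * avg, 3))
```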

A Quantum Expectation Value Based Language Model with Application to Question Answering

Qin Zhao, Chenguang Hou, Changjian Liu, Peng Zhang, Ruifeng Xu
2020 Entropy  
Words and sentences are viewed as different observables in this quantum model.  ...  While exciting progress has been made, current studies mainly investigate the relationship between density matrices of different sentence subspaces of a semantic Hilbert space.  ...  However, when the states {|ψ_i⟩}, i = 1, …, n, form a one-hot orthogonal basis, the density matrix reduces to a diagonal matrix with zero-valued off-diagonal elements, and this matrix corresponds to the probabilities of sememes  ...
doi:10.3390/e22050533 pmid:33286305 fatcat:npusps54nbdv3ic7glcv7tvmze
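The diagonal-reduction claim in this snippet is straightforward to verify: a density matrix ρ = Σ_i p_i |ψ_i⟩⟨ψ_i| built from one-hot, mutually orthogonal state vectors has all of its mass on the diagonal. A toy check of that stated property (the probabilities here are arbitrary):

```python
import numpy as np

probs = np.array([0.1, 0.2, 0.3, 0.4])   # must sum to 1
states = np.eye(4)                        # one-hot orthogonal basis |psi_i>

# rho = sum_i p_i |psi_i><psi_i|
rho = sum(p * np.outer(s, s) for p, s in zip(probs, states))
print(np.allclose(rho, np.diag(probs)))   # True: off-diagonals are zero
```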

SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models [article]

Bin Wang, C.-C. Jay Kuo
2020 arXiv   pre-print
Sentence embedding is an important research topic in natural language processing (NLP) since it can transfer knowledge to downstream tasks.  ...  Then, we propose a new sentence embedding method by dissecting BERT-based word models through geometric analysis of the space spanned by the word representation. It is called the SBERT-WK method.  ...  First, we decompose the matrix C in Eq. (4) as C = UΣV^T to find the orthogonal basis for the neighboring words. The orthogonal column basis for C is represented by the matrix U.  ...
arXiv:2002.06652v2 fatcat:2fbsrhqi2faq5kggngpmulhxme
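The SVD step quoted above is standard: the left singular vectors give an orthonormal basis for the column space of the context matrix. A minimal sketch, with the 768-dimensional embeddings and 7-word window as illustrative sizes only:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((768, 7))        # context matrix: dim x window size
U, S, Vt = np.linalg.svd(C, full_matrices=False)   # C = U @ diag(S) @ Vt
# Columns of U form an orthonormal basis spanning the neighboring words.
print(np.allclose(U.T @ U, np.eye(7)))   # True
```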

SparseGAN: Sparse Generative Adversarial Network for Text Generation [article]

Liping Yuan, Jiehang Zeng, Xiaoqing Zheng
2021 arXiv   pre-print
The existing training strategies either suffer from unreliable gradient estimations or imprecise sentence representations.  ...  The key idea is that we treat an embedding matrix as an over-complete dictionary, and use a linear combination of very few selected word embeddings to approximate the output feature representation of the  ...  Low Self-BLEU scores of the MLE-based model reflect that the sentences generated via MLE-based training are more diverse than those of all GAN-based models.  ...
arXiv:2103.11578v1 fatcat:nb5ejpmai5e2tfbz5fcpbtkrm4
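A hedged sketch of the over-complete-dictionary idea in this snippet, using greedy orthogonal matching pursuit as an illustrative sparse solver; the paper's actual procedure for selecting embeddings may differ:

```python
import numpy as np

def sparse_approx(target, dictionary, k=3):
    """Pick k atoms (rows of `dictionary`, here word embeddings) whose
    linear combination best approximates `target`, greedily (OMP)."""
    residual, idx, coef = target.copy(), [], None
    for _ in range(k):
        scores = np.abs(dictionary @ residual)   # correlation with atoms
        scores[idx] = -1.0                       # never reselect an atom
        idx.append(int(np.argmax(scores)))
        A = dictionary[idx].T                    # (d, len(idx))
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        residual = target - A @ coef             # refit, update residual
    return idx, coef

rng = np.random.default_rng(0)
E = rng.standard_normal((5000, 64))              # embedding matrix = dictionary
E /= np.linalg.norm(E, axis=1, keepdims=True)
h = rng.standard_normal(64)                      # a generator output feature
words, weights = sparse_approx(h, E, k=3)
print(words, np.round(weights, 3))               # few word ids + coefficients
```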

Learning to Create Sentence Semantic Relation Graphs for Multi-Document Summarization [article]

Diego Antognini, Boi Faltings
2019 arXiv   pre-print
To overcome these limitations, we present a novel method, which makes use of two types of sentence embeddings: universal embeddings, which are trained on a large unrelated corpus, and domain-specific embeddings, which are learned during training.  ...  Furthermore, all edges have a weight above zero, since it is very unlikely that two sentence embeddings are completely orthogonal. To overcome this  ...  [Figure 1: Overview of SemSentSum.]
arXiv:1909.12231v1 fatcat:juuj6yelfnhg3mqd76gemresym
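A sketch of the sentence-graph construction this entry describes. The cosine-similarity edge weights follow the snippet; pruning weights below a cutoff is my assumption about how the all-positive-weight issue is "overcome", not a detail taken from the paper:

```python
import numpy as np

def sentence_graph(embeddings, threshold=0.05):
    """Cosine-similarity adjacency matrix over sentence embeddings,
    with near-orthogonal (low-weight) pairs pruned."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, 0.0)         # no self-loops
    sim[sim < threshold] = 0.0         # hypothetical pruning rule
    return sim

rng = np.random.default_rng(0)
A = sentence_graph(rng.standard_normal((6, 128)))
print(int((A > 0).sum()), "directed edges kept")
```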

Structure Inducing Pre-Training [article]

Matthew B. A. McDermott, Brendan Yap, Peter Szolovits, Marinka Zitnik
2022 arXiv   pre-print
on the distance or geometry between the pre-trained embeddings of two samples x⃗_i and x⃗_j.  ...  Language model pre-training and derived methods are incredibly impactful in machine learning.  ...  This is done to encourage the final per-sample representations of a single sentence embedded via two otherwise independently trained models to be similar and those of different sentences to be distinct  ... 
arXiv:2103.10334v3 fatcat:jlbzciqezjgs7a7tke4rhly4ty
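The objective sketched at the end of this snippet, where the same sentence embedded by two models should land close together and different sentences far apart, has the shape of a standard contrastive loss. A generic InfoNCE-style sketch, not the paper's specific formulation:

```python
import numpy as np

def contrastive_loss(za, zb, temperature=0.1):
    """InfoNCE-style loss: row i of `za` and row i of `zb` are two
    embeddings of the same sentence; other rows act as negatives."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature                  # (n, n) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))                   # matched pairs on diag

rng = np.random.default_rng(0)
a, b = rng.standard_normal((8, 64)), rng.standard_normal((8, 64))
# Aligned views score a much lower loss than unrelated ones.
print(contrastive_loss(a, a + 0.01 * b) < contrastive_loss(a, b))   # True
```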

A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [article]

Alexander Kalinowski, Yuan An
2020 arXiv   pre-print
Sentence embeddings encode natural language sentences as low-dimensional dense vectors.  ...  A great deal of effort has been put into using sentence embeddings to improve several important natural language processing tasks.  ...  By analyzing the geometry of the subspace generated by building a matrix A ∈ ℝ^(d×n) from the n words in a given sentence, the authors are able to build a new orthogonal basis vector that captures the general  ...
arXiv:2009.11226v1 fatcat:n4oqrvojgnfijjoxdo4pfbudee

Quantum-inspired Multimodal Fusion for Video Sentiment Analysis [article]

Qiuchi Li, Dimitris Gkoumas, Christina Lioma, Massimo Melucci
2021 arXiv   pre-print
When α_0 or α_1 is zero, then |φ⟩ = |0⟩ or |1⟩ is a basis state.  ...  On the other hand, it is not computationally affordable to ensure mutual orthogonality of measurement states during training, even though there are already algorithms for training mutually orthogonal vectors  ...
arXiv:2103.10572v2 fatcat:hldyp5i35jhwrdtc7gvneb353m
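For readers not fluent in Dirac notation, the state in the snippet is the usual two-dimensional superposition over an orthogonal basis, written out here in standard quantum-computing convention (not anything specific to this paper):

```latex
% A state over the orthogonal basis {|0>, |1>}:
\[
  |\varphi\rangle \;=\; \alpha_0\,|0\rangle + \alpha_1\,|1\rangle,
  \qquad |\alpha_0|^2 + |\alpha_1|^2 = 1 .
\]
% If either amplitude is zero, the superposition collapses to a single
% basis state, which is exactly the case the snippet describes.
```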

Visualizing and Measuring the Geometry of BERT [article]

Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg
2019 arXiv   pre-print
We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations  ...  Let {e_1, …, e_{n−1}} be orthogonal unit basis vectors for ℝ^(n−1).  ...  With these labeled embeddings, we trained two L2-regularized linear classifiers via stochastic gradient descent, using [19].  ...
arXiv:1906.02715v2 fatcat:aydti652tfgildot2x52kc2itu
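The orthogonal unit basis vectors in this snippet set up the paper's "Pythagorean" tree-embedding argument: give every tree edge its own unit basis direction, and the squared Euclidean distance between node embeddings equals tree (path) distance. A small numpy illustration; the chain-shaped tree is my example, not the paper's:

```python
import numpy as np

# Path graph 0-1-2-...-(n-1): edge k gets basis vector e_k, and each node
# is embedded as the sum of the edge vectors on its path from node 0.
n = 5
E = np.eye(n - 1)                        # orthogonal unit basis vectors
nodes = np.vstack([E[:k].sum(axis=0) for k in range(n)])   # node 0 -> zeros

i, j = 1, 4
# Orthogonality makes the cross terms vanish (Pythagoras), so the
# squared distance counts the edges between i and j.
print(np.sum((nodes[i] - nodes[j]) ** 2))   # 3.0 == |i - j|
```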

Continual and Multi-Task Architecture Search

Ramakanth Pasunuru, Mohit Bansal
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
on previously learned tasks (via block-sparsity and orthogonality constraints), thus enabling lifelong learning.  ...  We empirically show the effectiveness of our sequential continual learning and parallel multi-task learning based architecture search approaches on diverse sentence-pair classification tasks (GLUE) and  ...  Conditions 2.1 and 2.2 are mutually dependent, because for two matrices' product to be zero, they must share the basis vectors between them, i.e., for an n-dimensional space there are n basis vectors, and if  ...
doi:10.18653/v1/p19-1185 dblp:conf/acl/PasunuruB19 fatcat:u6er62fb4zh7vnqt4fg3maikzy
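An orthogonality penalty of the kind the snippet references can be written as a Frobenius norm on the product of two parameter matrices; this is a rough sketch of that one ingredient, not the paper's full block-sparsity-plus-orthogonality formulation:

```python
import numpy as np

def orthogonality_penalty(M_old, M_new):
    """||M_old^T M_new||_F^2 is zero exactly when the columns of the new
    task's weights are orthogonal to those of the old task's weights."""
    return float(np.sum((M_old.T @ M_new) ** 2))

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((64, 16)))   # 16 orthonormal columns
print(orthogonality_penalty(Q[:, :8], Q[:, 8:]))     # ~0: disjoint basis vectors
print(orthogonality_penalty(Q[:, :8], Q[:, :8]))     # 8.0: fully shared subspace
```

Minimizing this penalty while training a new task pushes its parameters into the subspace unused by earlier tasks, which is how the n available basis vectors end up shared out between tasks, as the truncated condition above begins to explain.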

ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs

Zuohui Fu, Yikun Xian, Shijie Geng, Yingqiang Ge, Yuting Wang, Xin Dong, Guang Wang, Gerard De Melo
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference
To this end, we propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel  ...  However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level.  ...  Subsequently, we evaluate the obtained transformation via a standard sentence retrieval task.  ... 
doi:10.1609/aaai.v34i05.6279 fatcat:47bpabkiezai3if73jfr4z6674
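The "standard sentence retrieval task" mentioned at the end of this snippet usually amounts to nearest-neighbor search over the mapped embeddings. A generic sketch; precision@1 and cosine similarity are my assumptions, not details from the paper:

```python
import numpy as np

def retrieval_p_at_1(mapped_src, tgt):
    """Retrieve, for each mapped source sentence, the nearest target
    sentence by cosine similarity; row i is correct iff it retrieves i."""
    a = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    preds = np.argmax(a @ b.T, axis=1)
    return float(np.mean(preds == np.arange(len(tgt))))

rng = np.random.default_rng(0)
tgt = rng.standard_normal((100, 256))
mapped = tgt + 0.1 * rng.standard_normal((100, 256))   # a decent mapping
print(retrieval_p_at_1(mapped, tgt))                   # close to 1.0
```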
Showing results 1 — 15 out of 2,930 results