How Well Sentence Embeddings Capture Meaning

Lyndon White, Roberto Togneri, Wei Liu, Mohammed Bennamoun
2015 Proceedings of the 20th Australasian Document Computing Symposium - ADCS '15
Several approaches for embedding a sentence into a vector space have been developed. However, it is unclear to what extent a sentence's position in the vector space reflects its semantic meaning, rather than other factors such as syntactic structure. This varies with the model used for the embeddings: different models are suited to different downstream applications. For applications such as machine translation and automated summarization, it is highly desirable to have semantic meaning encoded in the embedding. We consider this to be the quality of semantic localization for the model: how well the sentences' meanings coincide with their embeddings' positions in vector space. Currently, semantic localization is assessed indirectly through practical benchmarks for specific applications. In this paper, we ground the semantic localization problem through a semantic classification task. The task is to classify sentences according to their meaning. An SVM with a linear kernel performs the classification, using the sentence vectors as its input. The sentences from subsets of two corpora, the Microsoft Research Paraphrase corpus and the Opinosis corpus, were partitioned according to their semantic equivalence. These partitions give the target classes for the classification task. Several existing models, including URAE, PV-DM and PV-DBOW, were assessed against a bag-of-words benchmark.
doi:10.1145/2838931.2838932 dblp:conf/adcs/WhiteTLB15 fatcat:b756omor4jbynpdq6dtt4qdlei
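
The classification task in the abstract can be illustrated with a minimal sketch of the bag-of-words side of the comparison. This is not the authors' implementation: the abstract does not name the SVM solver or the multi-class scheme, so the sketch assumes a hand-rolled one-vs-rest linear SVM without a bias term, trained by batch subgradient descent on the regularised hinge loss, using only the Python standard library. The six sentences, the class names, and all function names and hyperparameters (`lam`, `lr`, `epochs`) are invented for illustration and do not come from the Microsoft Research Paraphrase or Opinosis corpora.

```python
from collections import Counter


def bow_vector(sentence, vocab):
    """Map a sentence to word counts over a fixed vocabulary; unknown words are ignored."""
    counts = Counter(sentence.lower().split())
    return [float(counts[word]) for word in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Binary linear SVM (no bias term), labels in {+1, -1}.

    Batch subgradient descent on lam/2 * ||w||^2 + sum of hinge losses.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]          # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1.0:          # sample violates the margin
                for j, xij in enumerate(xi):
                    grad[j] -= yi * xij        # hinge-loss subgradient
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def train_one_vs_rest(X, labels):
    """Train one binary SVM per class: that class against all the others."""
    return {
        c: train_linear_svm(X, [1.0 if label == c else -1.0 for label in labels])
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose SVM gives the highest score for x."""
    return max(models, key=lambda c: dot(models[c], x))


# Invented toy data: each class stands in for one group of paraphrases.
sentences = [
    ("rain fell heavily today", "weather"),
    ("heavy rain today", "weather"),
    ("battery life lasts long", "battery"),
    ("long lasting battery", "battery"),
    ("shares rose sharply", "market"),
    ("stock shares rose", "market"),
]
vocab = sorted({word for text, _ in sentences for word in text.split()})
X = [bow_vector(text, vocab) for text, _ in sentences]
labels = [label for _, label in sentences]

models = train_one_vs_rest(X, labels)
predicted = [predict(models, x) for x in X]
print(predicted)
```

Here the predictions are made on the training sentences only to check that the pieces fit together; a meaningful assessment would score held-out sentences. To compare an embedding model against this baseline under the same setup, `bow_vector` would be replaced by a function that returns that model's sentence vector, with the classifier left unchanged.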