Building a Monolingual Parallel Corpus for Text Simplification Using Sentence Similarity Based on Alignment between Word Embeddings

Tomoyuki Kajiwara, Mamoru Komachi
2016 International Conference on Computational Linguistics  
To obviate the need for human annotation, we propose an unsupervised method that automatically builds the monolingual parallel corpus for text simplification using sentence similarity based on word embeddings  ...  For any sentence pair comprising a complex sentence and its simple counterpart, we employ a many-to-one method of aligning each word in the complex sentence with the most similar word in the simple sentence  ...  Acknowledgements This research was (partly) supported by Grant-in-Aid for Research on Priority Areas, Tokyo Metropolitan University, Research on social bigdata.  ... 
dblp:conf/coling/KajiwaraK16 fatcat:g5dx6ql3jvefzoiedft2pmzmjy

HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity

Junqing He, Long Wu, Xuemin Zhao, Yonghong Yan
2017 Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)  
Our results are produced using only monolingual Wikipedia corpora and a limited amount of sentence-aligned data.  ...  In this paper, we introduce an approach to combining word embeddings and machine translation for multilingual semantic word similarity, the task2 of SemEval-2017.  ...  Acknowledgments We genuinely appreciate Omer Levy for his advice on the monolingual subtask.  ... 
doi:10.18653/v1/s17-2033 dblp:conf/semeval/HeWZY17 fatcat:e66q7na3nbdwbprczhlkcld5nu

An Unsupervised Method for Building Sentence Simplification Corpora in Multiple Languages [article]

Xinyu Lu and Jipeng Qiang and Yun Li and Yunhao Yuan and Yi Zhu
2021 arXiv   pre-print
The availability of parallel sentence simplification (SS) is scarce for neural SS modelings.  ...  The building SS corpora with an unsupervised approach can satisfy the expectations that the aligned sentences preserve the same meanings and have difference in text complexity levels.  ...  Some work Kajiwara and Komachi (2018); Martin et al. (2020b) built pseudo parallel corpora by searching the nearest neighbor sentence for each sentence based on embedding model from a large-scale text  ... 
arXiv:2109.00165v1 fatcat:swgwctaobbek5dhktltcnrgryy

Parallel Sentence Retrieval From Comparable Corpora for Biomedical Text Simplification

Rémi Cardon (UMR CNRS 8163 – STL, Lille, France), Natalia Grabar
2019 Proceedings - Natural Language Processing in a Deep Learning World  
Parallel sentences provide semantically similar information which can vary on a given dimension, such as language or register.  ...  Our results show that the method we present here can be used to automatically generate a corpus of parallel sentences from our comparable corpus.  ...  The authors would like to thank the reviewers for their helpful comments.  ... 
doi:10.26615/978-954-452-056-4_020 dblp:conf/ranlp/CardonG19 fatcat:lxsomj3okvctzfwqeew5cgg4zm

A Survey on Text Simplification [article]

Punardeep Sikka, Vijay Mago
2022 arXiv   pre-print
We note that the research in the field has clearly shifted towards utilizing deep learning techniques to perform TS, with a specific focus on developing solutions to combat the lack of data available for  ...  We also include a discussion of datasets and evaluations metrics commonly used, along with discussion of related fields within Natural Language Processing (NLP), like semantic similarity.  ...  ACKNOWLEDGMENTS We would like to thank Canada Revenue Agency (CRA) for providing funding for our research in Text Simplification.  ... 
arXiv:2008.08612v3 fatcat:wcxgc6l4ajhtdjefl7pcpitnsu

Monolingual sentence matching for text simplification [article]

Yonghui Huang, Yunhui Li, Yi Luan
2018 arXiv   pre-print
This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia.  ...  We introduce a convolutional neural network structure to model similarity between two sentences.  ...  Based on this setting of using Wikipedia on text simplification, the goal of this project is to find sentence level alignment.  ... 
arXiv:1809.08703v1 fatcat:s54copfprvcope2odr32jmelw4

Towards Arabic Sentence Simplification via Classification and Generative Approaches [article]

Nouran Khallaf, Serge Sharoff
2022 arXiv   pre-print
well as a model of fastText word embeddings; and (ii) a generative approach, a Seq2Seq technique by applying a multilingual Text-to-Text Transfer Transformer mT5.  ...  This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system.  ...  This research is a part of PhD project funded by Newton-Mosharafa Fund.  ... 
arXiv:2204.09292v1 fatcat:kw4r3dldercoji77lcu3n5ta2i

PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification

Dominique Brunato, Andrea Cimino, Felice Dell'Orletta, Giulia Venturi
2016 Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing  
To build the resource we develop a new method for automatically acquiring a corpus of complex-simple paired sentences able to intercept structural transformations and particularly suitable for text simplification  ...  In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian.  ...  Introduction The availability of monolingual parallel corpora is a prerequisite for research on automatic text simplification (ATS), i.e. the task of reducing sentence complexity by preserving the original  ... 
doi:10.18653/v1/d16-1034 dblp:conf/emnlp/BrunatoCDV16 fatcat:kho736tzhbcmfpyvb32nzxalfy

A Survey Of Cross-lingual Word Embedding Models [article]

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 arXiv   pre-print
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models  ...  In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions.  ...  Acknowledgements We thank the anonymous reviewers for their valuable and comprehensive feedback.  ... 
arXiv:1706.04902v3 fatcat:lts6uop77zaazhzlbygqmdsama


Fatima Al-Raisi, Abdelwahab Bourai, Weijian Lin
2018 Figshare  
We share a large coverage phrase dictionary for Arabic and contribute a large parallel monolingual corpus that can be used in developing new seq-to-seq models for paraphrasing.  ...  This is the first work on sentence level paraphrase generation for Arabic and the first using neural models to generate paraphrased sentences for Arabic.  ...  We also plan to use it at word subunits such as morphemes and even character level especially for surface distance comparison in morphology rich languages like Arabic.  ... 
doi:10.6084/m9.figshare.6700718 fatcat:jcztjf7bdbc4xdpkc7kz7v4dsi

A Survey of Cross-lingual Word Embedding Models

Sebastian Ruder, Ivan Vulić, Anders Søgaard
2019 The Journal of Artificial Intelligence Research  
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models  ...  In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions.  ...  Acknowledgements We thank the anonymous reviewers and the editors for their valuable and comprehensive feedback.  ... 
doi:10.1613/jair.1.11640 fatcat:vwlgtzzmhfdlnlyaokx2whxgva

Bilingual Language Model for English Arabic Technical Translation

Marwa Refaie, Ibrahim Imam, Ibrahim Eissa
2015 The Egyptian Journal of Language Engineering  
data, adjusting weights for the words on different domains to learn the model how to differentiate between different meaning of same word in different domains.  ...  The massive fast of new scientific publications increase the need to a reliable effective automatic machine translation (AMT) system, which translates from English, as the common language of publications  ...  The motivation based on that with accurate training of the word embedding, similar semantically or grammatically words will be mapped to similar points in the continuous space.  ... 
doi:10.21608/ejle.2015.60193 fatcat:eqo6sttq5jftrnkrpx2g6ftowa

Learning Simplifications for Specific Target Audiences

Carolina Scarton, Lucia Specia
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)  
Most recent work is based on sequence-to-sequence neural models similar to those used for machine translation (MT).  ...  Text simplification (TS) is a monolingual text-to-text transformation task where an original (complex) text is transformed into a target (simpler) text.  ...  In order to build MT-based models, a parallel corpus of original texts with their simplified counterparts is needed.  ... 
doi:10.18653/v1/p18-2113 dblp:conf/acl/ScartonS18 fatcat:a7thrip4pjcqdgl6chsp7c23d4

Simplification Using Paraphrases and Context-Based Lexical Substitution

Reno Kriz, Eleni Miltsakaki, Marianna Apidianaki, Chris Callison-Burch
2018 Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)  
We propose a complex word identification (CWI) model that exploits both lexical and contextual features, and a simplification mechanism which relies on a word-embedding lexical substitution model to replace  ...  The results show that our models are able to detect complex words with higher accuracy than other commonly used methods, and propose good simplification substitutes in context.  ...  Acknowledgements We would like to thank the anonymous reviewers for their helpful comments and feedback on this work, and Anne Cocos for sharing with us her implementation of the AddCos model with PPDB  ... 
doi:10.18653/v1/n18-1019 dblp:conf/naacl/KrizMAC18 fatcat:bqj5g2qzcfgxxdx72wkax6ahje

A multi-lingual and cross-domain analysis of features for text simplification

Regina Stodden, Laura Kallmeyer
2020 International Conference on Language Resources and Evaluation  
In text simplification and readability research, several features have been proposed to estimate or simplify a complex text, e.g., readability scores, sentence length, or proportion of POS tags.  ...  Our multi-lingual and multi-domain corpus analysis shows that the relevance of different features for text simplification is different per corpora, language, and domain.  ...  Word Embedding Features. The similarity between the complex and the simplified text (Martin et al., 2018) is measured using pre-trained FastText embeddings (Grave et al., 2018).  ... 
dblp:conf/lrec/StoddenK20 fatcat:ux7wkyzqm5a7hm3wuipsuzutte