MASSAlign: Alignment and Annotation of Comparable Documents

Gustavo Henrique Paetzold, Fernando Alva-Manchego, Lucia Specia
2017 Zenodo  
Conference paper: MASSAlign: Alignment and Annotation of Comparable Documents  ...  Participants will be able to test MASSAlign by producing and displaying alignments and annotations for different kinds of comparable documents on the fly.  ...  Paragraph and Sentence Alignment The alignment module of MASSAlign finds equivalent paragraphs and sentences in comparable documents.  ... 
doi:10.5281/zenodo.1040791 fatcat:xh6b3m6cljdvpfxjih2daoyp6e

Neural CRF Model for Sentence Alignment in Text Simplification [article]

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu
2021 arXiv   pre-print
We apply our CRF aligner to construct two new text simplification datasets, Newsela-Auto and Wiki-Auto, which are much larger and of better quality compared to the existing datasets.  ...  To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of NSF, ODNI, IARPA, ARO  ... 
arXiv:2005.02324v4 fatcat:qabsa6oms5evredcj6wczbzopi

Controlling Text Complexity in Neural Machine Translation

Sweta Agrawal, Marine Carpuat
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
We collect a high quality dataset of news articles available in English and Spanish, written for diverse grade levels and propose a method to align segments across comparable bilingual articles.  ...  This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency.  ...  A similarity matrix is created between the paragraphs/sentences of aligned documents/paragraphs using a standard bag-of-words TF-IDF model.  ... 
doi:10.18653/v1/d19-1166 dblp:conf/emnlp/AgrawalC19 fatcat:rkm4sxgd45b6djvmql57dklw7e

Controlling Text Complexity in Neural Machine Translation [article]

Sweta Agrawal, Marine Carpuat
2019 arXiv   pre-print
We collect a high quality dataset of news articles available in English and Spanish, written for diverse grade levels and propose a method to align segments across comparable bilingual articles.  ...  This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency.  ...  A similarity matrix is created between the paragraphs/sentences of aligned documents/paragraphs using a standard bag-of-words TF-IDF model.  ... 
arXiv:1911.00835v1 fatcat:3r6evbrf4zflhiiyxcxfszm364

EASSE: Easier Automatic Sentence Simplification Evaluation [article]

Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia
2019 arXiv   pre-print
Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems.  ...  We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems.  ...  Since there is no available simplification dataset with manual annotations of the transformations performed, we re-use the annotation algorithms from MASSAlign.  ... 
arXiv:1908.04567v2 fatcat:bhqcyoexnvhm7lfwkcwiw6c4nm

EASSE: Easier Automatic Sentence Simplification Evaluation

Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations  
Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems.  ...  We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems.  ...  Since there is no available simplification dataset with manual annotations of the transformations performed, we re-use the annotation algorithms from MASSAlign.  ... 
doi:10.18653/v1/d19-3009 dblp:conf/emnlp/Alva-ManchegoMS19 fatcat:32jspqvtynfp3jpagj5zecvqza

Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Fernando Alva-Manchego, Joachim Bingel, Gustavo Henrique Paetzold, Carolina Scarton, Lucia Specia
2017 Zenodo  
Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations  ...  We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations.  ...  Their algorithms search for the best alignment path between the paragraphs and sentences of parallel documents based on TF-IDF cosine similarity and an incremental vicinity search range.  ... 
doi:10.5281/zenodo.1042505 fatcat:vcmaka3d7fgxdiclvdx4qxo4f4

Data-Driven Sentence Simplification: Survey and Benchmark

Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
2020 Computational Linguistics  
We also include a benchmark of different approaches on common datasets so as to compare them and highlight their strengths and limitations.  ...  In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm  ...  However, there is no available simplification corpus with such a type of information. As a work-around, we use the annotator module of MASSAlign.  ... 
doi:10.1162/coli_a_00370 fatcat:k7mlggplrreudk5pgq62x2fmva

A Survey on Text Simplification [article]

Punardeep Sikka, Vijay Mago
2020 arXiv   pre-print
This survey seeks to provide a comprehensive overview of TS, including a brief description of earlier approaches used, discussion of various aspects of simplification (lexical, semantic and syntactic),  ...  We also include a discussion of datasets and evaluation metrics commonly used, along with discussion of related fields within Natural Language Processing (NLP), like semantic similarity.  ...  Also, we would like to thank Lakehead University, DaTA Lab and CASES Bldg. for providing invaluable support and resources to make our research possible.  ... 
arXiv:2008.08612v2 fatcat:ki3l25kwr5hhpprxhuxr7b672a

Tutorial: Data-driven text simplification

Horacio Saggion, Sanja Štajner
2019 Zenodo  
We will present all the existing resources for TS for various languages, including parallel manually produced TS corpora, comparable automatically aligned TS corpora, paraphrase- and synonym- resources  ...  , TS-specific sentence-alignment tools, and several TS evaluation resources.  ...  [Barzilay and Elhadad(2003)] Regina Barzilay and Noemie Elhadad. Sentence alignment for monolingual comparable corpora.  ... 
doi:10.5281/zenodo.2593328 fatcat:jqzdbcgskveorcml7ogqwkzuue

Parallel Text Alignment and Monolingual Parallel Corpus Creation from Philosophical Texts for Text Simplification

Stefan Paun
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop   unpublished
We propose a new unsupervised method for aligning text based on Doc2Vec embeddings and a new alignment algorithm, capable of aligning texts at different levels.  ...  Training text simplification algorithms generally requires a lot of annotated data, however there are not many corpora suitable for this task.  ...  corpus and compare the number and quality of alignments obtained against already established methods.  ... 
doi:10.18653/v1/2021.naacl-srw.6 fatcat:snkimhgf65ehxh5mdwqhpyu42u

A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language

Ronny Herzog, Dominik Schwudke, Kai Schuhmann, Julio L Sampaio, Stefan R Bornstein, Michael Schroeder, Andrej Shevchenko
2011 Genome Biology  
Shotgun lipidome profiling relies on direct mass spectrometric analysis of total lipid extracts from cells, tissues or organisms and is a powerful tool to elucidate the molecular composition of lipidomes  ...  We present a novel informatics concept of the molecular fragmentation query language implemented within the LipidXplorer open source software kit that supports accurate quantification of individual species  ...  Acknowledgements We are grateful to our colleagues in MPI CBG, Technical University of Dresden and University of Heidelberg for valuable discussions and betatesting of LipidXplorer software; to Mrs Kathy  ... 
doi:10.1186/gb-2011-12-1-r8 pmid:21247462 pmcid:PMC3091306 fatcat:pww7fjmjbjaapbrp6cummurzfu

A Corpus for Automatic Readability Assessment and Text Simplification of German

Alessia Battisti, Dominik Pfütze, Andreas Säuberli, Marek Kostrzewa, Sarah Ebling
2020
The corpus is compiled from web sources and consists of parallel as well as monolingual-only (simplified German) data amounting to approximately 6,200 documents (nearly 211,000 sentences).  ...  al., 2018) and MASSAlign (Paetzold et al., 2017).  ... 
doi:10.5167/uzh-192839 fatcat:rnmn4zbkqvct5jfxuqo67lz6gy

Neural CRF Model for Sentence Alignment in Text Simplification

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu
2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics   unpublished
We apply our CRF aligner to construct two new text simplification datasets, NEWSELA-AUTO and WIKI-AUTO, which are much larger and of better quality compared to the existing datasets.  ...  To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of NSF, ODNI, IARPA, ARO  ... 
doi:10.18653/v1/2020.acl-main.709 fatcat:sh3dwgk4yrcjxb3sc7bhfenk6u