
MASSAlign: Alignment and Annotation of Comparable Documents

Gustavo Henrique Paetzold, Fernando Alva-Manchego, Lucia Specia
2017 Zenodo  
Conference paper
doi:10.5281/zenodo.1040791 fatcat:xh6b3m6cljdvpfxjih2daoyp6e

Knowledge Distillation for Quality Estimation [article]

Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia
2021 arXiv   pre-print
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
arXiv:2107.00411v1 fatcat:bkakpl4mwrcajnwyh5ghzw2t3m

EASSE: Easier Automatic Sentence Simplification Evaluation [article]

Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia
2019 arXiv   pre-print
We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems. EASSE provides a single access point to a broad range of evaluation resources: standard automatic metrics for assessing SS outputs (e.g. SARI), word-level accuracy scores for certain simplification transformations, reference-independent quality estimation features (e.g. compression ratio), and standard test data for SS evaluation (e.g. ...). Finally, EASSE generates easy-to-visualise reports on the various metrics and features above and on how a particular SS output fares against reference simplifications. Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems.
arXiv:1908.04567v2 fatcat:bhqcyoexnvhm7lfwkcwiw6c4nm

Data-Driven Sentence Simplification: Survey and Benchmark

Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
2020 Computational Linguistics  
Alva-Manchego et al. (2017) achieve the highest SARI score in the test set, and best simplicity score with human judgments.  ...  Alva-Manchego et al. (2017) model SS as a Sequence Labeling problem, identifying simplification transformations at word or phrase level.  ... 
doi:10.1162/coli_a_00370 fatcat:k7mlggplrreudk5pgq62x2fmva

Strong Baselines for Complex Word Identification across Multiple Languages [article]

Pierre Finnimore, Elisabeth Fritzsch, Daniel King, Alison Sneyd, Aneeq Ur Rehman, Fernando Alva-Manchego, Andreas Vlachos
2019 arXiv   pre-print
Complex Word Identification (CWI) is the task of identifying which words or phrases in a sentence are difficult to understand by a target audience. The latest CWI Shared Task released data for two settings: monolingual (i.e. train and test in the same language) and cross-lingual (i.e. test in a language not seen during training). The best monolingual models relied on language-dependent features, which do not generalise in the cross-lingual setting, while the best cross-lingual model used neural networks with multi-task learning. In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. We show that carefully selected features and simple learning models can achieve state-of-the-art performance, and result in strong baselines for future development in this area. Finally, we discuss how inconsistencies in the annotation of the data can explain some of the results obtained.
arXiv:1904.05953v1 fatcat:w654do6n2rgepotwmd5yw7cw4q

The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification

Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
2021 Computational Linguistics  
This allows exploiting the alignment between TurkCorpus, HSplit (Sulem, Abend, and Rappoport 2018a) and ASSET (Alva-Manchego et al. 2020) to investigate  ...  More details can be found in (Alva-Manchego 2020, chap 3). 2. Determine the references to use.  ...  Alva-Manchego, Scarton and Specia (Un)Suitability of Metrics for Text Simplification Table 1 Descriptions of simplification systems included in the studied datasets.  ... 
doi:10.1162/coli_a_00418 fatcat:53a5hn2jxfgw5oepweoi65sbwm

Controllable Text Simplification with Explicit Paraphrasing [article]

Mounica Maddela, Fernando Alva-Manchego, Wei Xu
2021 arXiv   pre-print
However, these systems mostly rely on deletion and tend to generate very short outputs at the cost of meaning preservation (Alva-Manchego et al., 2017). ... Following neural machine translation, the trend changed to performing all the operations together end-to-end (Zhang and Lapata, 2017; Nisioi et al., 2017; Zhao et al., 2018; Alva-Manchego et al., 2017; ...
arXiv:2010.11004v3 fatcat:fxuw7kfrwvbave6wosdavj7c6a

Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Fernando Alva-Manchego, Joachim Bingel, Gustavo Henrique Paetzold, Carolina Scarton, Lucia Specia
2017 Zenodo  
Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
doi:10.5281/zenodo.1042505 fatcat:vcmaka3d7fgxdiclvdx4qxo4f4

ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [article]

Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia
2020 arXiv   pre-print
The parallel articles can be automatically aligned at the sentence level to train and test simplification models (Alva-Manchego et al., 2017; Štajner et al., 2018) .  ...  Introduction Sentence Simplification (SS) consists in modifying the content and structure of a sentence to make it easier to understand, while retaining its main idea and most of its original meaning (Alva-Manchego  ... 
arXiv:2005.00481v1 fatcat:nsiagoekprewhetbg32zagmx7a

Strong Baselines for Complex Word Identification across Multiple Languages

Pierre Finnimore, Elisabeth Fritzsch, Daniel King, Alison Sneyd, Aneeq Ur Rehman, Fernando Alva-Manchego, Andreas Vlachos
2019 Proceedings of the 2019 Conference of the North  
Complex Word Identification (CWI) is the task of identifying which words or phrases in a sentence are difficult to understand by a target audience. The latest CWI Shared Task released data for two settings: monolingual (i.e. train and test in the same language) and cross-lingual (i.e. test in a language not seen during training). The best monolingual models relied on language-dependent features, which do not generalise in the cross-lingual setting, while the best cross-lingual model used neural networks with multi-task learning. In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. We show that carefully selected features and simple learning models can achieve state-of-the-art performance, and result in strong baselines for future development in this area. Finally, we discuss how inconsistencies in the annotation of the data can explain some of the results obtained.
doi:10.18653/v1/n19-1102 dblp:conf/naacl/FinnimoreFKSRAV19 fatcat:zapo22cqarhdfppuj4enki2v2u

EASSE: Easier Automatic Sentence Simplification Evaluation

Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations  
We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems. EASSE provides a single access point to a broad range of evaluation resources: standard automatic metrics for assessing SS outputs (e.g. SARI), word-level accuracy scores for certain simplification transformations, reference-independent quality estimation features (e.g. compression ratio), and standard test data for SS evaluation (e.g. ...). Finally, EASSE generates easy-to-visualise reports on the various metrics and features above and on how a particular SS output fares against reference simplifications. Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems.
doi:10.18653/v1/d19-3009 dblp:conf/emnlp/Alva-ManchegoMS19 fatcat:32jspqvtynfp3jpagj5zecvqza

Knowledge Distillation for Quality Estimation

Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia
2021 Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021   unpublished
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
doi:10.18653/v1/2021.findings-acl.452 fatcat:3hr72dybb5b3ddjrpwkg65hp7m

Controllable Text Simplification with Explicit Paraphrasing

Mounica Maddela, Fernando Alva-Manchego, Wei Xu
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
However, these systems mostly rely on deletion and tend to generate very short outputs at the cost of meaning preservation (Alva-Manchego et al., 2017). ... Following neural machine translation, the trend changed to performing all the operations together end-to-end (Zhang and Lapata, 2017; Nisioi et al., 2017; Zhao et al., 2018; Alva-Manchego et al., 2017; ...
doi:10.18653/v1/2021.naacl-main.277 fatcat:3nnt7vwagzfmvj7pessq6ggqpy

deepQuest-py: Large and Distilled Models for Quality Estimation

Fernando Alva-Manchego, Abiola Obamuyide, Amit Gajbhiye, Frédéric Blain, Marina Fomicheva, Lucia Specia
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations   unpublished
We introduce deepQuest-py, a framework for training and evaluation of large and lightweight models for Quality Estimation (QE). deepQuest-py provides access to (1) state-of-the-art models based on pre-trained Transformers for sentence-level and word-level QE; (2) light-weight and efficient sentence-level models implemented via knowledge distillation; and (3) a web interface for testing models and visualising their predictions. deepQuest-py is available at https://github.com/sheffieldnlp/deepQuest-py under a CC BY-NC-SA licence.
doi:10.18653/v1/2021.emnlp-demo.42 fatcat:j6cddek2wrczfguvc7aazs6wfi

ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations

Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia
2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics   unpublished
The parallel articles can be automatically aligned at the sentence level to train and test simplification models (Alva-Manchego et al., 2017; Štajner et al., 2018) .  ...  Introduction Sentence Simplification (SS) consists in modifying the content and structure of a sentence to make it easier to understand, while retaining its main idea and most of its original meaning (Alva-Manchego  ... 
doi:10.18653/v1/2020.acl-main.424 fatcat:rvisgzdxt5ccxh73trjwfkutgi
Showing results 1 — 15 out of 59 results