MASSAlign: Alignment and Annotation of Comparable Documents
2017
Zenodo
Conference paper
doi:10.5281/zenodo.1040791
fatcat:xh6b3m6cljdvpfxjih2daoyp6e
Knowledge Distillation for Quality Estimation
[article]
2021
arXiv
pre-print
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
arXiv:2107.00411v1
fatcat:bkakpl4mwrcajnwyh5ghzw2t3m
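As a rough illustration of the teacher-student setup this abstract describes, the sketch below regresses a small student model onto a frozen teacher's sentence-level quality scores. The architecture, names, and hyperparameters are illustrative assumptions, not the paper's actual models.

```python
# A minimal sketch of teacher-student distillation for sentence-level QE,
# treated as regression on quality scores. Shapes and names are illustrative.
import torch
import torch.nn as nn

class StudentQE(nn.Module):
    """Small, shallow student: mean-pooled embeddings plus a linear head."""
    def __init__(self, vocab_size: int, emb_dim: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.head = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        mask = (token_ids != 0).float().unsqueeze(-1)        # ignore padding
        pooled = (self.emb(token_ids) * mask).sum(1) / mask.sum(1).clamp(min=1)
        return self.head(pooled).squeeze(-1)                 # one score per sentence

def distillation_step(student, teacher_scores, token_ids, optimizer):
    """One step of regressing the student onto the frozen teacher's scores."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(student(token_ids), teacher_scores)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: in practice, teacher_scores come from a large pre-trained QE
# model run over both the original and the augmented sentence pairs.
student = StudentQE(vocab_size=10_000)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
token_ids = torch.randint(1, 10_000, (8, 20))                # 8 sentences, 20 tokens
teacher_scores = torch.rand(8)                               # frozen teacher's output
print(distillation_step(student, teacher_scores, token_ids, opt))
```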
EASSE: Easier Automatic Sentence Simplification Evaluation
[article]
2019
arXiv
pre-print
We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems. EASSE provides a single access point to a broad range of evaluation resources: standard automatic metrics for assessing SS outputs (e.g. SARI), word-level accuracy scores for certain simplification transformations, reference-independent quality estimation features (e.g. compression ratio), and standard test data for SS evaluation (e.g. […]). Finally, EASSE generates easy-to-visualise reports on the various metrics and features above and on how a particular SS output fares against reference simplifications. Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems.
arXiv:1908.04567v2
fatcat:bhqcyoexnvhm7lfwkcwiw6c4nm
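Of the resources listed, the reference-independent quality estimation features are the easiest to picture in code. Below is a minimal sketch of one such feature, the compression ratio; the character-level definition and the function name are assumptions for illustration, not EASSE's actual API.

```python
# A hand-rolled sketch of one reference-independent QE feature the abstract
# mentions: compression ratio. The character-level definition and the
# function name are illustrative assumptions, not EASSE's actual API.
def compression_ratio(source: str, output: str) -> float:
    """Length of the simplified output relative to the source sentence."""
    return len(output) / max(len(source), 1)

src = "The incumbent administration promulgated a series of fiscal directives."
out = "The government issued new tax rules."
print(f"compression ratio: {compression_ratio(src, out):.2f}")  # ~0.51
```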
Data-Driven Sentence Simplification: Survey and Benchmark
2020
Computational Linguistics
Alva-Manchego et al. (2017) achieve the highest SARI score in the test set, and best simplicity score with human judgments. ...
Alva-Manchego et al. (2017) model SS as a Sequence Labeling problem, identifying simplification transformations at word or phrase level. ...
doi:10.1162/coli_a_00370
fatcat:k7mlggplrreudk5pgq62x2fmva
Strong Baselines for Complex Word Identification across Multiple Languages
[article]
2019
arXiv
pre-print
Complex Word Identification (CWI) is the task of identifying which words or phrases in a sentence are difficult to understand by a target audience. The latest CWI Shared Task released data for two settings: monolingual (i.e. train and test in the same language) and cross-lingual (i.e. test in a language not seen during training). The best monolingual models relied on language-dependent features, which do not generalise in the cross-lingual setting, while the best cross-lingual model used neural networks with multi-task learning. In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. We show that carefully selected features and simple learning models can achieve state-of-the-art performance, and result in strong baselines for future development in this area. Finally, we discuss how inconsistencies in the annotation of the data can explain some of the results obtained.
arXiv:1904.05953v1
fatcat:w654do6n2rgepotwmd5yw7cw4q
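The abstract's claim that carefully selected features plus simple learning models suffice lends itself to a small sketch: a couple of surface features feeding a linear classifier. The two features below are illustrative stand-ins for the paper's actual feature set, not a reproduction of it.

```python
# A sketch of the "simple features + simple model" recipe for CWI described
# in the abstract. Word length and vowel count (a crude syllable proxy) are
# illustrative stand-ins for the carefully selected features in the paper.
from sklearn.linear_model import LogisticRegression

def features(word: str) -> list[float]:
    return [len(word), sum(ch in "aeiou" for ch in word.lower())]

train = [("cat", 0), ("run", 0), ("ubiquitous", 1), ("ephemeral", 1),
         ("dog", 0), ("heterogeneous", 1)]
X = [features(w) for w, _ in train]
y = [label for _, label in train]

clf = LogisticRegression().fit(X, y)
print(clf.predict([features("perspicacious"), features("sun")]))  # likely [1 0]
```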
The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification
2021
Computational Linguistics
This allows exploiting the alignment between TurkCorpus, HSplit (Sulem, Abend, and Rappoport 2018a) and ASSET (Alva-Manchego et al. 2020) to investigate ...
More details can be found in (Alva-Manchego 2020, chap 3).
2. Determine the references to use. ...
Table 1: Descriptions of simplification systems included in the studied datasets. ...
doi:10.1162/coli_a_00418
fatcat:53a5hn2jxfgw5oepweoi65sbwm
Controllable Text Simplification with Explicit Paraphrasing
[article]
2021
arXiv
pre-print
However, these systems mostly rely on deletion and tend to generate very short outputs at the cost of meaning preservation (Alva-Manchego et al., 2017). ...
Following neural machine translation, the trend changed to performing all the operations together end-to-end (Zhang and Lapata, 2017; Nisioi et al., 2017; Zhao et al., 2018; Alva-Manchego et al., 2017; ...
arXiv:2010.11004v3
fatcat:fxuw7kfrwvbave6wosdavj7c6a
Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
2017
Zenodo
Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be learned directly from parallel text using black-box, end-to-end approaches rather than from explicit annotations. These complex-simple parallel sentence pairs often differ to such a high degree that generalization becomes difficult. End-to-end models also make it hard to interpret what is actually learned from data. We propose a method that decomposes the task of TS into its sub-problems. We devise a way to automatically identify operations in a parallel corpus and introduce a sequence-labeling approach based on these annotations. Finally, we provide insights on the types of transformations that different approaches can model.
doi:10.5281/zenodo.1042505
fatcat:vcmaka3d7fgxdiclvdx4qxo4f4
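The preprocessing step this abstract describes, automatically identifying operations in a parallel corpus, can be approximated with a standard diff over the token sequences. The sketch below derives word-level KEEP/DELETE/REPLACE labels from a complex-simple pair; the label scheme and the alignment method are simplified assumptions, not the paper's exact procedure.

```python
# A sketch of automatically deriving word-level simplification operations
# from a complex-simple sentence pair. The diff-based alignment and the
# KEEP/DELETE/REPLACE label set are simplified stand-ins for the paper's
# actual annotation procedure.
from difflib import SequenceMatcher

def label_operations(complex_sent: str, simple_sent: str) -> list[tuple[str, str]]:
    src, tgt = complex_sent.split(), simple_sent.split()
    labels = []
    for op, i1, i2, _, _ in SequenceMatcher(a=src, b=tgt).get_opcodes():
        tag = {"equal": "KEEP", "delete": "DELETE",
               "replace": "REPLACE", "insert": None}[op]
        if tag:                        # insertions have no source token to label
            labels += [(w, tag) for w in src[i1:i2]]
    return labels

pair = ("The committee commenced deliberations yesterday",
        "The committee started talks yesterday")
print(label_operations(*pair))
# [('The', 'KEEP'), ('committee', 'KEEP'), ('commenced', 'REPLACE'),
#  ('deliberations', 'REPLACE'), ('yesterday', 'KEEP')]
```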
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
[article]
2020
arXiv
pre-print
The parallel articles can be automatically aligned at the sentence level to train and test simplification models (Alva-Manchego et al., 2017; Štajner et al., 2018). ...
Introduction Sentence Simplification (SS) consists in modifying the content and structure of a sentence to make it easier to understand, while retaining its main idea and most of its original meaning (Alva-Manchego ...
arXiv:2005.00481v1
fatcat:nsiagoekprewhetbg32zagmx7a
Strong Baselines for Complex Word Identification across Multiple Languages
2019
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Complex Word Identification (CWI) is the task of identifying which words or phrases in a sentence are difficult to understand by a target audience. The latest CWI Shared Task released data for two settings: monolingual (i.e. train and test in the same language) and cross-lingual (i.e. test in a language not seen during training). The best monolingual models relied on language-dependent features, which do not generalise in the cross-lingual setting, while the best cross-lingual model used neural networks with multi-task learning. In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. We show that carefully selected features and simple learning models can achieve state-of-the-art performance, and result in strong baselines for future development in this area. Finally, we discuss how inconsistencies in the annotation of the data can explain some of the results obtained.
doi:10.18653/v1/n19-1102
dblp:conf/naacl/FinnimoreFKSRAV19
fatcat:zapo22cqarhdfppuj4enki2v2u
EASSE: Easier Automatic Sentence Simplification Evaluation
2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations
We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems. EASSE provides a single access point to a broad range of evaluation resources: standard automatic metrics for assessing SS outputs (e.g. SARI), word-level accuracy scores for certain simplification transformations, reference-independent quality estimation features (e.g. compression ratio), and standard test data for SS evaluation (e.g. […]). Finally, EASSE generates easy-to-visualise reports on the various metrics and features above and on how a particular SS output fares against reference simplifications. Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems.
doi:10.18653/v1/d19-3009
dblp:conf/emnlp/Alva-ManchegoMS19
fatcat:32jspqvtynfp3jpagj5zecvqza
Knowledge Distillation for Quality Estimation
2021
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
unpublished
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
doi:10.18653/v1/2021.findings-acl.452
fatcat:3hr72dybb5b3ddjrpwkg65hp7m
Controllable Text Simplification with Explicit Paraphrasing
2021
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
unpublished
However, these systems mostly rely on deletion and tend to generate very short outputs at the cost of meaning preservation (Alva-Manchego et al., 2017). ...
Following neural machine translation, the trend changed to performing all the operations together end-to-end (Zhang and Lapata, 2017; Nisioi et al., 2017; Zhao et al., 2018; Alva-Manchego et al., 2017; ...
doi:10.18653/v1/2021.naacl-main.277
fatcat:3nnt7vwagzfmvj7pessq6ggqpy
deepQuest-py: Large and Distilled Models for Quality Estimation
2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
unpublished
We introduce deepQuest-py, a framework for training and evaluation of large and lightweight models for Quality Estimation (QE). deepQuest-py provides access to (1) state-of-the-art models based on pre-trained Transformers for sentence-level and word-level QE; (2) light-weight and efficient sentence-level models implemented via knowledge distillation; and (3) a web interface for testing models and visualising their predictions. deepQuest-py is available at https://github.com/sheffieldnlp/deepQuest-py under a CC BY-NC-SA licence.
doi:10.18653/v1/2021.emnlp-demo.42
fatcat:j6cddek2wrczfguvc7aazs6wfi
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
unpublished
The parallel articles can be automatically aligned at the sentence level to train and test simplification models (Alva-Manchego et al., 2017; Štajner et al., 2018). ...
Introduction Sentence Simplification (SS) consists in modifying the content and structure of a sentence to make it easier to understand, while retaining its main idea and most of its original meaning (Alva-Manchego ...
doi:10.18653/v1/2020.acl-main.424
fatcat:rvisgzdxt5ccxh73trjwfkutgi
Showing results 1–15 of 59 results