
An abstractive approach to sentence compression

Trevor Cohn, Mirella Lapata
2013 ACM Transactions on Intelligent Systems and Technology  
The model incorporates a grammar extraction method, uses a language model for coherent output, and can be easily tuned to a wide range of compression-specific loss functions.  ...  We present an experimental study showing that humans can naturally create abstractive sentences using a variety of rewrite operations, not just deletion.  ...  Special thanks to Phil Blunsom, James Clarke and Miles Osborne for their insightful suggestions.  ... 
doi:10.1145/2483669.2483674 fatcat:yxaeplg2kfdn7mrnjf677ok7ua

A Comparative Study of Machine Translation for Multilingual Sentence-level Sentiment Analysis

Matheus Araújo, Adriano Pereira, Fabrício Benevenuto
2019 Information Sciences  
In this work, we take a different step into this field. We focus on evaluating existing efforts proposed to do language specific sentiment analysis with a simple yet effective baseline approach.  ...  We hope our system sets up a new baseline for future sentence-level methods developed in a wide set of languages.  ...  However, these approaches do not have a successful engagement yet. Most of the current applications are simple and language-specific.  ... 
doi:10.1016/j.ins.2019.10.031 fatcat:tle7kqohzrgdji54qpgsqi2b5i

Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization

Houda Oufaida, Omar Nouali, Philippe Blache
2014 Journal of King Saud University: Computer and Information Sciences  
Automatic text summarization aims to produce summaries for one or more texts using machine techniques. In this paper, we propose a novel statistical summarization system for Arabic texts.  ...  Second, we propose a novel sentence extraction algorithm which selects sentences with top ranked terms and maximum diversity.  ...  This extraction could be a simple reverse sorting or a recursive process. We propose a novel extraction algorithm.  ... 
doi:10.1016/j.jksuci.2014.06.008 fatcat:imaz2bazzbearamx6tdih4ld2i

Mining Documents and Sentiments in Cross-lingual Context

Motaz Saad
2016 Figshare  
First, we collect English, French and Arabic comparable corpora from Wikipedia and Euronews, and we align each corpus at the document level.  ...  The aim of this thesis is to study sentiments in comparable documents.  ...  These steps are applied for each English article in Wikipedia dump files. 5: Wikipedia article depth (August 2014), by rank and language: 1 English, 4 Arabic, 5 French. Arabic, French, and English comparable  ... 
doi:10.6084/m9.figshare.3204040.v1 fatcat:5kb4k2kylnc7nhdumanxjw5wpe

Alignment of comparable documents: Comparison of similarity measures on French–English–Arabic data

2018 Natural Language Engineering  
The comparability is calculated for each Arabic–English couple of documents of each month. This automatic task is then validated by hand.  ...  This led to a multilingual (Arabic–English) aligned corpus of 305 pairs of documents (233k English words and 137k Arabic words).  ...  A parallel corpus is a collection of aligned sentences, which are translations of each other.  ... 
doi:10.1017/s1351324918000232 fatcat:qulmfx2ujbelbc2k5sivychmwq

Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications

Ahmed Mahany, Heba Khaled, Nouh Sabri Elmitwally, Naif Aljohani, Said Ghoniemy
2022 Applied Sciences  
In some NLP applications, inclusion of a system that is negation- and speculation-aware improves performance, yet this aspect is still not addressed or considered an essential step.  ...  Many English corpora for various domains are now annotated with negation and speculation; moreover, the availability of annotated corpora in other languages has started to increase.  ...  In their study, they built word-embedding models for the French language that were composed of French Wikipedia articles and biomedical data.  ... 
doi:10.3390/app12105209 fatcat:jzm5hjhcqbbr5ck6cosat7n5zq

Multi-document arabic text summarisation

Mahmoud El-Haj, Udo Kruschwitz, Chris Fox
2011 2011 3rd Computer Science and Electronic Engineering Conference (CEEC)  
We developed extractive language dependent and language independent single and multi-document summarisers, both for Arabic and English. In our work we provided state-of-the-art approaches for Ara-  ...  One of the obstacles to progress is the limited availability of Arabic resources to support this research.  ...  The summarisers' primary data source was a collection of Arabic articles extracted from Wikipedia, a free online encyclopedia.  ... 
doi:10.1109/ceec.2011.5995822 fatcat:74lmb2yhdzay3plc3nyszn57na

Machine Reading Comprehension: Methods and Trends of Low Resource Languages

2021 jecet  
This study presents a survey on trends and methods of Machine Reading Comprehension (MRC) in these low-resource languages.  ...  This is due to the unavailability of large-scale training datasets in low resource languages. Several studies on Machine Reading Comprehension (MRC) have proposed MRC models based on English.  ...  The system extracted articles from Wikipedia for Arabic questions and removed answers by translating them with the help of SQuAD.  ... 
doi:10.24214/jecet.b.10.2.05775 fatcat:fto7t3avmzexzn6nfsuu4k2emm

A resource-light method for cross-lingual semantic textual similarity

Goran Glavaš, Marc Franco-Salvador, Simone P. Ponzetto, Paolo Rosso
2018 Knowledge-Based Systems  
Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance  ...  Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource-intensive  ...  Acknowledgments Part of the work presented in this article was performed during second author's research visit to the University of Mannheim, supported by Contact Fellowship awarded by the DAAD scholarship  ... 
doi:10.1016/j.knosys.2017.11.041 fatcat:3ii72eswyfaetdv6gkzfisdi6i

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity [article]

Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso
2018 arXiv   pre-print
Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross lingual plagiarism detection, and show that it yields performance  ...  Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource intensive  ...  The results of the parallel sentence extraction evaluation are shown in Table 6.  ... 
arXiv:1801.06436v1 fatcat:hftd7zwksjgnjdsp2s5csbybbu

Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction [article]

Cristina España-Bonet, Alberto Barrón-Cedeño, Lluís Màrquez
2020 arXiv   pre-print
We propose an automatic language-independent graph-based method to build à-la-carte article collections on user-defined domains from the Wikipedia.  ...  Our best metric for domainness shows a strong correlation with the human-judged precision, representing a reasonable automatic alternative to assess the quality of domain-specific corpora.  ...  With the rise of deep learning for NLP and the need of large amounts of clean data, the use of Wikipedia has grown exponentially not only for parallel sentence extraction and machine translation (Varga  ... 
arXiv:2005.01177v1 fatcat:i2xzqzsjjjadvnvrntutt43n3u

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

Roberto Navigli, Simone Paolo Ponzetto
2012 Artificial Intelligence  
We present an automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network.  ...  Key to our approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia.  ...  Acknowledgements The authors gratefully acknowledge the support of the ERC Starting Grant MultiJEDI No. 259234. Thanks go to Google for access to the University Research Program for Google Translate.  ... 
doi:10.1016/j.artint.2012.07.001 fatcat:m5lt7m6mhfevvauj5lofw3mqcu

Message from the general chair

Benjamin C. Lee
2015 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
case of abstract anaphora, namely, "this-issue" anaphora.  ...  To inject knowledge, we use a state-of-the-art system which cross-links (or "grounds") expressions in free text to Wikipedia.  ...  parallel Wikipedia sentences.  ... 
doi:10.1109/ispass.2015.7095776 dblp:conf/ispass/Lee15 fatcat:ehbed6nl6barfgs6pzwcvwxria

SAMAR: Subjectivity and sentiment analysis for Arabic social media

Muhammad Abdul-Mageed, Mona Diab, Sandra Kübler
2014 Computer Speech and Language  
Arabic is a morphologically rich language, which presents significant complexities for standard approaches to building SSA systems designed for the English language.  ...  SAMAR is a system for subjectivity and sentiment analysis (SSA) for Arabic social media genres.  ...  These dictionaries can be automatically extracted from the parallel corpus (Steinberger et al., 2012), but their approach to SSA does not seem to exploit the parallel nature of the corpus.  ... 
doi:10.1016/j.csl.2013.03.001 fatcat:3qwdbbnj25cepkrtjpjuelnvgu

Porting Multilingual Subjectivity Resources across Languages

Carmen Banea, Rada Mihalcea, Janyce Wiebe
2013 IEEE Transactions on Affective Computing  
Subjectivity analysis focuses on the automatic extraction of private states in natural language.  ...  Given a bridge between English and the selected target language (e.g., a bilingual dictionary or a parallel corpus), the methods can be used to rapidly create tools for subjectivity analysis in the new  ...  Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.  ... 
doi:10.1109/t-affc.2013.1 fatcat:y4cwk57cvncahhvmxjovzkt53a