An abstractive approach to sentence compression
2013
ACM Transactions on Intelligent Systems and Technology
The model incorporates a grammar extraction method, uses a language model for coherent output, and can be easily tuned to a wide range of compression-specific loss functions. ...
We present an experimental study showing that humans can naturally create abstractive sentences using a variety of rewrite operations, not just deletion. ...
Special thanks to Phil Blunsom, James Clarke and Miles Osborne for their insightful suggestions. ...
doi:10.1145/2483669.2483674
fatcat:yxaeplg2kfdn7mrnjf677ok7ua
A Comparative Study of Machine Translation for Multilingual Sentence-level Sentiment Analysis
2019
Information Sciences
In this work, we take a different step into this field. We focus on evaluating existing efforts proposed to do language specific sentiment analysis with a simple yet effective baseline approach. ...
We hope our system sets up a new baseline for future sentence-level methods developed in a wide set of languages. ...
However, these approaches have not yet seen successful adoption. Most current applications are simple and language-specific. ...
doi:10.1016/j.ins.2019.10.031
fatcat:tle7kqohzrgdji54qpgsqi2b5i
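For the entry above, a common baseline of this kind translates each sentence into English and scores it with an off-the-shelf English sentiment tool. The sketch below only illustrates that generic pattern: the `translate` helper is a hypothetical placeholder, and the snippet does not spell out the paper's exact pipeline.

# Hedged sketch of a translation-based baseline for multilingual sentence-level
# sentiment analysis: translate to English, then score with an English-only tool.
# `translate` is a hypothetical callable (source-language sentence -> English text).
from nltk.sentiment import SentimentIntensityAnalyzer  # VADER; needs the vader_lexicon data

def sentiment_via_translation(sentence, translate):
    english = translate(sentence)
    compound = SentimentIntensityAnalyzer().polarity_scores(english)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"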
Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization
2014
Journal of King Saud University: Computer and Information Sciences
Automatic text summarization aims to produce summaries for one or more texts using machine techniques. In this paper, we propose a novel statistical summarization system for Arabic texts. ...
Second, we propose a novel sentence extraction algorithm which selects sentences with top ranked terms and maximum diversity. ...
This extraction could be a simple reverse sorting or a recursive process. We propose a novel extraction algorithm. ...
doi:10.1016/j.jksuci.2014.06.008
fatcat:imaz2bazzbearamx6tdih4ld2i
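A minimal sketch of the maximal marginal relevance idea named in the title of the entry above: greedily pick sentences that score high on relevance to the document but low on redundancy with what has already been selected. The TF-IDF relevance proxy and the lambda value are illustrative assumptions, not the paper's exact formulation.

# Greedy MMR-style sentence selection (illustrative sketch).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_summarize(sentences, n_select=3, lam=0.7):
    """Pick sentences that are relevant to the document yet diverse from each other."""
    vectors = TfidfVectorizer().fit_transform(sentences)
    doc_vector = np.asarray(vectors.mean(axis=0))        # centroid as a document proxy
    relevance = cosine_similarity(vectors, doc_vector).ravel()
    pairwise = cosine_similarity(vectors)

    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < n_select:
        def mmr_score(i):
            redundancy = max((pairwise[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]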
Mining Documents and Sentiments in Cross-lingual Context
2016
Figshare
First, we collect English, French and Arabic comparable corpora from Wikipedia and Euronews, and we align each corpus at the document level. ...
The aim of this thesis is to study sentiments in comparable documents. ...
These steps are applied for each English article in the Wikipedia dump files. Wikipedia article depth (August 2014), selected ranks: 1 English, 4 Arabic, 5 French. ...
Arabic, French, and English comparable ...
doi:10.6084/m9.figshare.3204040.v1
fatcat:5kb4k2kylnc7nhdumanxjw5wpe
Alignment of comparable documents: Comparison of similarity measures on French–English–Arabic data
2018
Natural Language Engineering
The comparability is calculated for each Arabic–English couple of documents of each month. This automatic task is then validated by hand. ...
This led to a multilingual (Arabic–English) aligned corpus of 305 pairs of documents (233k English words and 137k Arabic words). ...
A parallel corpus is a collection of aligned sentences, which are translations of each other. ...
doi:10.1017/s1351324918000232
fatcat:qulmfx2ujbelbc2k5sivychmwq
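For the entry above, one simple way to score Arabic–English document comparability is to project Arabic tokens through a bilingual lexicon and take the cosine of bag-of-words vectors, as sketched below. The toy lexicon and the acceptance threshold are assumptions; the paper itself compares several similarity measures rather than prescribing this one.

# Dictionary-projection cosine as a document-comparability score (illustrative sketch).
from collections import Counter
from math import sqrt

AR_EN_LEXICON = {"حرب": "war", "سلام": "peace", "اقتصاد": "economy"}  # placeholder lexicon

def cosine(c1, c2):
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

def comparability(arabic_tokens, english_tokens):
    projected = Counter(AR_EN_LEXICON[t] for t in arabic_tokens if t in AR_EN_LEXICON)
    return cosine(projected, Counter(english_tokens))

# A document pair is kept as "comparable" when its score clears a chosen threshold.
is_comparable = comparability(["حرب", "اقتصاد"], ["the", "war", "economy"]) >= 0.3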
Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications
2022
Applied Sciences
In some NLP applications, inclusion of a system that is negation- and speculation-aware improves performance, yet this aspect is still not addressed or considered an essential step. ...
Many English corpora for various domains are now annotated with negation and speculation; moreover, the availability of annotated corpora in other languages has started to increase. ...
In their study, they built word-embedding models for the French language that were composed of French Wikipedia articles and biomedical data. ...
doi:10.3390/app12105209
fatcat:jzm5hjhcqbbr5ck6cosat7n5zq
Multi-document arabic text summarisation
2011
2011 3rd Computer Science and Electronic Engineering Conference (CEEC)
We developed extractive language-dependent and language-independent single and multi-document summarisers, both for Arabic and English. In our work we provided state-of-the-art approaches for Arabic ...
One of the obstacles to progress is the limited availability of Arabic resources to support this research. ...
The summarisers' primary data source was a collection of Arabic articles extracted from Wikipedia, a free online encyclopedia. ...
doi:10.1109/ceec.2011.5995822
fatcat:74lmb2yhdzay3plc3nyszn57na
Machine Reading Comprehension: Methods and Trends of Low Resource Languages
2021
jecet
This study presents a survey on trends and methods of Machine Reading Comprehension (MRC) in these low-resource languages. ...
This is due to the unavailability of large-scale training datasets in low resource languages. Several studies on Machine Reading Comprehension (MRC) have proposed MRC models based on English. ...
The system extracted articles from Wikipedia for Arabic questions and removed answers by translating them with the help of SQuAD. ...
doi:10.24214/jecet.b.10.2.05775
fatcat:fto7t3avmzexzn6nfsuu4k2emm
A resource-light method for cross-lingual semantic textual similarity
2018
Knowledge-Based Systems
Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance ...
Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource-intensive ...
Acknowledgments Part of the work presented in this article was performed during second author's research visit to the University of Mannheim, supported by Contact Fellowship awarded by the DAAD scholarship ...
doi:10.1016/j.knosys.2017.11.041
fatcat:3ii72eswyfaetdv6gkzfisdi6i
A Resource-Light Method for Cross-Lingual Semantic Textual Similarity
[article]
2018
arXiv
pre-print
Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance ...
Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource-intensive ...
The results of the parallel sentence extraction evaluation are shown in Table 6 . ...
arXiv:1801.06436v1
fatcat:hftd7zwksjgnjdsp2s5csbybbu
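A hedged sketch of a resource-light cross-lingual similarity score in the spirit of the two entries above: average word vectors from a shared bilingual embedding space and compare the resulting sentence vectors with cosine. The vector-file format and the plain averaging are assumptions; the paper's own aggregation may differ.

# Cross-lingual sentence similarity via averaged bilingual word embeddings (illustrative sketch).
import numpy as np

def load_vectors(path):
    """Load a word -> vector mapping from a whitespace-separated text file (assumed format)."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=float)
    return vectors

def sentence_vector(tokens, vectors):
    hits = [vectors[t] for t in tokens if t in vectors]
    return np.mean(hits, axis=0) if hits else None

def cross_lingual_similarity(src_tokens, tgt_tokens, src_vecs, tgt_vecs):
    u = sentence_vector(src_tokens, src_vecs)
    v = sentence_vector(tgt_tokens, tgt_vecs)
    if u is None or v is None:
        return 0.0
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))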
Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction
[article]
2020
arXiv
pre-print
We propose an automatic language-independent graph-based method to build à-la-carte article collections on user-defined domains from the Wikipedia. ...
Our best metric for domainness shows a strong correlation with the human-judged precision, representing a reasonable automatic alternative to assess the quality of domain-specific corpora. ...
With the rise of deep learning for NLP and the need of large amounts of clean data, the use of Wikipedia has grown exponentially not only for parallel sentence extraction and machine translation (Varga ...
arXiv:2005.01177v1
fatcat:i2xzqzsjjjadvnvrntutt43n3u
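For the entry above, a graph-based collection of in-domain articles can be sketched as a bounded breadth-first walk over Wikipedia's link graph starting from a few seed articles. The `neighbours` callback and the depth bound are hypothetical placeholders; they do not reproduce the paper's domainness scoring.

# Breadth-first collection of candidate in-domain Wikipedia articles (illustrative sketch).
from collections import deque

def collect_domain_articles(seeds, neighbours, max_depth=2):
    """Return the set of article titles reachable from the seeds within max_depth hops."""
    visited = set(seeds)
    queue = deque((title, 0) for title in seeds)
    while queue:
        title, depth = queue.popleft()
        if depth == max_depth:
            continue
        for linked in neighbours(title):   # neighbours(title) -> iterable of linked article titles
            if linked not in visited:
                visited.add(linked)
                queue.append((linked, depth + 1))
    return visited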
BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network
2012
Artificial Intelligence
We present an automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network. ...
Key to our approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia. ...
Acknowledgements The authors gratefully acknowledge the support of the ERC Starting Grant MultiJEDI No. 259234. Thanks go to Google for access to the University Research Program for Google Translate. ...
doi:10.1016/j.artint.2012.07.001
fatcat:m5lt7m6mhfevvauj5lofw3mqcu
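A very rough sketch of the WordNet–Wikipedia integration mentioned in the entry above: propose candidate links by matching a Wikipedia page title against WordNet lemmas. BabelNet's actual mapping relies on richer disambiguation context and machine translation; this naive lemma match is only an illustrative assumption.

# Naive candidate mapping from a Wikipedia page title to WordNet synsets.
# Requires NLTK with the WordNet corpus downloaded.
from nltk.corpus import wordnet as wn

def candidate_synsets(wikipedia_title):
    """Return WordNet noun synsets whose lemmas match the normalised page title."""
    lemma = wikipedia_title.strip().lower().replace(" ", "_")
    return wn.synsets(lemma, pos=wn.NOUN)

# Usage: candidate_synsets("Semantic network") returns a (possibly empty) list of Synset objects.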
Message from the general chair
2015
2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
case of abstract anaphora, namely, "this-issue" anaphora. ...
To inject knowledge, we use a state-of-the-art system which cross-links (or "grounds") expressions in free text to Wikipedia. ...
parallel Wikipedia sentences. ...
doi:10.1109/ispass.2015.7095776
dblp:conf/ispass/Lee15
fatcat:ehbed6nl6barfgs6pzwcvwxria
SAMAR: Subjectivity and sentiment analysis for Arabic social media
2014
Computer Speech and Language
Arabic is a morphologically rich language, which presents significant complexities for standard approaches to building SSA systems designed for the English language. ...
SAMAR is a system for subjectivity and sentiment analysis (SSA) for Arabic social media genres. ...
These dictionaries can be automatically extracted from the parallel corpus (Steinberger et al., 2012) , but their approach to SSA does not seem to exploit the parallel nature of the corpus. ...
doi:10.1016/j.csl.2013.03.001
fatcat:3qwdbbnj25cepkrtjpjuelnvgu
Porting Multilingual Subjectivity Resources across Languages
2013
IEEE Transactions on Affective Computing
Subjectivity analysis focuses on the automatic extraction of private states in natural language. ...
Given a bridge between English and the selected target language (e.g., a bilingual dictionary or a parallel corpus), the methods can be used to rapidly create tools for subjectivity analysis in the new ...
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. ...
doi:10.1109/t-affc.2013.1
fatcat:y4cwk57cvncahhvmxjovzkt53a
Showing results 1 — 15 out of 379 results