Extracting Multilingual Topics from Unaligned Comparable Corpora
[chapter]
2010
Lecture Notes in Computer Science
In this paper we present a generative model called JointLDA which uses a bilingual dictionary to mine multilingual topics from an unaligned corpus. ...
Though there are some attempts to mine topical structure from cross-lingual corpora, they require clues about document alignments. ...
doi:10.1007/978-3-642-12275-0_39
fatcat:rzqvbd4kjva3bnzp6slasgsx6q
Qualitative Comparison of Native and Machine-Translated Parliamentary Debates
2022
Digital Humanities in the Nordic Countries Conference
It qualitatively compares three steps in topic interpretation: topic description, topic significance in subcorpora, and marginal topic distribution. ...
They can potentially lift the barriers to applying NLP tools and methods to previously unsupported languages and boost comparative cross-lingual research in digital humanities. ...
PTM can be extended to unaligned documents, but not all corpora contain comparable documents. ...
dblp:conf/dhn/Zagar22
fatcat:lav36rgjmnhgxcvyzhums5yfay
Pseudo-Aligned Multilingual Corpora
2007
International Joint Conference on Artificial Intelligence
We apply semi-supervised methods to pseudo-align multilingual corpora. Specifically, we construct a topic-based graph for each language. ...
Experimental results show that pseudo-alignment of multilingual corpora is feasible and that the document alignments produced are qualitatively sound. ...
This would allow one to leverage topic information from different languages when defining the lower-dimensional topic space. Second, we adopted parallel corpora for evaluation reasons. ...
dblp:conf/ijcai/DiazM07
fatcat:3fdipgp2wveqhpfddgohhznecu
Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
2010
Conference on Empirical Methods in Natural Language Processing
MLSLDA provides a method for extracting topical and sentiment-correlated word lists from multilingual corpora. ...
Figure 4 shows extracted topics from German-English and German-Chinese corpora. MLSLDA is able to distinguish sentiment-bearing topics from content-bearing topics. ...
dblp:conf/emnlp/Boyd-GraberR10
fatcat:rh3xsvsf6fch3d36uy3b4hlgyu
Cross-Lingual Latent Topic Extraction
2010
Annual Meeting of the Association for Computational Linguistics
Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multilingual text data. ...
Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. ...
Besides all the multilingual topic modeling work discussed above, comparable corpora have also been studied extensively (e.g. ...
dblp:conf/acl/ZhangMZ10
fatcat:icpzu6wsrnd2riv2caamvqpvzy
Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora
2018
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Resources for the non-English languages are scarce and this paper addresses this problem in the context of machine translation, by automatically extracting parallel sentence pairs from the multilingual ...
In this paper, we have used an end-to-end Siamese bidirectional recurrent neural network to generate parallel sentences from comparable multilingual articles in Wikipedia. ...
Comparable corpora, such as Wikipedia, are collections of topic-aligned but non-sentence-aligned multilingual documents, which are rich resources for extracting parallel sentences. ...
doi:10.18653/v1/n18-4016
dblp:conf/naacl/RameshS18
fatcat:emmblyhcwbahvczghp4tm4cy3a
Harvesting Comparable Corpora and Mining Them for Equivalent Bilingual Sentences Using Statistical Classification and Analogy-Based Heuristics
[chapter]
2015
Lecture Notes in Computer Science
This research explores our new methodologies for mining such data from previously obtained comparable corpora. ...
Here we propose a web crawling method for building subject-aligned comparable corpora from, e.g., Wikipedia dumps and Euronews web pages. ...
We seek to obtain parallel corpora from unaligned data. The solution proposed by our team is based on sequential analogy detection. ...
doi:10.1007/978-3-319-25252-0_46
fatcat:l6y2i4om3jddhilmfoeozdjn6e
Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text
[article]
2020
arXiv
pre-print
In this paper, we explore multilingual methods for the extraction of temporal expressions from text and investigate adversarial training for aligning embedding spaces to one common space. ...
With this, we create a single multilingual model that can also be transferred to unseen languages and set the new state of the art in those cross-lingual transfer experiments. ...
Introduction The extraction of temporal expressions from text is an important processing step for many applications, such as topic detection and question answering (Strötgen and Gertz, 2016). ...
arXiv:2005.09392v1
fatcat:rs5brn7oevegth3gxbsqfusa4e
Building Subject-aligned Comparable Corpora and Mining it for Truly Parallel Sentence Pairs
2014
Procedia Technology - Elsevier
We also introduce a method for extracting truly parallel sentences that are filtered out from noisy or just comparable sentence pairs. ...
This research explores our methodology for mining such data from previously obtained comparable corpora. ...
Acknowledgements This work was supported by the European Community from the European Social Fund within the Interkadra project UDA-POKL-04.01.01-00-014/10-00 and Eu-Bridge 7th FR EU project (Grant Agreement ...
doi:10.1016/j.protcy.2014.11.024
fatcat:fijsnjenrnarhnqt5kgw6wewta
The Role of Sketch Engine in Multiple Types of Corpora
2019
VOLUME-8 ISSUE-10, AUGUST 2019, REGULAR ISSUE
This paper sheds light on the significant role Sketch Engine plays in relation to different types of corpora. ...
The software's features that support the creation of multilingual dictionaries and lexicography are also discussed. ...
MULTIPLE TYPES OF CORPORA This paper uses different Sketch Engine terms which are explained as follows:
A. Comparable Corpora Corpora of this type are not aligned with each other. ...
doi:10.35940/ijitee.k1307.0981119
fatcat:s7e6pl6nprfivkaj5uazmqssty
Controlling Target Features in Neural Machine Translation via Prefix Constraints
2017
Workshop on Asian Translation
Prefix constraints can be predicted from source sentence jointly with target sentence, while side constraints must be provided by the user or predicted by some other methods. ...
prefix constraints are more flexible than side constraints and can be used to control the behavior of neural machine translation, in terms of output length, bidirectional decoding, domain adaptation, and unaligned ...
Tatoeba is a collection of multilingual translated example sentences from Tatoeba website. ...
dblp:conf/aclwat/TakenoNY17
fatcat:eap7tu5gszfgncszwml3s4zdiq
ECCParaCorp: a cross-lingual parallel corpus towards cancer education, dissemination and application
2020
BMC Medical Informatics and Decision Making
However, the scarcity of multilingual cancer corpus limits the intelligent processing, such as machine translation in medical scenarios. ...
application as a preparatory data foundation e.g. cancer-related machine translation, cancer system development towards medical education, and disease-oriented knowledge extraction. ...
MulTed is a multilingual parallel corpus collected from TED talks containing general topics [8] etc. ...
doi:10.1186/s12911-020-1116-1
pmid:32646415
fatcat:vg7hmi2levewxk4dpcjl5m4pt4
LIMSI: Translations as Source of Indirect Supervision for Multilingual All-Words Sense Disambiguation and Entity Linking
2015
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
We present the LIMSI submission to the Multilingual Word Sense Disambiguation and Entity Linking task of SemEval-2015. ...
The system exploits the parallelism of the multilingual test data and uses translations as source of indirect supervision for sense selection. ...
Task Description The SemEval-2015 Multilingual WSD and EL task (Moro and Navigli, 2015) aims to promote joint research in these two closely-related topics. ...
doi:10.18653/v1/s15-2050
dblp:conf/semeval/ApidianakiG15
fatcat:azui7vg2cbfxnkv5sebyhoagim
Multilingual and code-switching ASR challenges for low resource Indian languages
[article]
2021
arXiv
pre-print
of labeled corpora in multiple languages. ...
With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. ...
Dataset Description The Hindi-English and Bengali-English datasets are extracted from spoken tutorials. ...
arXiv:2104.00235v1
fatcat:eevwpnji2fdtdk7ltatn7hkkua
Coarse-grained Cross-lingual Alignment of Comparable Texts with Topic Models and Encyclopedic Knowledge
[article]
2014
arXiv
pre-print
We present a method for coarse-grained cross-lingual alignment of comparable texts: segments consisting of contiguous paragraphs that discuss the same theme (e.g. history, economy) are aligned based on induced multilingual topics. ...
Multilingual topic modeling Jagarlamudi and Daumé III (2010) use a bilingual dictionary to obtain multilingual topics from an unaligned multilingual corpus. ...
arXiv:1411.7820v1
fatcat:il5zvspwajfdjpf5yuwiv4fdim