157 Hits in 5.2 sec

Bilingual Chunk Alignment Based on Interactional Matching and Probabilistic Latent Semantic Indexing [chapter]

Feifan Liu, Qianli Jin, Jun Zhao, Bo Xu
2005 Lecture Notes in Computer Science  
An integrated method for bilingual chunk partition and alignment, called "Interactional Matching", is proposed in this paper.  ...  Furthermore, with the technology of Probabilistic Latent Semantic Indexing (PLSI), this method can deal with not only compositional chunks, but also non-compositional ones.  ...  The method involves two key technologies, namely Interactional Matching and Probabilistic Latent Semantic Indexing (PLSI).  ... 
doi:10.1007/978-3-540-30211-7_44 fatcat:7koilevhdjbthmf6qbeux3q3ca

Getting Past the Language Gap: Innovations in Machine Translation [chapter]

Rodolfo Delmonte
2012 Mobile Speech and Advanced Natural Language Solutions  
There are many possible ways of segmenting and translating phrases: this is done on a probabilistic basis, and the probability distribution of the collected phrase pairs is usually based on their relative  ...  Then section "Knowledge-Based MT Systems" will introduce knowledge, semantically-based systems.  ...  Costa-jussà (2011) proposed and evaluated an approach that uses a semantic feature for statistical machine translation, based on Latent Semantic Indexing.  ... 
doi:10.1007/978-1-4614-6018-3_6 fatcat:2njkc6meabhaxosl4wircumfjm

Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches

T. Naseem, B. Snyder, J. Eisenstein, R. Barzilay
2009 The Journal of Artificial Intelligence Research  
model which instead incorporates multilingual context using latent variables.  ...  We consider two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of languages into a single sequence and a second  ...  Any opinions, findings, and conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of the NSF.  ... 
doi:10.1613/jair.2843 fatcat:vwi2yze4endgngwkvtybiczta4

Message from the general chair

Benjamin C. Lee
2015 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
Our system gives a better performance than all the learning-based systems from the CoNLL-2011 shared task on the same dataset.  ...  To maximize the utility of the injected knowledge, we deploy a learning-based multi-sieve approach and develop novel entity-based features.  ...  rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences.  ... 
doi:10.1109/ispass.2015.7095776 dblp:conf/ispass/Lee15 fatcat:ehbed6nl6barfgs6pzwcvwxria

Analyzing Non-Textual Content Elements to Detect Academic Plagiarism

Norman Meuschke, Bela Gipp, Harald Reiterer, Michael L. Nelson
2021 Zenodo  
Detection approaches proposed so far analyze lexical, syntactical, and semantic text similarity. These approaches find copied, moderately reworded, and literally translated text.  ...  To demonstrate the benefit of combining non-textual and text-based detection methods, the thesis describes the first plagiarism detection system that integrates th [...]  ...  chunked, considering shared citations depending on the prior chunk, no merging step. Text-based detection methods: Encoplot (Enco), exact character 16-gram string matching; Sherlock, probabilistic  ... 
doi:10.5281/zenodo.4913344 fatcat:xmpaahvwuva53l5l5i2gaidvi4

Statistical machine translation enhancements through linguistic levels

Marta R. Costa-Jussà, Mireia Farrús
2014 ACM Computing Surveys  
One of the most popular approaches is the Statistical Machine Translation (SMT) approach, which tries to cover translation in a holistic manner by learning from parallel corpus aligned at the sentence  ...  However, with this basic approach, there are some issues at each written linguistic level (i.e., orthographic, morphological, lexical, syntactic and semantic) that remain unsolved.  ...  The authors exploit both the standard vector-space model [Salton and McGill 1983] and latent semantic indexing [Landauer et al. 1998].  ... 
doi:10.1145/2518130 fatcat:cy6cud32tjgvjjsgiiv5aj65zi

Association for Computational Linguistics [chapter]

G. Hirst
2006 Encyclopedia of Language & Linguistics  
The lecturers have been invited to write papers on all aspects of computational approaches to Natural Language Processing. The papers received have been revised and prepared to compose this issue.  ...  We thank all lecturers and participants who have contributed and made this publication possible.  ...  (Calvo 2013) presents different categorical approaches based on Vector Space Model (VSM) with three dimensionality reduction techniques: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis  ... 
doi:10.1016/b0-08-044854-2/05234-2 fatcat:bbncnskzhvhxtbfdk5ftli7gva

Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval [article]

Wessel Kraaij, Jian-Yun Nie, Michel Simard
2003 arXiv   pre-print
Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks.  ...  on a bag-of-words.  ...  Finally, we want to thank Elliott Macklovitch and the two anonymous reviewers for their constructive comments and careful review.  ... 
arXiv:cs/0312008v1 fatcat:hztoxce3frcgpbsmegftpg4rdu

Embedding Web-Based Statistical Translation Models in Cross-Language Information Retrieval

Wessel Kraaij, Jian-Yun Nie, Michel Simard
2003 Computational Linguistics  
Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks.  ...  on a bag of words.  ...  Finally, we want to thank Elliott Macklovitch and the two anonymous reviewers for their constructive comments and careful review.  ... 
doi:10.1162/089120103322711587 fatcat:dkxidh7b3vdszodokvwhjd4nre

Pre-training Methods in Information Retrieval [article]

Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
2022 arXiv   pre-print
Moreover, we discuss some open challenges and highlight several promising directions, with the hope of inspiring and facilitating more works on these topics for future research.  ...  In addition, we also introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards.  ...  Hybrid Retrieval Models Sparse retrieval models take a (latent) word as the unit of representations, which can calculate the matching score based on exact matching signals.  ... 
arXiv:2111.13853v3 fatcat:pilemnpphrgv5ksaktvctqdi4y

Video Description: A Survey of Methods, Datasets and Evaluation Metrics [article]

Nayyer Aafaq, Ajmal Mian, Wei Liu, Syed Zulqarnain Gilani, Mubarak Shah
2019 arXiv   pre-print
It has applications in human-robot interaction, helping the visually impaired and video subtitling.  ...  Classical video description approaches combined subject, object and verb detection with template based language models to generate sentences.  ...  The research was supported by ARC Discovery Grant DP160101458 and DP150102405.  ... 
arXiv:1806.00186v3 fatcat:elxztcpzizhr7clugnbjvvrpte

A Survey on Event Extraction for Natural Language Understanding: Riding the Biomedical Literature Wave

Giacomo Frisoni, Gianluca Moro, Antonella Carbonaro
2021 IEEE Access  
INDEX TERMS Biomedical text mining, event extraction, natural language understanding, semantic parsing.  ...  Events can model complex interactions involving multiple participants having a specific semantic role, also handling nested and overlapping definitions.  ...  ACKNOWLEDGMENT The authors thank Giulio Carlassare for his contributions during productive discussions and practical experiments on biomedical corpora.  ... 
doi:10.1109/access.2021.3130956 fatcat:wlr7zeikdva77ojuppqx3vmocy

Multi-word Entity Classification in a Highly Multilingual Environment

Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, Jakub Piskorski
2017 Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)  
The program also included a panel discussion on the future directions of the MWE community and the SIGLEX Section.  ...  We also want to thank the IC1207 COST Action PARSEME and SIGLEX for their endorsement and support, as well as the EACL 2017 organizers.  ...  In addition, we would like to thank Lauren Rudat for her suggestions on improving the stimuli, and to the anonymous reviewers for their suggestions on improving the paper.  ... 
doi:10.18653/v1/w17-1702 dblp:conf/mwe/ChesneyJSP17 fatcat:bv7aavgth5eurmzuphuowtuuhq

Automatic Extraction of Property Norm-Like Data From Large Text Corpora

Colin Kelly, Barry Devereux, Anna Korhonen
2013 Cognitive Science  
similarity evaluation and a WordNet semantic similarity comparison.  ...  Traditional methods for deriving property-based representations of concepts from text have focused on extracting unspecified relationships (e.g., car-petrol) or only a subset of possible relation types  ...  Unsupervised techniques have found applications in many parts of NLP (e.g., grammar induction, word-alignment for bilingual translation) and do not suffer from the same limits on data resources; however  ... 
doi:10.1111/cogs.12091 pmid:25019134 fatcat:s4fboxw6szhcdl5znujtxffiru

Neural machine translation: A review of methods, resources, and tools

Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, Yang Liu
2020 AI Open  
In this article, we first provide a broad review of the methods for NMT and focus on methods relating to architectures, decoding, and data augmentation.  ...  In recent years, end-to-end neural machine translation (NMT) has achieved great success and has become the new mainstream method in practical MT systems.  ...  Program of China (No. 2017YFB0202204), National Natural Science Foundation of China (No. 61925601, No. 61761166008, No. 61772302), Beijing Academy of Artificial Intelligence, Huawei Noah's Ark Lab, and  ... 
doi:10.1016/j.aiopen.2020.11.001 fatcat:wkplwv43knb3lebicckmwbxlwu
Showing results 1–15 out of 157 results