12,423 Hits in 6.7 sec

Comparing Neural- and N-Gram-Based Language Models for Word Segmentation [article]

Yerai Doval, Carlos Gómez-Rodríguez
2018 Journal of the Association for Information Science and Technology   pre-print
In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent  ...  Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language.  ...  both the Galician Network for Lexicography-RELEX (ED431D R2016/046) and Grant ED431B-2017/01.  ... 
doi:10.1002/asi.24082 pmid:30775406 pmcid:PMC6360409 arXiv:1812.00815v1 fatcat:s3ob2bvidvedlidnxnbup3jmcq
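The idea described in this entry, a character-level language model driving a beam search over candidate word boundaries, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it only inserts boundaries (the paper also handles deletion), it uses an add-one-smoothed character bigram model rather than the paper's n-gram or recurrent models, and the function names `train_char_bigram`, `logprob` and `segment` are hypothetical.

```python
import math
from collections import defaultdict

def train_char_bigram(corpus):
    """Count character bigrams, treating the space as an ordinary character."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        for a, b in zip(" " + text, text):
            counts[a][b] += 1
    return counts

def logprob(counts, a, b, vocab_size=64):
    """Add-one-smoothed log P(b | a)."""
    return math.log((counts[a][b] + 1) / (sum(counts[a].values()) + vocab_size))

def segment(text, counts, beam_width=8):
    """Beam search over boundary decisions: before each character,
    either continue the current word or insert a space."""
    beams = [("", 0.0)]  # (output so far, cumulative log-probability)
    for ch in text:
        candidates = []
        for out, score in beams:
            prev = out[-1] if out else " "
            # option 1: no boundary before ch
            candidates.append((out + ch, score + logprob(counts, prev, ch)))
            # option 2: boundary before ch; score the transitions into and out of the space
            candidates.append((out + " " + ch,
                               score + logprob(counts, prev, " ")
                                     + logprob(counts, " ", ch)))
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_width]
    return beams[0][0].strip()
```

Given enough training text, the boundary hypothesis wins exactly where the character bigram across a missing space is rare under the model, which is what makes a character-level LM usable for this task.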

Pre-screening Textual Based Evaluation for the Diagnosed Female Breast Cancer (WBC)

Mahmood Alhlffee
2019 Revue d'intelligence artificielle : Revue des Sciences et Technologies de l'Information  
Our VA was developed based on a long short-term memory (LSTM) neural network, integrating two n-gram models, namely bigram and trigram.  ...  Keywords: virtual assistance, sequence-to-sequence neural network, bigram and trigram Neural network for word segmentation (WS) CWS is usually referred to as Chinese-based labelling. For each  ...  Figure 3 shows the flowchart of the proposed model, which mainly consists of two components: Chinese word segmentation and a bigram and trigram language model.  ... 
doi:10.18280/ria.330401 fatcat:ada3hfbnd5a5zmhazyepc7coxq
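This entry integrates bigram and trigram models alongside an LSTM. The paper does not say here how the two n-gram orders are combined; a standard textbook method is linear interpolation, sketched below. The function names and the lambda weights are assumptions for illustration, not the paper's configuration.

```python
from collections import Counter

def count_ngrams(tokens, max_order=3):
    """Counts for all n-gram orders up to max_order, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def interpolated_prob(w, context, counts, lambdas=(0.5, 0.3, 0.2)):
    """P(w | context) as a weighted mix of trigram, bigram and unigram
    relative frequencies; the lambdas must sum to 1."""
    l3, l2, l1 = lambdas
    total_unigrams = sum(c for g, c in counts.items() if len(g) == 1)
    p1 = counts[(w,)] / total_unigrams
    big_ctx = context[-1:]
    p2 = counts[big_ctx + (w,)] / counts[big_ctx] if counts[big_ctx] else 0.0
    tri_ctx = context[-2:]
    p3 = (counts[tri_ctx + (w,)] / counts[tri_ctx]
          if len(tri_ctx) == 2 and counts[tri_ctx] else 0.0)
    return l3 * p3 + l2 * p2 + l1 * p1
```

Interpolation guarantees a nonzero probability for any word seen at the unigram level, which is the usual motivation for mixing orders rather than using the trigram alone.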

MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language

Karol Nowakowski, Michal Ptaszynski, Fumito Masui
2019 Information  
Furthermore, we performed a series of experiments comparing our algorithm with systems utilizing state-of-the-art lexical n-gram-based language modelling techniques (namely, the Stupid Backoff model and a model with modified Kneser-Ney smoothing), as well as a neural model performing word segmentation as character sequence labelling.  ...  We are also grateful to Jagna Nieuważny and Ali Bakdur for useful discussions. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/info10100317 fatcat:pfprdpqmsvebtpuo37wegk2bbq
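Of the two baselines this entry names, Stupid Backoff (Brants et al., 2007) is the simpler: score an n-gram by its relative frequency, and on a zero count back off to the shorter n-gram with a fixed penalty alpha (0.4 in the original paper). A minimal sketch, with hypothetical helper names:

```python
from collections import Counter

def count_ngrams(tokens, max_order=3):
    """Counts for all n-gram orders up to max_order, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(ngram, counts, alpha=0.4):
    """Relative frequency with a fixed back-off penalty alpha.
    Returns a score, not a normalised probability."""
    if len(ngram) == 1:
        total = sum(c for g, c in counts.items() if len(g) == 1)
        return counts[ngram] / total
    if counts[ngram] > 0:
        return counts[ngram] / counts[ngram[:-1]]
    return alpha * stupid_backoff(ngram[1:], counts, alpha)
```

Because the scores are unnormalised, Stupid Backoff is cheap at scale but cannot report perplexity directly, which is why it is usually contrasted with properly smoothed models such as modified Kneser-Ney.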

Investigation on N-Gram Approximated RNNLMs for Recognition of Morphologically Rich Speech [chapter]

Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik
2019 Lecture Notes in Computer Science  
We compare the performance of conventional back-off n-gram language models (BNLM), BNLM approximation of RNNLMs (RNN-BNLM) and RNN n-grams in terms of perplexity and word error rate (WER).  ...  A Recurrent Neural Network Language Model (RNNLM) can provide a remedy for the high perplexity of the task; however, two-pass decoding introduces a considerable processing delay.  ...  recurrent neural language model to the back-off n-gram model used in the single-pass decoding.  ... 
doi:10.1007/978-3-030-31372-2_19 fatcat:2vwgitaoarbg3kouu5mrqzeliy
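The two metrics this entry compares models on are standard and easy to state precisely. The sketch below gives minimal reference implementations (generic definitions, not the authors' evaluation scripts): perplexity as the exponentiated negative mean log-probability, and WER as word-level Levenshtein distance divided by reference length.

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))

def wer(reference, hypothesis):
    """Word error rate via Levenshtein distance over whitespace tokens."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

Lower is better for both; note that WER can exceed 1.0 when the hypothesis contains many insertions.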

Advances in Subword-based HMM-DNN Speech Recognition Across Languages

Peter Smit, Sami Virpioja, Mikko Kurimo
2020 Computer Speech and Language  
Our approach performs well even for English, where the phoneme-based acoustic models and word-based language models typically dominate: The phoneme-based baseline performance can be reached and improved  ...  The results show that the benefits of short subwords are even more consistent with NNLMs than with traditional n-gram language models.  ...  Acknowledgements This work was supported by Svenska folkskolans vänner r.f. via the DigiTala project, Business Finland's Challenge Finland project TELLme, Kone foundation, EU's Horizon 2020 research and  ... 
doi:10.1016/j.csl.2020.101158 fatcat:xs2lq7o4cbgfdbry5nxqlsz7ra

Smooth Bilingual N-Gram Translation

Holger Schwenk, Marta R. Costa-jussà, José A. R. Fonollosa
2007 Conference on Empirical Methods in Natural Language Processing  
Using a continuous space model for the translation model and the target language model, an improvement of 1.5 BLEU on the test data is observed.  ...  A neural network is used to perform the projection and the probability estimation. Smoothing probabilities is most important for tasks with a limited amount of training material.  ...  the Spanish government under an FPU grant and the project AVIVAVOZ (TEC2006-13964-C03).  ... 
dblp:conf/emnlp/SchwenkCF07 fatcat:twfkfpiumjfshk6hkrxmlm4i4y
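The "1.5 BLEU" improvement reported in this entry refers to the standard MT evaluation metric. As a reminder of what such a number measures, here is a simplified sentence-level sketch (uniform n-gram weights, single reference, no smoothing); real evaluations use corpus-level BLEU, and the function name is hypothetical.

```python
import math
from collections import Counter

def bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions up to max_n, times a brevity penalty."""
    r, h = reference.split(), hypothesis.split()
    logs = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        hyp_ngrams = Counter(tuple(h[i:i + n]) for i in range(len(h) - n + 1))
        # clipped counts: a hypothesis n-gram is credited at most as often
        # as it appears in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if overlap == 0:
            return 0.0
        logs.append(math.log(overlap / sum(hyp_ngrams.values())))
    brevity = min(1.0, math.exp(1 - len(r) / len(h)))
    return brevity * math.exp(sum(logs) / max_n)
```

BLEU scores are commonly reported on a 0-100 scale, so a gain of 1.5 BLEU corresponds to 0.015 in the formulation above.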

The KIT-LIMSI Translation System for WMT 2015

Thanh-Le Ha, Quoc-Khanh DO, Eunah Cho, Jan Niehues, Alexandre Allauzen, François Yvon, Alex Waibel
2015 Proceedings of the Tenth Workshop on Statistical Machine Translation  
This year we improved our systems' performance over last year through n-best list rescoring using neural network-based translation and language models and novel discriminative models based on different  ...  We submitted phrase-based translation systems for three directions, namely English→German, German→English, and English→Vietnamese.  ...  Word-based and non-word language models such as bilingual, POS-based and cluster language models are integrated in the system. Conventional DWLs using source n-grams are also utilized in this phase.  ... 
doi:10.18653/v1/w15-3012 dblp:conf/wmt/HaDCNAYW15 fatcat:slwyzyoqvvgyvdenel3gb64nma
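The n-best list rescoring mentioned in this entry is typically log-linear: each hypothesis carries a vector of feature scores (baseline decoder score, neural LM score, and so on), and the reranker picks the hypothesis maximising a weighted sum. A minimal sketch with hypothetical names and feature labels:

```python
def rescore(nbest, weights):
    """Pick the hypothesis maximising a weighted sum of feature scores.
    nbest: list of (hypothesis, {feature_name: score}) pairs."""
    def total(features):
        return sum(weights[name] * value for name, value in features.items())
    return max(nbest, key=lambda item: total(item[1]))[0]
```

The weights are usually tuned on held-out data (e.g. with MERT); with the neural LM weight at zero the reranker falls back to the baseline ordering.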

Unsupervised morph segmentation and statistical language models for vocabulary expansion

Matti Varjokallio, Dietrich Klakow
2016 Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)  
We propose a combination of unsupervised morph segmentation and statistical language models and evaluate on languages from the Babel corpus.  ...  This work explores the use of unsupervised morph segmentation along with statistical language models for the task of vocabulary expansion.  ...  The work was partially funded by the Saarland University SFB1102 Collaborative Research Center for Information Density and Linguistic Encoding.  ... 
doi:10.18653/v1/p16-2029 dblp:conf/acl/VarjokallioK16 fatcat:j6vlbismsrdxhjidlgah277r5e
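This entry uses unsupervised morph segmentation (Morfessor-style methods are the usual choice for Babel languages) to expand the vocabulary. As an illustration of data-driven subword discovery only, and not of the paper's algorithm, here is a sketch of greedy byte-pair-encoding merges, a related technique; all names are hypothetical.

```python
from collections import Counter

def bpe_merges(words, num_merges=10):
    """Greedy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    vocab = Counter(tuple(w) for w in words)  # word -> frequency, as symbol tuples
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges
```

Both BPE and morph segmentation serve the same end named in the entry: a subword lexicon lets the language model assign probability to words never seen whole in training.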

Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016

Katsuhito Sudoh, Masaaki Nagata
2016 Workshop on Asian Translation  
Chinese words are reordered by a learning-to-rank model based on pairwise classification to obtain word order close to Japanese.  ...  In this year's system, two different machine translation methods are compared: traditional phrase-based statistical machine translation and recent sequence-to-sequence neural machine translation with an  ...  Acknowledgments We greatly appreciate the workshop organizers for this valuable evaluation campaign. We also thank the Japan Patent Office for providing its patent translation dataset.  ... 
dblp:conf/aclwat/SudohN16 fatcat:kdwrqq3wirhfrkq5btlin7rxqu

Real-time Neural-based Input Method [article]

Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama
2018 arXiv   pre-print
In this work, we apply an LSTM-based language model to the input method and evaluate its performance for both prediction and conversion tasks with the Japanese BCCWJ corpus.  ...  It converts sequential keyboard inputs to the characters in its target language, which is indispensable for Japanese and Chinese users.  ...  EVALUATION OF NEURAL-BASED INPUT METHOD We first compared the neural model performance with a conventional n-gram model.  ... 
arXiv:1810.09309v1 fatcat:qjyohpdwurgw7dtyznaqiygbau

Paraphrastic language models

X. Liu, M.J.F. Gales, P.C. Woodland
2014 Computer Speech and Language  
Only modelling the observed surface word sequence can result in poor context coverage and generalization, for example, when using n-gram language models (LMs).  ...  the baseline n-gram and neural network LMs respectively.  ...  Acknowledgments The research leading to these results was supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and DARPA under the Broad Operational Language Translation (BOLT) program.  ... 
doi:10.1016/j.csl.2014.04.004 fatcat:iumgn7pu2ra2zn6agt6ywab33u

The KIT-LIMSI Translation System for WMT 2014

Quoc Khanh Do, Teresa Herrmann, Jan Niehues, Alexander Allauzen, François Yvon, Alex Waibel
2014 Proceedings of the Ninth Workshop on Statistical Machine Translation  
Originally, SOUL translation models were applied to n-gram-based translation systems that use tuples as translation units instead of phrase pairs.  ...  The baseline system already includes several models like conventional language models on different word factors and a discriminative word lexicon. This system is used to generate a k-best list.  ...  Acknowledgments The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287658 as well as the French Armaments  ... 
doi:10.3115/v1/w14-3307 dblp:conf/wmt/DoHNAYW14 fatcat:im27ssuojjaato672j5bmsko6m

Exploration of the Impact of Maximum Entropy in Recurrent Neural Network Language Models for Code-Switching Speech

Ngoc Thang Vu, Tanja Schultz
2014 Proceedings of the First Workshop on Computational Approaches to Code Switching  
First, we explore extensively the integration of part-of-speech tags and language identifier information in recurrent neural network language models for Code-Switching.  ...  This paper presents our latest investigations of the jointly trained maximum entropy and recurrent neural network language models for Code-Switching speech.  ...  (Chan et al., 2006) compare four different kinds of n-gram language models to predict Code-Switching.  ... 
doi:10.3115/v1/w14-3904 dblp:conf/acl-codeswitch/VuS14 fatcat:bohxkvtp55b6bdfgsbmirg6drq

A Deep Learning Based Approach to Transliteration

Soumyadeep Kundu, Sayantan Paul, Santanu Pal
2018 Proceedings of the Seventh Named Entities Workshop  
In the NEWS 2018 Shared Task on Transliteration, our method achieves top performance for the En-Pe and Pe-En language pairs and comparable results for other cases.  ...  Though a number of statistical models for transliteration have been proposed in the past few decades, we propose neural network-based deep learning architectures for the transliteration of  ...  Now, when an input word is considered, the word is searched according to these character n-grams and segmented accordingly.  ... 
doi:10.18653/v1/w18-2411 dblp:conf/aclnews/KunduPP18 fatcat:2fkhcnthbfehhocxs6eslsqzlu

The UEDIN English ASR system for the IWSLT 2013 evaluation

Peter Bell, Fergus McInnes, Siva Reddy Gangireddy, Mark Sinclair, Alexandra Birch, Steve Renals
2013 International Workshop on Spoken Language Translation  
neural network language model.  ...  Improvements to our system since the 2012 evaluation, which include the use of a significantly improved n-gram language model, result in a 19% relative WER reduction on the tst2012 set.  ...  Language modelling The ASR system used Kneser-Ney smoothed n-gram language models for decoding and lattice rescoring, and a recurrent neural network (RNN) language model for a final rescoring stage based  ... 
dblp:conf/iwslt/BellMGSBR13 fatcat:c3k425uy4nda7gmq4xu6ngm5m4
Showing results 1 — 15 of 12,423