1,549 Hits in 6.2 sec

Probabilistic Chinese word segmentation with non-local information and stochastic training

Xu Sun, Yaozhong Zhang, Takuya Matsuzaki, Yoshimasa Tsuruoka, Jun'ichi Tsujii
2013 Information Processing & Management  
In this article, we focus on Chinese word segmentation by systematically incorporating non-local information based on latent variables and word-level features.  ...  Differing from previous work which captures non-local information by using semi-Markov models, we propose an alternative method for modeling non-local information: a latent variable word segmenter employing  ...  This work is supported by National High Technology Research and Development Program of China (863 Program) (No. 2012AA011101) and National Natural Science Foundation of China (Nos. 91024009 and 60973053  ... 
doi:10.1016/j.ipm.2012.12.003 fatcat:2pznp54hu5ebzc7phb3y5hpeia

Chunk Parsing and Entity Relation Extracting to Chinese Text by Using Conditional Random Fields Model

Junhua Wu, Longxia Liu
2010 Journal of Intelligent Learning Systems and Applications  
The conditional random field (CRF) model is a valid probabilistic model for segmenting and labeling sequence data. This paper models chunk and entity relation problems in Chinese text.  ...  Chunk parsing and entity relation extraction are important for understanding semantic information in natural language processing.  ...  Label Bias: Classical discriminative Markov models, maximum entropy taggers (Ratnaparkhi, 1996), and MEMMs, as well as non-probabilistic sequence tagging and segmentation models with independently trained  ... 
doi:10.4236/jilsa.2010.23017 fatcat:3zygc42m6ncplc2nxoymwisiz4
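
The snippet above describes CRFs as probabilistic models for segmenting and labeling sequence data. As a minimal sketch, assuming a linear-chain CRF with hand-set (hypothetical) emission and transition scores instead of learned feature weights, the forward algorithm below computes the log partition function log Z(x), which turns a label sequence's score into log P(y | x). It is an illustration of the general technique, not code from the Wu and Liu paper.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emissions, transitions, labels):
    """Unnormalized log-score of one label sequence (emission + transition terms)."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions, transitions):
    """Forward algorithm: log Z(x) over all label sequences in O(T * K^2) time."""
    num_labels = len(emissions[0])
    # alpha[k] = log-sum of scores of all prefixes that end in label k
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[j] + transitions[j][k] for j in range(num_labels)])
            + emissions[t][k]
            for k in range(num_labels)
        ]
    return logsumexp(alpha)


def sequence_log_prob(emissions, transitions, labels):
    """log P(y | x) = score(x, y) - log Z(x)."""
    return sequence_score(emissions, transitions, labels) - log_partition(emissions, transitions)


# Hypothetical scores: 3 input tokens, 2 labels.
emissions = [[1.0, 0.2], [0.3, 0.8], [0.5, 0.5]]
transitions = [[0.4, -0.1], [0.0, 0.6]]

# Brute-force check: enumerate all 2**3 label sequences.
brute = logsumexp([sequence_score(emissions, transitions, ys)
                   for ys in itertools.product(range(2), repeat=3)])
print(abs(log_partition(emissions, transitions) - brute) < 1e-9)  # True
```

The brute-force enumeration is exponential in sentence length; the forward recursion gives the same value in time linear in the length, which is what makes CRF training on real sentences practical.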

A Survey on Journey of Topic Modeling Techniques from SVD to Deep Learning

Deepak Sharma, Bijendra Kumar, Satish Chand
2017 International Journal of Modern Education and Computer Science  
A topic is a group of words that frequently occur together. Topic modeling can connect words with similar meanings and distinguish between uses of words with several meanings.  ...  We have used three hierarchical classification criteria for classifying topic models that include LDA and non-LDA based, bag-of-words or sequence-of-words approach and unsupervised or supervised  ...  The training of the model has been accomplished by applying the back-propagation algorithm for adjusting weights and stochastic gradient descent with L2-norm regularization.  ... 
doi:10.5815/ijmecs.2017.07.06 fatcat:nadnmsoj4zdi7onlxivrne6gqm
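
The snippet mentions stochastic gradient descent with L2-norm regularization. A minimal sketch of that update, assuming a linear least-squares model with a hypothetical learning rate and penalty strength (not the setup of any model in the survey): each step moves the weights against the per-example loss gradient plus the penalty gradient l2 * w, which pulls weights toward zero.

```python
import random


def sgd_l2(data, dim, lr=0.1, l2=0.01, epochs=50, seed=0):
    """SGD on 0.5 * (w . x - y)^2 with a per-example L2 penalty 0.5 * l2 * ||w||^2."""
    rng = random.Random(seed)
    w = [0.0] * dim
    examples = list(data)
    for _ in range(epochs):
        rng.shuffle(examples)  # stochastic: visit examples in a random order
        for x, y in examples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # loss gradient is err * x; penalty gradient is l2 * w (weight decay)
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
    return w


# Hypothetical noiseless data generated by y = 2a - b.
data = [((a, b), 2 * a - b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
w = sgd_l2(data, dim=2)
print([round(wi, 2) for wi in w])
```

The learned weights should land close to (2, -1), pulled slightly toward zero by the penalty; a larger l2 shrinks them further.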

Stochastic language models for style-directed layout analysis of document images

T. Kanungo, Song Mao
2003 IEEE Transactions on Image Processing  
The exact form of the hierarchy and the stochastic language is specified by the user, while the probabilities associated with the transitions are estimated from groundtruth data.  ...  While many segmentation algorithms exist in the literature, very few i) allow users to specify the physical style, and ii) incorporate user-specified style information into the algorithm's objective function  ...  Chou for providing relevant references; and D. Oard, P. Resnik, S. Khudanpur, and D. Yarowsky for discussions on the problem of dictionary parsing for translingual information access.  ... 
doi:10.1109/tip.2003.811487 pmid:18237934 fatcat:5fc4gtnbnnbwpkcy3qhxl4eipe
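
The abstract says the transition probabilities are estimated from groundtruth data. As a simplified sketch, assuming a flat first-order Markov chain over hypothetical zone labels (the paper's user-specified hierarchy is richer than this), maximum-likelihood estimation reduces to counting observed transitions and normalizing each row:

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood estimates of P(next | current) from labeled sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical ground-truth zone sequences for three pages.
pages = [["header", "body", "body", "footer"],
         ["header", "body", "footer"],
         ["body", "body", "footer"]]
print(estimate_transitions(pages)["body"])  # {'body': 0.4, 'footer': 0.6}
```

Relative frequencies assign zero probability to unseen transitions, so a practical system would usually add smoothing; that choice is outside what the abstract states.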

Towards an Hybrid Approach for Semantic Arabic Spontaneous Speech Analysis

Chahira Lhioui, Anis Zouaghi, Mounir Zrigui
2015 Research in Computing Science  
We also present our corpus, inspired by the MEDIA and LUNA project corpora and collected with the Wizard of Oz method. This corpus deals with Arabic tourist information and hotel reservation.  ...  This hybridization has the advantage of being robust while coping with irregularities of oral language such as the non-fixed order of words, self-corrections, repetitions, false departures which are called  ...  These three levels describe respectively three data types: syntactico-semantic rules, conceptual and probabilistic information.  ... 
doi:10.13053/rcs-90-1-8 fatcat:zzqfmqqe3rewpozljsuyiqhaki

Towards a Hybrid Approach to Semantic Analysis of Spontaneous Arabic Speech

Chahira Lhioui, Anis Zouaghi, Mounir Zrigui
2014 International Journal of Computational Linguistics and Applications  
We also present our corpus, inspired by the MEDIA and LUNA project corpora and collected with the Wizard of Oz method. This corpus deals with Arabic tourist information and hotel reservation.  ...  This hybridization has the advantage of being robust while coping with irregularities of oral language such as the non-fixed order of words, self-corrections, repetitions, false departures which are called  ...  They have used a stochastic language model for spontaneous Arabic speech semantic analysis in the context of a restricted field (Train Information).  ... 
dblp:journals/ijcla/LhiouiZZ14 fatcat:aehyhcf64jhe5ptm4xxj4vbbpi

Word Alignment Modeling with Context Dependent Deep Neural Network

Nan Yang, Shujie Liu, Mu Li, Ming Zhou, Nenghai Yu
2013 Annual Meeting of the Association for Computational Linguistics  
learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences.  ...  Experiments on a large-scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.  ...  We also thank Dongdong Zhang, Lei Cui, Chunyang Wu and Zhenyan He for fruitful discussions.  ... 
dblp:conf/acl/YangLLZY13 fatcat:lnlxjs7xjzaold5kvpzctgfc3a

Keyword Spotting from Online Chinese Handwritten Documents Using One-vs-All Trained Character Classifier

Heng Zhang, Da-Han Wang, Cheng-Lin Liu
2010 2010 12th International Conference on Frontiers in Handwriting Recognition  
To overcome the ambiguity of character segmentation, multiple candidates of character patterns are generated by over-segmentation, and sequences of candidate characters are matched with the query word  ...  For words of four characters, the recall, precision and F measure are 87.25%, 94.84% and 90.88%, respectively.  ...  Acknowledgements This work is supported by the National Natural Science Foundation of China (NSFC) under grants no. 60775004, 60825301 and 60933010.  ... 
doi:10.1109/icfhr.2010.49 dblp:conf/icfhr/ZhangWL10 fatcat:fxzhbz2i3nhn7pbuiesbxbfdfq
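
The recall, precision, and F measure quoted above can be cross-checked with the balanced F-measure, F = 2PR / (P + R). This sketch assumes the paper reports the balanced F1; the result, about 90.89, differs from the reported 90.88 only in the last digit, which may come from rounding or from how scores were aggregated across queries (not verified here).

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted above for four-character words, in percent.
print(round(f_measure(94.84, 87.25), 2))  # 90.89
```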

Probabilistic Graph-based Dependency Parsing with Convolutional Neural Network

Zhisong Zhang, Hai Zhao, Lianhui Qin
2016 Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
This paper presents neural probabilistic parsing models which explore up to third-order graph-based parsing with maximum likelihood training criteria.  ...  Secondly, a linear layer is added to integrate neural models of different orders and is trained with the perceptron method.  ...  The training process utilizes a mini-batched stochastic gradient descent method with momentum.  ... 
doi:10.18653/v1/p16-1131 dblp:conf/acl/ZhangZQ16 fatcat:zvv2likasrc4xoqif35pu6zwwe
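
The snippet states that training uses mini-batched stochastic gradient descent with momentum. A minimal sketch of classical (heavy-ball) momentum on a toy least-squares problem follows; the loss, data, learning rate, momentum coefficient, and batch size are all assumptions for illustration, not the paper's parser or hyperparameters.

```python
import random


def minibatch_sgd_momentum(grad_fn, w0, data, lr=0.02, momentum=0.9,
                           batch_size=4, epochs=200, seed=0):
    """Mini-batch SGD with classical (heavy-ball) momentum.

    grad_fn(w, batch) must return the average gradient of the loss over batch.
    """
    rng = random.Random(seed)
    w = list(w0)
    velocity = [0.0] * len(w)
    examples = list(data)
    for _ in range(epochs):
        rng.shuffle(examples)
        for start in range(0, len(examples), batch_size):
            batch = examples[start:start + batch_size]
            g = grad_fn(w, batch)
            # v <- momentum * v - lr * g, then w <- w + v
            velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
            w = [wi + vi for wi, vi in zip(w, velocity)]
    return w


def squared_loss_grad(w, batch):
    """Average gradient of 0.5 * (w . x - y)^2 over a mini-batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(batch)
    return grad


# Hypothetical noiseless data y = 0.5 + 3a, with a constant 1.0 feature for the bias.
data = [((1.0, a / 2), 0.5 + 3 * (a / 2)) for a in range(-4, 5)]
w = minibatch_sgd_momentum(squared_loss_grad, [0.0, 0.0], data)
print([round(wi, 3) for wi in w])
```

The velocity term accumulates an exponentially decaying sum of past gradients, which smooths the noise of small batches and speeds progress along consistently downhill directions.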

Neural Word Segmentation Learning for Chinese [article]

Deng Cai, Hai Zhao
2016 arXiv   pre-print
Most previous approaches to Chinese word segmentation formalize this problem as a character-based sequence labeling task where only contextual information within fixed-size local windows and simple interactions  ...  In this paper, we propose a novel neural framework which thoroughly eliminates context windows and can utilize complete segmentation history.  ...  The model with only word score can be regarded as the case in which segmentation decisions are made based only on local window information.  ... 
arXiv:1606.04300v2 fatcat:g6d5dnzul5c7fmzssdccbmkss4
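
The character-based sequence labeling formulation that this abstract contrasts itself with can be illustrated as follows. This minimal sketch assumes the common four-tag B/M/E/S scheme, which the snippet does not name; a labeling segmenter would predict these tags with a sequence model such as a CRF, and the helper names are hypothetical. It does not implement Cai and Zhao's window-free neural model.

```python
def words_to_tags(words):
    """Label each character B (begin), M (middle), E (end), or S (single-character word)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and predicted B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:  # a new word starts: flush an unfinished one
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate an ill-formed sequence that ends mid-word
        words.append(current)
    return words


# "I / like / natural language / processing", segmented into words.
sentence = ["我", "喜欢", "自然语言", "处理"]
tags = words_to_tags(sentence)
print(tags)  # ['S', 'B', 'E', 'B', 'M', 'M', 'E', 'B', 'E']
print(tags_to_words("".join(sentence), tags) == sentence)  # True
```

Because each tag is predicted from characters in a local window, a labeler of this kind sees word boundaries only indirectly, which is the limitation the abstract targets.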

Feature-Frequency–Adaptive On-line Training for Fast and Accurate Natural Language Processing

Xu Sun, Wenjie Li, Houfeng Wang, Qin Lu
2014 Computational Linguistics  
Experiments are conducted based on well-known benchmark tasks, including named entity recognition, word segmentation, phrase chunking, and sentiment analysis.  ...  These tasks consist of three structured classification tasks and one non-structured classification task, with binary features and real-valued features, respectively.  ...  Acknowledgments This work is a substantial extension of the conference version presented at ACL 2012 (Sun, Wang, and Li 2012) .  ... 
doi:10.1162/coli_a_00193 fatcat:xnweb3j4ebfz5aibvlm5bo2nwu

Scene Text Recognition with Sliding Convolutional Character Models [article]

Fei Yin, Yi-Chao Wu, Xu-Yao Zhang, Cheng-Lin Liu
2017 arXiv   pre-print
recognize unknown words. (4) The recognition process is highly parallel and enables fast recognition.  ...  ; (2) The model can be trained simply and efficiently because it avoids gradient vanishing/exploding in training RNN-LSTM based models; (3) It is based on character models trained without a lexicon, and can  ...  The network is trained with stochastic gradient descent (SGD) implemented in Torch 7 [4].  ... 
arXiv:1709.01727v1 fatcat:hvp6a4lk7rbabf7qjoj3ewqk2y

A Hybrid Approach to Detect and Localize Texts in Natural Scene Images

Yi-Feng Pan, Xinwen Hou, Cheng-Lin Liu
2011 IEEE Transactions on Image Processing  
A text region detector is designed to estimate text existence confidence and scale information in an image pyramid, which helps segment candidate text components by local binarization.  ...  Finally, text components are grouped into text lines/words with a learning-based energy minimization method.  ...  Zhou and F. Yin for helpful discussions, and to the anonymous reviewers for valuable comments.  ... 
doi:10.1109/tip.2010.2070803 pmid:20813645 fatcat:jom6t3r67bam7hp5ctx4hppbwu

A Generative Model of Phonotactics

Richard Futrell, Adam Albright, Peter Graff, Timothy J. O'Donnell
2017 Transactions of the Association for Computational Linguistics  
by phonologically-informed structure building operations.  ...  We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2007; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of  ...  Acknowledgments We would like to thank Tal Linzen, Leon Bergen, Edward Flemming, Edward Gibson, Bob Berwick, Jim Glass, and the audiences at MIT's Phonology Circle, SIGMORPHON, and the LSA 2016 Annual  ... 
doi:10.1162/tacl_a_00047 fatcat:327pqlhm7bbp3hz4vs2dv6bjhu
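
Stochastic memoization, as cited above (Johnson et al., 2007; Goodman et al., 2008), wraps a sampler so that earlier outputs are reused with probability proportional to how often they were produced. The sketch below assumes the Dirichlet-process (Pólya urn) form with a hypothetical concentration parameter `alpha`; the class name is hypothetical, and this is not the paper's model, which applies memoization to an and-or graph generative process over phonemes.

```python
import itertools
import random


class StochasticMemoizer:
    """Dirichlet-process (Polya urn) memoization of a base sampler.

    After n draws, the next call reuses a previous value with probability
    n / (n + alpha), picking it in proportion to how often it was returned;
    otherwise (probability alpha / (n + alpha)) it calls the base sampler.
    """

    def __init__(self, base_sampler, alpha, seed=0):
        self.base_sampler = base_sampler
        self.alpha = alpha
        self.rng = random.Random(seed)
        self.history = []  # every value returned so far, repeats included

    def __call__(self):
        n = len(self.history)
        if n > 0 and self.rng.random() < n / (n + self.alpha):
            # A uniform choice over the history is a choice proportional to counts.
            value = self.rng.choice(self.history)
        else:
            value = self.base_sampler()
        self.history.append(value)
        return value


# Hypothetical base sampler that always produces a brand-new symbol.
fresh_ids = itertools.count()
mem = StochasticMemoizer(lambda: next(fresh_ids), alpha=1.0)
draws = [mem() for _ in range(1000)]
print(len(set(draws)))  # typically a handful: most calls reuse earlier draws
```

Smaller `alpha` favors reuse of stored subparts; larger `alpha` falls back to the base process more often, so the inventory of reusable pieces grows faster.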

Temporal enhanced sentence-level attention model for hashtag recommendation

Jun Ma, Chong Feng, Ge Shi, Xuewen Shi, Heyang Huang
2018 CAAI Transactions on Intelligence Technology  
with hashtags having wrong labels.  ...  Meanwhile, recency also plays an important role in microblog hashtags, but this information is not used in existing studies.  ...  The work was mainly supported by the National Key Research and Development Program of China (no. 2017YFB1002101) and the National Natural Science Foundation of China (no. U1636203). References  ... 
doi:10.1049/trit.2018.0012 fatcat:mo2la4izqnacnnekr5rtavcmmy
Showing results 1 to 15 of 1,549