4,629 Hits in 3.4 sec

Neural Chinese Word Segmentation with Dictionary Knowledge [article]

Junxin Liu, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, Xing Xie
2018 arXiv   pre-print
Chinese word segmentation (CWS) is an important task for Chinese NLP. Recently, many neural network based methods have been proposed for CWS.  ...  The experimental results on two benchmark datasets validate that our approach can effectively improve the performance of Chinese word segmentation, especially when training data is insufficient.  ...  Luckily, many of these rare words are included in Chinese dictionary. If the neural model is aware of that "人工智能" is a Chinese word, then it can better segment the aforementioned sentence.  ... 
arXiv:1807.05849v1 fatcat:2urhgejndjdzbauc2repbrjxrm

Neural Word Segmentation Learning for Chinese [article]

Deng Cai, Hai Zhao
2016 arXiv   pre-print
Most previous approaches to Chinese word segmentation formalize this problem as a character-based sequence labeling task where only contextual information within fixed sized local windows and simple interactions  ...  In this paper, we propose a novel neural framework which thoroughly eliminates context windows and can utilize complete segmentation history.  ...  Chinese idiom dictionaries.  ...  Table 5: Comparison with previous neural network models.  ... 
arXiv:1606.04300v2 fatcat:g6d5dnzul5c7fmzssdccbmkss4

Neural Word Segmentation Learning for Chinese

Deng Cai, Hai Zhao
2016 Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
Most previous approaches to Chinese word segmentation formalize this problem as a character-based sequence labeling task so that only contextual information within fixed sized local windows and simple  ...  In this paper, we propose a novel neural framework which thoroughly eliminates context windows and can utilize complete segmentation history.  ...  Chinese idiom dictionaries.  ...  Table 5: Comparison with previous neural network models.  ... 
doi:10.18653/v1/p16-1039 dblp:conf/acl/CaiZ16 fatcat:dinvkgncqbge3fhrh6eft35nnu

Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks

Meishan Zhang, Guohong Fu, Nan Yu
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
word detection can be helpful for microtext processing. In this work, we investigate it under the neural setting, by proposing a joint segmentation model that integrates the detection of informal words  ...  State-of-the-art Chinese word segmentation systems typically exploit supervised models trained on a standard manually-annotated corpus, achieving performances over 95% on a similar standard testing corpus. However  ...  We combine the advantages of both work, aiming to enhance the segmentation of Chinese microtext. On the one hand, we construct training examples automatically with the help of an external dictionary.  ... 
doi:10.24963/ijcai.2017/591 dblp:conf/ijcai/ZhangFY17 fatcat:ylgslg5q6fbhvnnktutkd7hkum

Text Window Denoising Autoencoder: Building Deep Architecture for Chinese Word Segmentation [chapter]

Ke Wu, Zhiqiang Gao, Cheng Peng, Xiao Wen
2013 Communications in Computer and Information Science  
On the PKU dataset of Chinese word segmentation bakeoff 2005, applying this method decreases the F1 error rate by 11.9% for deep neural network based models.  ...  We are the first to apply deep learning methods to Chinese word segmentation to our best knowledge.  ...  We demonstrated that deep neural networks for Chinese word segmentation can be effectively trained with this model.  ... 
doi:10.1007/978-3-642-41644-6_1 fatcat:e47ermthx5gkxmv3ilzzs2vf2a

Stochastic Tokenization with a Language Model for Neural Text Classification

Tatsuya Hiraoka, Hiroyuki Shindo, Yuji Matsumoto
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
Sentences are usually segmented with words or subwords by a morphological analyzer or byte pair encoding and then encoded with word (or subword) representations for neural networks.  ...  For unsegmented languages such as Japanese and Chinese, tokenization of a sentence has a significant impact on the performance of text classification.  ...  (Cai et al., 2017) proposed a similar architecture to the caching mechanism for neural Chinese word segmentation.  ... 
doi:10.18653/v1/p19-1158 dblp:conf/acl/HiraokaSM19 fatcat:i5afsmop2nck7fgurnj7faq4qm

Generating Abbreviations for Chinese Named Entities Using Recurrent Neural Network with Dynamic Dictionary

Qi Zhang, Jin Qian, Ya Guo, Yaqian Zhou, Xuanjing Huang
2016 Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing  
It combines recurrent neural network (RNN) with an architecture determining whether a given sequence of characters can be a word or not.  ...  To address this problem, we propose a novel neural network architecture to perform task.  ...  Hence, most of the Chinese natural language processing methods assume a Chinese word segmenter is used in a pre-processing step to produce word-segmented Chinese sentences as input.  ... 
doi:10.18653/v1/d16-1069 dblp:conf/emnlp/ZhangQGZH16 fatcat:n2htt6dtyjh5fcrhfv6genl3re

Intelligent diagnosis with Chinese electronic medical records based on convolutional neural networks

Xiaozheng Li, Huazhen Wang, Huixin He, Jixiang Du, Jian Chen, Jinzhun Wu
2019 BMC Bioinformatics  
Therefore, effective word segmentation, word representation and model architecture are the core technologies in the literature on Chinese EMRs.  ...  Chinese language compared with English.  ...  Furthermore, there were not  ...  Discussion: Impact of the Chinese medical dictionary on word segmentation. With the dictionary-based word segmentation method incorporating our pediatric medical dictionary  ... 
doi:10.1186/s12859-019-2617-8 fatcat:vsddtl6yhrhrhh34u33kqsndra

Bidirectional Gated Recurrent Unit Neural Network for Chinese Address Element Segmentation

Pengpeng Li, An Luo, Jiping Liu, Yong Wang, Jun Zhu, Yue Deng, Junjie Zhang
2020 ISPRS International Journal of Geo-Information  
This method uses the Bi-GRU neural network to generate tag features based on Chinese word segmentation and then uses the Viterbi algorithm to perform tag inference to achieve the segmentation of Chinese  ...  Coupled with the diversity and complexity in Chinese address expressions, the segmentation of Chinese address elements is a substantial challenge.  ...  Currently, the most commonly used Chinese word segmentation methods are as follows: (1) The dictionary-based string-matching method matches the strings to be segmented with a dictionary library one by  ... 
doi:10.3390/ijgi9110635 fatcat:7da2vdilnrcufmked3nazktnyu

Shrinking

Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi
2019 Proceedings of the 2019 Conference of the North  
For languages without natural word boundaries, like Japanese and Chinese, word segmentation is a prerequisite for downstream analysis.  ...  Morphological analyzers are trained on data hand-annotated with segmentation boundaries and part of speech tags.  ...  A neural model with only the unigram character input can solve word segmentation and POS tagging only if it builds some knowledge about the dictionary internally.  ... 
doi:10.18653/v1/n19-1281 dblp:conf/naacl/TolmachevKK19 fatcat:pv2kqfv3yfaj7kayx7hfrl3thy

A New Chinese Word Segmentation Method Based on Maximum Matching

Yue Zhao, Hang Li, Shoulin Yin, Yang Sun
2018 Journal of Information Hiding and Multimedia Signal Processing  
However, Chinese unique composition determines the Chinese is far more complicated than English. So in this paper, we propose a new Chinese word segmentation method based on maximum matching.  ...  Automatic Chinese word segmentation is a hot issue in information extraction, machine translation, information retrieval, automatic text categorization, speech recognition, and the voice conversion, natural  ...  Chinese word segmentation, i.e., a Chinese character sequence is segmented into words according to certain rules.  ... 
dblp:journals/jihmsp/ZhaoLYS18 fatcat:q5engxoupjd35fk4jnufqui35y

Vietnamese Word Segmentation

Dinh Dien, Hoang Kiem, Nguyen Van Toan
2001 Natural Language Processing Pacific Rim Symposium  
We evaluate the performance by comparing its word segmentation results with the manually annotated corpus and its performance proves to be very good.  ...  This word segmentation system is applied to Text-to-speech of Vietnamese and POS-tagger of Vietnamese.  ...  We apply WFST model for Chinese Word segmentation into our task as follows (Richard Sproat, 1996): We represent the dictionary D as a Weighted Finite State Transducer.  ... 
dblp:conf/nlprs/DienKT01 fatcat:ghy3j4jwyvftply4ckwh7n6kqu

Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model

Hajime Morita, Daisuke Kawahara, Sadao Kurohashi
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
We present a new morphological analysis model that considers semantic plausibility of word sequences by using a recurrent neural network language model (RNNLM).  ...  In unsegmented languages, since language models are learned from automatically segmented texts and inevitably contain errors, it is not apparent that conventional language models contribute to morphological  ...  Neural network based models have been proposed for Chinese word segmentation and POS tagging (Pei et al., 2014) or word segmentation (Mansur et al., 2013).  ... 
doi:10.18653/v1/d15-1276 dblp:conf/emnlp/MoritaKK15 fatcat:zyshrvbbkbfdbkbkbr3ad4swuu

Chinese News Text Classification Based on Convolutional Neural Network

Hanxu Wang, Xin Li
2022 Journal on Big Data  
This paper introduces a combined convolutional neural network text classification model based on word2vec and improved TF-IDF: firstly, the word vector is trained through word2vec model, then the weight  ...  With the explosive growth of Internet text information, the task of text classification is more important.  ...  Chinese Word Segmentation: The common means of text word segmentation are: using word segmentation tools to segment the text directly, using existing dictionaries to segment words, and establishing word  ... 
doi:10.32604/jbd.2022.027717 fatcat:lovtogkvgrcynnlrnbg7euonde

Pre-screening Textual Based Evaluation for the Diagnosed Female Breast Cancer (WBC)

Mahmood Alhlffee
2019 Revue d'intelligence artificielle : Revue des Sciences et Technologies de l'Information  
Keywords: virtual assistance, sequence to sequence neural network, bigram and trigram  ...  Neural network for word segmentation (WS): CWS is usually referred to as Chinese-based labelling. For each  ...  The integrated models are critical to text-based Chinese word segmentation (CWS). The sequence-to-sequence learning was introduced to convert the CWS into a framework of sequence classification.  ...  Recently, a number of neural network models have been proposed for Chinese word segmentation (CWS).  ... 
doi:10.18280/ria.330401 fatcat:ada3hfbnd5a5zmhazyepc7coxq