82 hits

A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005

Huihsin Tseng, Pi-Chuan Chang, Galen Andrew, Daniel Jurafsky, Christopher D. Manning
2005 Workshop on Chinese Language Processing  
We present a Chinese word segmentation system submitted to the closed track of Sighan bakeoff 2005.  ...  Our segmenter was built using a conditional random field sequence model that provides a framework to use a large number of linguistic features such as character identity, morphological and character reduplication  ...  To this end, we proposed a new model using character identity, morphological and character reduplication features in a conditional random field modeling framework.  ... 
dblp:conf/acl-sighan/TsengCAJM05 fatcat:qwzu5ayga5es5bdhvqcywkej5q

Effective Neural Solution for Multi-Criteria Word Segmentation [article]

Han He, Lei Wu, Hua Yan, Zhimin Gao, Yi Feng, George Townsend
2018 arXiv pre-print
We present a simple yet elegant solution to train a single joint model on multi-criteria corpora for Chinese Word Segmentation (CWS).  ...  The rest of the model including Long Short-Term Memory (LSTM) layer and Conditional Random Fields (CRFs) layer remains unchanged and is shared across all datasets, keeping the size of parameter collection  ...  We employed a Conditional Random Fields (CRF) (Lafferty et al., 2001) layer as the inference layer.  ... 
arXiv:1712.02856v2 fatcat:abvenw5swjau7dey4ev7h7jdxu
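Several systems in these results (He et al. above; Shi and Wang below) put a linear-chain CRF on top of per-character scores and decode the single best tag sequence. Below is a minimal sketch of that decoding step, the Viterbi algorithm, in plain Python. It assumes the emission and transition scores are already computed in log space and is not code from any of these papers. The function name `viterbi`, the BMES tag order, and every score in the example are illustrative assumptions. Start and end transition scores are left out for brevity.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]   -- score of tag j at character position t (log space)
    transitions[i][j] -- score of tag i being followed by tag j (log space)
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    best = list(emissions[0])           # best path score ending in each tag
    backpointers: List[List[int]] = []  # best previous tag, per step and tag
    for scores in emissions[1:]:
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Pick the previous tag i that maximizes best[i] + transitions[i][j].
            i_best = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + scores[j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag.
    tag = max(range(n_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Illustrative BMES example (tag order B=0, M=1, E=2, S=3); toy scores only.
TAGS = "BMES"
ILLEGAL = -1e4  # large penalty for transitions that break BMES well-formedness
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
transitions = [[0.0 if (a, b) in ALLOWED else ILLEGAL for b in TAGS] for a in TAGS]
emissions = [[2.0, 0.0, 0.0, 1.0],
             [0.0, 0.5, 1.0, 0.0],
             [0.0, 0.0, 0.0, 2.0]]
print("".join(TAGS[k] for k in viterbi(emissions, transitions)))  # prints "BES"
```

Illegal BMES transitions, such as B followed by S, get a large negative score. That effectively rules them out between characters. A fuller implementation would also add start and end scores so that a sentence cannot begin with M or E or end with B or M.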

Chinese Named Entity Recognition and Word Segmentation Based on Character

Jingzhou He, Houfeng Wang
2008 International Joint Conference on Natural Language Processing  
This paper presents a character-based Conditional Random Fields (CRFs) model for these two tasks.  ...  In the SIGHAN Bakeoff 2007, this model participated in all closed tracks for both the Chinese NER and word segmentation tasks, and it turned out to perform well.  ...  We treat both tasks as sequence labeling problems, and a character-based Conditional Random Fields (CRFs) model is applied in this Bakeoff.  ... 
dblp:conf/ijcnlp/HeW08 fatcat:64shscwcs5a3znenukwk7xdo4a

Effective Tag Set Selection in Chinese Word Segmentation via Conditional Random Field Modeling

Hai Zhao, Changning Huang, Mu Li, Bao-Liang Lu
2006 Pacific Asia Conference on Language, Information and Computation  
This paper is concerned with Chinese word segmentation, which is regarded as a character-based tagging problem under the conditional random field framework.  ...  We show that there is a significant performance difference as different tag sets are selected. Based on the proposed method, our system gives state-of-the-art performance.  ...  Conditional Random Field: A maximum entropy tagger was used in early character-based tagging for Chinese word segmentation [2], [3], while we choose a linear-chain CRF as our learning model in this study  ... 
dblp:conf/paclic/ZhaoHLL06 fatcat:5faemhfojzdghcz5sg7wcpcx7i
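Zhao et al. treat segmentation as character tagging and report that the choice of tag set matters. To show what a tag set means in practice, the sketch below encodes a segmented sentence with two widely used schemes and then decodes the tags back into words. The schemes are a 2-tag B/I set and the 4-tag BMES set, and they are not necessarily the exact sets evaluated in that paper. The function names and the example sentence 我喜欢自然语言 ("I like natural language") are illustrative assumptions.

```python
from typing import List


def words_to_tags(words: List[str], scheme: str = "BMES") -> List[str]:
    """Encode a segmented sentence as one tag per character."""
    tags: List[str] = []
    for word in words:
        if scheme == "BI":
            # 2-tag set: B = first character of a word, I = any other character.
            tags.extend(["B"] + ["I"] * (len(word) - 1))
        elif scheme == "BMES":
            # 4-tag set: Begin, Middle, End of a multi-character word; Single.
            if len(word) == 1:
                tags.append("S")
            else:
                tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
        else:
            raise ValueError(f"unknown tag scheme: {scheme}")
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Decode tags back into words; a new word starts at every B or S tag."""
    words: List[str] = []
    for char, tag in zip(chars, tags):
        if tag in ("B", "S") or not words:
            words.append(char)
        else:
            words[-1] += char
    return words


words = ["我", "喜欢", "自然语言"]
print(words_to_tags(words))        # ['S', 'B', 'E', 'B', 'M', 'M', 'E']
print(words_to_tags(words, "BI"))  # ['B', 'B', 'I', 'B', 'I', 'I', 'I']
```

A richer tag set gives the sequence model more positional information per character. The price is more labels, and therefore more parameters, for it to learn.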

Enhancement of Feature Engineering for Conditional Random Field Learning in Chinese Word Segmentation Using Unlabeled Data

Mike Tian-Jian Jiang, Cheng-Wei Shih, Ting-Hao Yang, Chan-Hung Kuo, Richard Tzong-Han Tsai, Wen-Lian Hsu
2012 International Journal of Computational Linguistics and Chinese Language Processing  
This work proposes a unified view of several features based on frequent strings extracted from unlabeled data that improve the conditional random fields (CRF) model for Chinese word segmentation (CWS).  ...  Processing (SIGHAN) of the Association for Computational Linguistics (ACL) and SIGHAN CWS Bakeoff 2010.  ...  Performance comparison of accuracy on SIGHAN 2005 AS corpus (columns: Configuration, P, C_P, R, C_R, F)  ... 
dblp:journals/ijclclp/JiangSYKTH12 fatcat:enz23e6pmjgu5kkrhleidxt3hi

Chinese Word Segmentation Based on Large Margin Methods

Buzhou Tang, Xuan Wang, Xiaohong Wang
2009 International Journal of Asian Language Processing  
In this paper, the large margin methods, which combine the advantages of two typical state-of-the-art methods, Support Vector Machines (SVMs) and Conditional Random Fields (CRFs), are presented for Chinese  ...  Chinese word segmentation is the initial step in Chinese language processing; it transforms a character string into a word sequence.  ...  Further, some complex language models such as Maximum Entropy (ME) (Xue, 2003) and Conditional Random Fields (CRFs) (Peng, 2004; Zhang, 2006) are also used for Chinese word segmentation.  ... 
dblp:journals/jclc/TangWW09 fatcat:3hv7gs2msnhqvhvlrbjxxn5onu

A Pragmatic Chinese Word Segmentation System

Wei Jiang, Yi Guan, Xiaolong Wang
2006 Workshop on Chinese Language Processing  
This paper presents our work for participation in the Third International Chinese Word Segmentation Bakeoff.  ...  The experiment indicates that this system achieves an F-measure of 96.8% in the MSRA open test of the third SIGHAN-2006 bakeoff.  ...  "荷花 奖" (lotus prize) can be recognized as one word by the conditional random fields model. Conclusion: We have briefly described our word segmentation system and NER system.  ... 
dblp:conf/acl-sighan/JiangGW06 fatcat:rkzbvwlqwbcslppaob4uuhaam4

A Dual-layer CRFs Based Joint Decoding Method for Cascaded Segmentation and Labeling Tasks

Yanxin Shi, Mengqiu Wang
2007 International Joint Conference on Artificial Intelligence  
We present a method that performs joint decoding of separately trained Conditional Random Field (CRF) models, while guarding against violations of hard constraints.  ...  Evaluated on Chinese word segmentation and part-of-speech (POS) tagging tasks, our proposed method achieved state-of-the-art performance on both the Penn Chinese Treebank and First SIGHAN Bakeoff datasets  ...  Thus, we use Conditional Random Fields (CRFs) [Lafferty et al., 2001] to define these two probability terms. CRFs define conditional probability, P(Z|X), by Markov random fields.  ... 
dblp:conf/ijcai/ShiW07 fatcat:qfui3mheqbc57mo3s5fth7edu4

Chinese Segmentation with a Word-Based Perceptron Algorithm

Yue Zhang, Stephen Clark
2007 Annual Meeting of the Association for Computational Linguistics  
Closed tests on the first and second SIGHAN bakeoffs show that our system is competitive with the best in the literature, achieving the highest reported F-scores for a number of corpora.  ...  Standard approaches to Chinese word segmentation treat the problem as a tagging task, assigning labels to the characters in the sequence indicating whether the character marks a word boundary.  ...  We thank the anonymous reviewers for their insightful comments.  ... 
dblp:conf/acl/ZhangC07 fatcat:cuonxwm4ibhpbkobjvb5bqe6ja

Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model

Aaron Li-Feng Han, Xiaodong Zeng, Derek F. Wong, Lidia S. Chao
2015 Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing  
The experiment shows that the unlabeled corpus can enhance the state-of-the-art conditional random field (CRF) learning model and has potential to improve the tagging accuracy even though the margin is  ...  a little weak and not satisfying in current experiments.  ...  MYRG2015-00175-FST and MYRG2015-00188-FST) and the Science and Technology Development Fund of Macau (Grant No. 057/2014/A). The first author was supported by  ... 
doi:10.18653/v1/w15-3103 dblp:conf/acl-sighan/HanZWC15 fatcat:vyjdkahqjneipamzabyw2tkoku

Training Global Linear Models for Chinese Word Segmentation [chapter]

Dong Song, Anoop Sarkar
2009 Lecture Notes in Computer Science  
Global Linear Models for Chinese Word Segmentation: find the most plausible word segmentation y' for an unsegmented Chinese sentence x by scoring each possible segmentation with a feature weight vector, $y' = \arg\max_{y \in \mathrm{GEN}(x)} \Phi(x, y) \cdot \mathbf{w}$  ...  [Figure: Conditional Random Field N-best candidates; GLM training and decoding with averaged perceptron]  ...  Number of sentences (CityU / MSRA / UPUC): training set 57,275 / 46,364 / 18,804; test set 7,511 / 4,365 / 5,117.  ...  PU corpus from the first SIGHAN Bakeoff, word segmentation shared  ... 
doi:10.1007/978-3-642-01818-3_15 fatcat:fjomt3gbtjgatcs3fcoynsqyoq

A Gap-Based Framework for Chinese Word Segmentation via Very Deep Convolutional Networks [article]

Zhiqing Sun, Gehui Shen, Zhihong Deng
2017 arXiv pre-print
Results show that our approach outperforms the best character-based and word-based methods on 5 benchmarks, without any post-processing module (e.g., Conditional Random Fields) or beam search.  ...  However, if we consider segmenting a given sentence, the most intuitive idea is to predict whether to segment for each gap between two consecutive characters, which in comparison makes previous approaches  ...  Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013.  ... 
arXiv:1712.09509v1 fatcat:oavkkxsxnvg53afffhk5ukr5t4
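Sun et al. reformulate segmentation as a binary decision on each gap between consecutive characters, so a sentence of n characters yields n - 1 labels. The sketch below covers only that label encoding and its inverse. It assumes that 1 marks a split and 0 marks no split. The convolutional network that predicts the labels is not reproduced, and the function names are hypothetical.

```python
from typing import List


def words_to_gap_labels(words: List[str]) -> List[int]:
    """One label per gap between adjacent characters: 1 = split, 0 = no split."""
    labels: List[int] = []
    for k, word in enumerate(words):
        labels.extend([0] * (len(word) - 1))  # gaps inside a word are not split
        if k < len(words) - 1:
            labels.append(1)                  # gap between two words is split
    return labels


def gap_labels_to_words(chars: str, labels: List[int]) -> List[str]:
    """Cut the character string at every gap labeled 1."""
    if len(labels) != max(len(chars) - 1, 0):
        raise ValueError("expected exactly one label per gap (n - 1 for n characters)")
    words: List[str] = []
    start = 0
    for i, split in enumerate(labels):
        if split:  # gap i lies between chars[i] and chars[i + 1]
            words.append(chars[start:i + 1])
            start = i + 1
    if chars:
        words.append(chars[start:])
    return words


labels = words_to_gap_labels(["我", "喜欢", "自然语言"])
print(labels)                                     # [1, 0, 1, 0, 0, 0]
print(gap_labels_to_words("我喜欢自然语言", labels))  # ['我', '喜欢', '自然语言']
```

Every binary labeling of the gaps decodes to a valid segmentation. Character tag schemes such as BMES are different: some tag sequences are ill-formed and have to be repaired or ruled out during decoding.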

Attention Is All You Need for Chinese Word Segmentation [article]

Sufeng Duan, Hai Zhao
2020 arXiv pre-print
With the effective encoder design, our model only needs to take unigram features for scoring. Our model is evaluated on SIGHAN Bakeoff benchmark datasets.  ...  Our model consists of an attention-only stacked encoder and a lightweight decoder for greedy segmentation, plus two highway connections for smoother training, in which the encoder is composed of a  ...  Besides, conditional random field (CRF) or Semi-CRF for sequence labeling has been used for both traditional and neural models though with different representations (Peng et al., 2004; Andrew, 2006; Wang  ... 
arXiv:1910.14537v3 fatcat:ed26bdgkdjcszj7rhixodujdkm

A Pragmatic Chinese Word Segmentation Approach Based on Mixing Models

Wei Jiang, Yi Guan, Xiaolong Wang
2006 International Journal of Computational Linguistics and Chinese Language Processing  
A pragmatic Chinese word segmentation approach is presented in this paper based on mixing language models.  ...  First, a class-based trigram is adopted in basic word segmentation, which applies the Absolute Discount Smoothing algorithm to overcome data sparseness.  ...  Jian Zhao for their valuable suggestions in the proposed system.  ... 
dblp:journals/ijclclp/JiangGW06 fatcat:ndmgsgdxfjg7ficoz3y2gx2aua

Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

Xian Qian, Qi Zhang, Yaqian Zhou, Xuanjing Huang, Lide Wu
2010 Conference on Empirical Methods in Natural Language Processing  
Many sequence labeling tasks in NLP require solving a cascade of segmentation and tagging subtasks, such as Chinese POS tagging, named entity recognition, and so on.  ...  Experimental evaluations on CoNLL 2000 shallow parsing data set and Fourth SIGHAN Bakeoff CTB POS tagging data set demonstrate the superiority of our method over cross-product, pipeline and candidate reranking  ...  Acknowledgements The author wishes to thank the anonymous reviewers for their helpful comments.  ... 
dblp:conf/emnlp/QianZZHW10 fatcat:lxnheqwnlrgrdaoqtoaqmk62u4
Showing results 1 to 15 of 82.