
Clustering tagged documents with labeled and unlabeled documents

Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen
2013 Information Processing & Management  
This study employs our proposed semi-supervised clustering method called Constrained-PLSA to cluster tagged documents with a small amount of labeled documents and uses two data sets for system performance  ...  This study employs abstracts of papers and the tags annotated by users to cluster documents. Four combinations of tags and words are used for feature representations.  ...  Acknowledgment This work was supported in part by the National Science Council under the Grants NSC-100-2221-E-009-129 and NSC-100-2811-E-009-024.  ... 
doi:10.1016/j.ipm.2012.12.004 fatcat:kdmjhrvg6fhnhpidrgrfq4dhpq

SCT-D3 at the NTCIR-11 MedNLP-2 Task

Akinori Fujino, Jun Suzuki, Tsutomu Hirao, Hisashi Kurasawa, Katsuyoshi Hayashi
2014 NTCIR Conference on Evaluation of Information Access Technologies  
The SCT-D3 team participated in the Extraction of Complaint and Diagnosis subtask and the Normalization of Complaint and Diagnosis subtask of the NTCIR-11 MedNLP-2 Task.  ...  We tackled the two subtasks by using machine learning techniques and additional medical term dictionaries.  ...  As labeled data, we used the medical document set, ntcir11 mednlp mednlp2-train v0.xml, provided by the task organizers, which included medical terms annotated with <c> tags.  ... 
dblp:conf/ntcir/FujinoSHKH14 fatcat:euvqyjflebeezck4tycy7zi6uq

Semi-Supervised Linear Discriminant Clustering

Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Lee, Fu-Sheng Gou
2014 IEEE Transactions on Cybernetics  
We use soft LDA with hard labels of labeled examples and soft labels of unlabeled examples to find a projection matrix. The clustering is then performed in the new feature space.  ...  We further discuss and analyze the influence of soft labels on classification performance by conducting experiments with different percentages of labeled examples.  ...  Each unlabeled document x_i connects to k nearest labeled documents and k nearest unlabeled documents. Different weight coefficients are given in the above two cases.  ... 
doi:10.1109/tcyb.2013.2278466 pmid:23996591 fatcat:dpxxp6lcyraxhb2rzrbryy2pqa

Automatic subject heading assignment for online government publications using a semi-supervised machine learning approach

Xiao Hu, Larry S. Jackson, Sai Deng, Jing Zhang
2006 Proceedings of the American Society for Information Science and Technology  
The EM classifier makes use of easily obtained unlabeled documents and thus reduces the demand for labeled training examples.  ...  Automatic text categorization techniques can be applied to classify documents approximately, given a sufficient number of labeled training examples.  ...  ACKNOWLEDGMENTS This work was sponsored in part by a National Leadership Grant from the Institute of Museum and Library Services and by the Illinois State Library.  ... 
doi:10.1002/meet.14504201139 fatcat:yqy233xqj5hf5ffquzz7skrgla

Clustering documents with labeled and unlabeled documents using fuzzy semi-Kmeans

Chien-Liang Liu, Tao-Hsing Chang, Hsuan-Hsun Li
2013 Fuzzy sets and systems (Print)  
While focusing on document clustering, this work presents a fuzzy semi-supervised clustering algorithm called fuzzy semi-Kmeans.  ...  This work conducts experiments on three data sets and compares fuzzy semi-Kmeans with several methods.  ...  Each unlabeled document x_i is connected to the k nearest labeled documents and the k nearest unlabeled documents. Different weight coefficients are given in the above two cases.  ... 
doi:10.1016/j.fss.2013.01.004 fatcat:5qgbucjr4fcxvhffxdgdqbi4gq

Robust Document Representations using Latent Topics and Metadata [article]

Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso
2020 arXiv   pre-print
This technique is not adequate when labeled examples are not available at training time and when the metadata artifacts in a document must be exploited.  ...  The generated document embeddings exhibit compositional characteristics and are directly used by downstream classification tasks to create decision boundaries from a small number of labeled examples, thereby  ...  In this study, we propose a transductive framework that can take advantage of a limited labeled dataset paired with a larger unlabeled dataset to generate rich representations for document classification  ... 
arXiv:2010.12681v1 fatcat:hi7thmsswvcmth62lexzf5xdz4

Weakly Supervised Slot Tagging with Partially Labeled Sequences from Web Search Click Logs

Young-Bum Kim, Minwoo Jeong, Karl Stratos, Ruhi Sarikaya
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
When combined with a novel initialization scheme that leverages unlabeled data, we show that our method gives significant improvement over strong supervised and weakly-supervised baselines.  ...  We extend the constrained lattice training of Täckström et al. (2013) to non-linear conditional random fields in which latent variables mediate between observations and labels.  ...  First, we cluster observation types in unlabeled data and treat the clusters as labels.  ... 
doi:10.3115/v1/n15-1009 dblp:conf/naacl/KimJSS15 fatcat:msi7v6h4zvdbrhhglyohbawwgm

Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations

Tsendsuren Munkhdalai, Meijing Li, Khuyagbaatar Batsuren, Hyeon Park, Nak Choi, Keun Ryu
2015 Journal of Cheminformatics  
Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature.  ...  We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to leverage system performance.  ...  We derive the cluster label prefixes with 4, 6, 10 and 20 lengths in the Brown model by following the experiment of Turian et al.  ... 
doi:10.1186/1758-2946-7-s1-s9 pmid:25810780 pmcid:PMC4331699 fatcat:dgmnnfwsszcpdkv7k3dstaiory

Semi-supervised Categorization of Wikipedia Collection by Label Expansion [chapter]

Boris Chidlovskii
2009 Lecture Notes in Computer Science  
We cope with the case where there is a small number of labeled pages and a very large number of unlabeled ones.  ...  We address the problem of categorizing a large set of linked documents with important content and structure aspects, for example, from Wikipedia collection proposed at the INEX XML Mining track.  ...  Acknowledgment This work is partially supported by the ATASH Project co-funded by the French Association on Research and Technology (ANRT).  ... 
doi:10.1007/978-3-642-03761-0_42 fatcat:osbroim5mbepbe4qklojmldbbm

Latent semantic modeling for slot filling in conversational understanding

Gokhan Tur, Asli Celikyilmaz, Dilek Hakkani-Tur
2013 2013 IEEE International Conference on Acoustics, Speech and Signal Processing  
Our method decomposes the task into two steps: latent n-gram clustering using a semi-supervised latent Dirichlet allocation (LDA) and sequence tagging for learning semantic structures in a CU system.  ...  Similarly, for the unlabeled documents whose semantic slot tags are not known, we sample topics of each word n-gram as follows: if an unlabeled word exists in one or more lexicon dictionaries, we introduce  ...  An example utterance with semantic annotations. tag.  ... 
doi:10.1109/icassp.2013.6639285 dblp:conf/icassp/TurCH13 fatcat:qqqqwo7rrvdbfmlka7agooovme

Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

Asli Çelikyilmaz, Dilek Hakkani-Tür, Gökhan Tür, Ruhi Sarikaya
2013 Annual Meeting of the Association for Computational Linguistics  
It can efficiently handle semantic ambiguity by extending standard topic models with two new features. First, it encodes word n-gram features from labeled source and unlabeled target data.  ...  Our new SSL approach improves semantic tagging performance by 3% absolute over the baseline models, and also compares favorably on semi-supervised syntactic tagging.  ...  We use the word-tag posterior probabilities obtained from a CRF sequence model trained on labeled utterances as features. The x = {x_l, x_u} has labeled (l) and unlabeled (u) parts.  ... 
dblp:conf/acl/CelikyilmazHTS13 fatcat:vqpkmom7dva4nhrgtcdan6wwse

Weakly Supervised Classification of Tweets for Disaster Management

Girish Keshav Palshikar, Manoj Apte, Deepak Pandita
2017 European Conference on Information Retrieval  
Since tweets are a different category of documents than news, we next propose a model transfer algorithm, which essentially refines the model learned from news by analyzing a large unlabeled corpus of  ...  In this paper, we propose self-learning algorithms that, with minimal supervision, construct a simple bag-of-words model of information expressed in the news about various natural disasters.  ...  In addition, a large corpus U of unlabeled documents is given, i.e., none of the documents in U are labeled with any class label (either +1 or −1).  ... 
dblp:conf/ecir/PalshikarAP17 fatcat:vryisecpajgq3a3bz6hrbwlwse

Filtering big data from social media – Building an early warning system for adverse drug reactions

Ming Yang, Melody Kiang, Wei Shang
2015 Journal of Biomedical Informatics  
We select drugs with more than 500 threads of discussion, and collect all the original posts and comments of these drugs using an automatic Web spidering program as the text corpus.  ...  It is expensive to manually label a large amount of ADR related messages (positive examples) and non-ADR related messages (negative examples) to train classification systems.  ...  Acknowledgments This work was partly supported by the Natural Science Foundation of China (Nos. 71301172, 71171186, 71301175, and 61272389) and Social Science Foundation of China (No. 13AXW010).  ... 
doi:10.1016/j.jbi.2015.01.011 pmid:25688695 fatcat:ci7r7u6twjf5hozujo4v5qhomu

PictureBook: A Text-and-Image Summary System for Web Search Result

Mei Wang, Hongtao Xu, Guoyu Hao, Xiangdong Zhou, Wei Wang, Qi Zhang, Baile Shi
2008 2008 IEEE 24th International Conference on Data Engineering  
Previous studies mainly focused on Web page clustering, document summary, visualization of search results, etc., which are applied separately to either text or image search.  ...  In this paper, we propose a demo to illustrate a new Web search result summary system, PictureBook, which combines text and image retrieval using techniques of multiple document summarization and image  ...  The framework of the system combines traditional Web page clustering and multi-document summarization techniques with statistical image semantic labeling models.  ... 
doi:10.1109/icde.2008.4497634 dblp:conf/icde/WangXHZWZS08 fatcat:u4naqhls35fapmk6dyhc6ved3a

Hetero-Labeled LDA: A Partially Supervised Topic Model with Heterogeneous Labels [chapter]

Dongyeop Kang, Youngja Park, Suresh N. Chari
2014 Lecture Notes in Computer Science  
the labels resulting in better classification and clustering accuracy than existing supervised or semi-supervised topic models.  ...  Experiments with three document collections (Reuters, 20 Newsgroup and Delicious) validate that our model generates a better set of topics and efficiently discovers additional latent topics not covered by  ...  Also, to further improve the performance of label prediction for partially labeled documents, we consider generating topic hierarchies such as Hierarchical Dirichlet Process (HDP) [23].  ... 
doi:10.1007/978-3-662-44848-9_41 fatcat:sv4tdozw6narpgdjzna7osdjvq