Clustering tagged documents with labeled and unlabeled documents
2013
Information Processing & Management
This study employs our proposed semi-supervised clustering method, called Constrained-PLSA, to cluster tagged documents with a small number of labeled documents, and uses two data sets for system performance ...
This study employs abstracts of papers and the tags annotated by users to cluster documents. Four combinations of tags and words are used for feature representations. ...
Acknowledgment This work was supported in part by the National Science Council under the Grants NSC-100-2221-E-009-129 and NSC-100-2811-E-009-024. ...
doi:10.1016/j.ipm.2012.12.004
fatcat:kdmjhrvg6fhnhpidrgrfq4dhpq
SCT-D3 at the NTCIR-11 MedNLP-2 Task
2014
NTCIR Conference on Evaluation of Information Access Technologies
The SCT-D3 team participated in the Extraction of Complaint and Diagnosis subtask and the Normalization of Complaint and Diagnosis subtask of the NTCIR-11 MedNLP-2 Task. ...
We tackled the two subtasks by using machine learning techniques and additional medical term dictionaries. ...
As labeled data, we used the medical document set, ntcir11 mednlp mednlp2-train v0.xml, provided by the task organizers, which included medical terms annotated with <c> tags. ...
dblp:conf/ntcir/FujinoSHKH14
fatcat:euvqyjflebeezck4tycy7zi6uq
Semi-Supervised Linear Discriminant Clustering
2014
IEEE Transactions on Cybernetics
We use soft LDA with hard labels of labeled examples and soft labels of unlabeled examples to find a projection matrix. The clustering is then performed in the new feature space. ...
We further discuss and analyze the influence of soft labels on classification performance by conducting experiments with different percentages of labeled examples. ...
Each unlabeled document x_i connects to the k nearest labeled documents and the k nearest unlabeled documents, with different weight coefficients in the two cases. (An illustrative sketch of this neighbor construction follows this entry.) ...
doi:10.1109/tcyb.2013.2278466
pmid:23996591
fatcat:dpxxp6lcyraxhb2rzrbryy2pqa
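A hedged reading of the neighbor construction described in the snippet above: each unlabeled document is linked to its k nearest labeled neighbors and its k nearest unlabeled neighbors, with a different edge weight for each case. The sketch below only illustrates that graph construction, not the paper's actual algorithm; the Euclidean metric, the weight values w_labeled and w_unlabeled, and the use of scikit-learn are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_graph(X_labeled, X_unlabeled, k=5, w_labeled=1.0, w_unlabeled=0.5):
    """Connect each unlabeled document to its k nearest labeled and
    k nearest unlabeled documents, with different edge weights
    (an illustrative construction; the paper's weighting may differ)."""
    n_l, n_u = len(X_labeled), len(X_unlabeled)
    # Row i of W holds the edges of unlabeled document i; columns are
    # [labeled documents | unlabeled documents].
    W = np.zeros((n_u, n_l + n_u))

    nn_l = NearestNeighbors(n_neighbors=k).fit(X_labeled)
    nn_u = NearestNeighbors(n_neighbors=k + 1).fit(X_unlabeled)  # +1 to skip self

    _, idx_l = nn_l.kneighbors(X_unlabeled)
    _, idx_u = nn_u.kneighbors(X_unlabeled)

    for i in range(n_u):
        W[i, idx_l[i]] = w_labeled              # edges to labeled neighbors
        for j in idx_u[i]:
            if j != i:                          # skip the self edge
                W[i, n_l + j] = w_unlabeled     # edges to unlabeled neighbors
    return W
```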
Automatic subject heading assignment for online government publications using a semi-supervised machine learning approach
2006
Proceedings of the American Society for Information Science and Technology
The EM classifier makes use of easily obtained unlabeled documents and thus reduces the demand for labeled training examples. (A hedged sketch of this EM idea follows this entry.) ...
Automatic text categorization techniques can be applied to classify documents approximately, given a sufficient number of labeled training examples. ...
ACKNOWLEDGMENTS This work was sponsored in part by a National Leadership Grant from the Institute of Museum and Library Services and by the Illinois State Library. ...
doi:10.1002/meet.14504201139
fatcat:yqy233xqj5hf5ffquzz7skrgla
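The EM classifier described above exploits unlabeled documents to reduce the need for labeled training examples. A minimal sketch of that general semi-supervised EM idea is given below, in the spirit of EM with a naive Bayes text classifier; the entry does not confirm this exact variant, and the iteration count, confidence-based sample weights, and scikit-learn components are assumptions.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_l, y_l, X_u, n_iter=10):
    """Semi-supervised EM: fit naive Bayes on labeled documents, then
    alternate between (E) soft-labeling the unlabeled documents and
    (M) refitting on everything, weighting unlabeled documents by
    confidence. Dense count matrices are assumed for brevity."""
    clf = MultinomialNB().fit(X_l, y_l)
    for _ in range(n_iter):
        probs = clf.predict_proba(X_u)                     # E-step
        y_u = clf.classes_[probs.argmax(axis=1)]
        conf = probs.max(axis=1)
        X_all = np.vstack([X_l, X_u])                      # M-step
        y_all = np.concatenate([y_l, y_u])
        w_all = np.concatenate([np.ones(len(y_l)), conf])
        clf = MultinomialNB().fit(X_all, y_all, sample_weight=w_all)
    return clf
```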
Clustering documents with labeled and unlabeled documents using fuzzy semi-Kmeans
2013
Fuzzy sets and systems (Print)
While focusing on document clustering, this work presents a fuzzy semi-supervised clustering algorithm called fuzzy semi-Kmeans. ...
This work conducts experiments on three data sets and compares fuzzy semi-Kmeans with several methods. ...
Each unlabeled document x_i is connected to the k nearest labeled documents and the k nearest unlabeled documents, with different weight coefficients in the two cases. (An illustrative fuzzy semi-Kmeans sketch follows this entry.) ...
doi:10.1016/j.fss.2013.01.004
fatcat:5qgbucjr4fcxvhffxdgdqbi4gq
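The fuzzy semi-Kmeans entry above combines labeled documents (hard memberships) with unlabeled documents (fuzzy memberships). The sketch below is a generic stand-in for that idea, not the paper's update rules; it assumes class labels coincide with cluster indices and uses standard fuzzy c-means updates with fuzzifier m.

```python
import numpy as np

def fuzzy_semi_kmeans(X_l, y_l, X_u, n_clusters, m=2.0, n_iter=50):
    """Illustrative semi-supervised fuzzy k-means: labeled documents keep
    hard memberships fixed by their class labels; unlabeled documents get
    fuzzy memberships from inverse-distance weighting. A generic stand-in,
    not the cited paper's exact objective."""
    X = np.vstack([X_l, X_u])
    y_l = np.asarray(y_l)
    n_l = len(X_l)
    U = np.full((len(X), n_clusters), 1.0 / n_clusters)   # membership matrix
    U[:n_l] = 0.0
    U[np.arange(n_l), y_l] = 1.0                          # pin labeled documents

    for _ in range(n_iter):
        # Update centroids from fuzzified memberships.
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0, keepdims=True).T
        # Update memberships of unlabeled documents only.
        d = np.linalg.norm(X[n_l:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U[n_l:] = inv / inv.sum(axis=1, keepdims=True)
    return U, centroids
```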
Robust Document Representations using Latent Topics and Metadata
[article]
2020
arXiv
pre-print
This technique is not adequate when labeled examples are not available at training time and when the metadata artifacts in a document must be exploited. ...
The generated document embeddings exhibit compositional characteristics and are directly used by downstream classification tasks to create decision boundaries from a small number of labeled examples, thereby ...
In this study, we propose a transductive framework that can take advantage of a limited labeled dataset paired with a larger unlabeled dataset to generate rich representations for document classification ...
arXiv:2010.12681v1
fatcat:hi7thmsswvcmth62lexzf5xdz4
Weakly Supervised Slot Tagging with Partially Labeled Sequences from Web Search Click Logs
2015
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
When combined with a novel initialization scheme that leverages unlabeled data, we show that our method gives significant improvement over strong supervised and weakly-supervised baselines. ...
We extend the constrained lattice training of Täckström et al. (2013) to non-linear conditional random fields in which latent variables mediate between observations and labels. ...
First, we cluster observation types in unlabeled data and treat the clusters as labels. (An illustrative sketch of this pseudo-labeling step follows this entry.) ...
doi:10.3115/v1/n15-1009
dblp:conf/naacl/KimJSS15
fatcat:msi7v6h4zvdbrhhglyohbawwgm
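The snippet above mentions clustering observation types in unlabeled data and treating the clusters as labels for initialization. The sketch below illustrates one plausible version of that step, with word types represented by bag-of-neighbor-context counts and clustered with k-means; the feature choice, cluster count, and scikit-learn components are assumptions, not the paper's setup.

```python
from collections import defaultdict
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

def pseudo_labels_from_clusters(sentences, n_clusters=100):
    """Cluster observation types (word types, represented by counts of
    their left/right neighbor words) and map each type to its cluster id,
    so the cluster ids can serve as pseudo-labels for initialization.
    Illustrative only; the paper may use different type features."""
    contexts = defaultdict(list)
    for sent in sentences:
        for i, w in enumerate(sent):
            left = sent[i - 1] if i > 0 else "BOS"
            right = sent[i + 1] if i + 1 < len(sent) else "EOS"
            contexts[w].extend([f"L_{left}", f"R_{right}"])
    types = sorted(contexts)
    docs = [" ".join(contexts[t]) for t in types]
    X = CountVectorizer().fit_transform(docs)
    k = min(n_clusters, len(types))
    cluster_ids = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    return dict(zip(types, cluster_ids))
```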
Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations
2015
Journal of Cheminformatics
Exploiting unlabeled text data to improve system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. ...
We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to improve system performance. ...
We derive the cluster label prefixes of lengths 4, 6, 10, and 20 in the Brown model, following the experiment of Turian et al. (An illustrative prefix-feature sketch follows this entry.) ...
doi:10.1186/1758-2946-7-s1-s9
pmid:25810780
pmcid:PMC4331699
fatcat:dgmnnfwsszcpdkv7k3dstaiory
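The entry above uses Brown-cluster label prefixes of lengths 4, 6, 10, and 20 as features, following Turian et al. The sketch below shows how such prefix features are typically derived from a word-to-bit-string map; the dictionary brown_paths and the feature naming are assumptions for illustration.

```python
def brown_prefix_features(word, brown_paths, lengths=(4, 6, 10, 20)):
    """Turn a word's Brown-cluster bit string into prefix features of the
    given lengths (4, 6, 10, 20 as in the snippet above). brown_paths maps
    word -> bit string, e.g. {"aspirin": "110100101101"}; the dictionary
    itself would come from a separate Brown clustering run (assumed here)."""
    path = brown_paths.get(word)
    if path is None:
        return {}
    return {f"brown_prefix_{n}": path[:n] for n in lengths if len(path) >= n}

# Example: a word whose full bit string is shorter than 20 simply omits
# the longer prefixes.
print(brown_prefix_features("aspirin", {"aspirin": "110100101101"}))
# {'brown_prefix_4': '1101', 'brown_prefix_6': '110100', 'brown_prefix_10': '1101001011'}
```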
Semi-supervised Categorization of Wikipedia Collection by Label Expansion
[chapter]
2009
Lecture Notes in Computer Science
We cope with the case where there is a small number of labeled pages and a very large number of unlabeled ones. (A generic label-propagation sketch follows this entry.) ...
We address the problem of categorizing a large set of linked documents with important content and structure aspects, for example, the Wikipedia collection proposed at the INEX XML Mining track. ...
Acknowledgment This work is partially supported by the ATASH Project co-funded by the French Association on Research and Technology (ANRT). ...
doi:10.1007/978-3-642-03761-0_42
fatcat:osbroim5mbepbe4qklojmldbbm
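The label-expansion entry above categorizes linked documents starting from a small set of labeled pages. The sketch below is a generic iterative label-propagation baseline over a link graph, not the authors' method; the adjacency-matrix input, the damping factor alpha, and the clamping of labeled pages are assumptions.

```python
import numpy as np

def propagate_labels(A, y, n_classes, n_iter=30, alpha=0.9):
    """Simple label propagation over a document link graph: A is a symmetric
    adjacency matrix, y holds class ids for labeled pages and -1 for
    unlabeled ones. A generic stand-in for the entry's label-expansion idea."""
    n = len(y)
    # Row-normalize the adjacency matrix into a transition matrix.
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    Y0 = np.zeros((n, n_classes))
    labeled = y >= 0
    Y0[labeled, y[labeled]] = 1.0
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (P @ F) + (1 - alpha) * Y0   # spread, then pull toward seeds
        F[labeled] = Y0[labeled]                  # keep labeled pages fixed
    return F.argmax(axis=1)
```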
Latent semantic modeling for slot filling in conversational understanding
2013
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Our method decomposes the task into two steps: latent n-gram clustering using a semi-supervised latent Dirichlet allocation (LDA) and sequence tagging for learning semantic structures in a CU system. ...
Similarly, for the unlabeled documents whose semantic slot tags are not known, we sample topics of each word n-gram as follows: if an unlabeled word exists in one or more lexicon dictionaries, we introduce ...
An example utterance with semantic annotations. ...
doi:10.1109/icassp.2013.6639285
dblp:conf/icassp/TurCH13
fatcat:qqqqwo7rrvdbfmlka7agooovme
Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression
2013
Annual Meeting of the Association for Computational Linguistics
It can efficiently handle semantic ambiguity by extending standard topic models with two new features. First, it encodes word n-gram features from labeled source and unlabeled target data. ...
Our new SSL approach improves semantic tagging performance by 3% absolute over the baseline models, and also compares favorably on semi-supervised syntactic tagging. ...
We use the word-tag posterior probabilities obtained from a CRF sequence model trained on labeled utterances as features. The input x = {x_l, x_u} has labeled (l) and unlabeled (u) parts. ...
dblp:conf/acl/CelikyilmazHTS13
fatcat:vqpkmom7dva4nhrgtcdan6wwse
Weakly Supervised Classification of Tweets for Disaster Management
2017
European Conference on Information Retrieval
Since tweets are a different category of documents than news, we next propose a model transfer algorithm, which essentially refines the model learned from news by analyzing a large unlabeled corpus of ...
In this paper, we propose self-learning algorithms that, with minimal supervision, construct a simple bag-of-words model of information expressed in the news about various natural disasters. ...
In addition, a large corpus U of unlabeled documents is given, i.e., none of the documents in U is labeled with any class label (either +1 or −1). (A generic self-training sketch follows this entry.) ...
dblp:conf/ecir/PalshikarAP17
fatcat:vryisecpajgq3a3bz6hrbwlwse
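The entry above describes self-learning with minimal supervision over a bag-of-words model and a large unlabeled corpus U with binary labels (+1/−1). The sketch below is a generic self-training loop, not the paper's algorithm or its model-transfer step; the TF-IDF features, logistic regression classifier, confidence threshold, and round count are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(seed_texts, seed_labels, unlabeled_texts, threshold=0.9, rounds=5):
    """Generic self-training loop: fit a bag-of-words classifier on a small
    seed set, then repeatedly absorb unlabeled documents whose predicted
    class (+1 / -1) is above a confidence threshold. Illustrative only;
    the cited paper's self-learning and model-transfer steps differ."""
    vec = TfidfVectorizer(min_df=2).fit(seed_texts + unlabeled_texts)
    texts, labels = list(seed_texts), list(seed_labels)
    pool = list(unlabeled_texts)
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(vec.transform(texts), labels)
        if not pool:
            break
        probs = clf.predict_proba(vec.transform(pool))
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break
        preds = clf.classes_[probs.argmax(axis=1)]
        texts += [t for t, c in zip(pool, confident) if c]
        labels += [int(p) for p, c in zip(preds, confident) if c]
        pool = [t for t, c in zip(pool, confident) if not c]
    return clf, vec
```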
Filtering big data from social media – Building an early warning system for adverse drug reactions
2015
Journal of Biomedical Informatics
We select drugs with more than 500 threads of discussion, and collect all the original posts and comments of these drugs using an automatic Web spidering program as the text corpus. ...
It is expensive to manually label a large amount of ADR related messages (positive examples) and non-ADR related messages (negative examples) to train classification systems. ...
Acknowledgments This work was partly supported by the Natural Science Foundation of China (Nos. 71301172, 71171186, 71301175, and 61272389) and Social Science Foundation of China (No. 13AXW010). ...
doi:10.1016/j.jbi.2015.01.011
pmid:25688695
fatcat:ci7r7u6twjf5hozujo4v5qhomu
PictureBook: A Text-and-Image Summary System for Web Search Result
2008
2008 IEEE 24th International Conference on Data Engineering
Previous studies mainly focused on Web page clustering, document summarization, visualization of search results, etc., which are applied separately to either text or image search. ...
In this paper, we propose a demo to illustrate a new Web search result summary system − PictureBook, which combines text and image retrieval using techniques of multiple document summarization and image ...
The framework of the system combines traditional Web page clustering and multi-document summarization techniques with statistical image semantic labeling models. ...
doi:10.1109/icde.2008.4497634
dblp:conf/icde/WangXHZWZS08
fatcat:u4naqhls35fapmk6dyhc6ved3a
Hetero-Labeled LDA: A Partially Supervised Topic Model with Heterogeneous Labels
[chapter]
2014
Lecture Notes in Computer Science
... the labels, resulting in better classification and clustering accuracy than existing supervised or semi-supervised topic models. ...
Experiments with three document collections (Reuters, 20 Newsgroups, and Delicious) validate that our model generates a better set of topics and efficiently discovers additional latent topics not covered by ...
Also, to further improve the performance of label prediction for partially labeled documents, we consider generating topic hierarchies such as the Hierarchical Dirichlet Process (HDP) [23]. ...
doi:10.1007/978-3-662-44848-9_41
fatcat:sv4tdozw6narpgdjzna7osdjvq
Showing results 1 — 15 out of 9,507 results