Co-Training on Textual Documents with a Single Natural Feature Set

Jason Chan, Irena Koprinska, Josiah Poon
2004 Australasian Document Computing Symposium  
This paper investigates the performance of co-training with only one natural feature set in two applications: Web page classification and email filtering.  ...  Co-training is a semi-supervised technique that allows classifiers to learn with fewer labelled documents by taking advantage of the more abundant unclassified documents.  ...  We compare co-training of a single natural feature set and co-training with two natural feature sets.  ... 
dblp:conf/adcs/ChanKP04 fatcat:mmhmr3gfl5cjraubramyeb754u

Collective classification of textual documents by guided self-organization in T-Cell cross-regulation dynamics

Alaa Abi-Haidar, Luis M. Rocha
2011 Evolutionary Intelligence  
of T-Cells in interaction with an idealized antigen presenting cell capable of presenting a single antigen.  ...  More specifically, here we test our model on a dataset of publicly available full-text biomedical articles provided by the BioCreative challenge (Krallinger in The biocreative ii. 5 challenge overview,  ...  Acknowledgements This work was partially supported by a grant from the FLAD Computational Biology Collaboratorium at the Instituto Gulbenkian de Ciencia in Portugal.  ... 
doi:10.1007/s12065-011-0052-5 fatcat:oum2zzruzfc7jjleciyd2d6wui

Collective Classification of Biomedical Articles using T-Cell Cross-regulation

Alaa Abi-Haidar, Luis Mateus Rocha
2010 Workshop on the Synthesis and Simulation of Living Systems  
, two classes with relevant and irrelevant documents for a given concept (e.g. articles with protein-protein interaction information).  ...  We continue our investigation of a bio-inspired solution for binary classification of textual documents inspired by T-cell cross-regulation in the vertebrate adaptive immune system, which is a complex  ...  Therefore, the dynamics of the ABCRM can self-correct initial erroneous biases from the natural textual co-occurrence of features.  ... 
dblp:conf/alife/Abi-HaidarR10 fatcat:euouxmq2ufhhpo62xeclikac5m

Feature Selection and Generalisation for Retrieval of Textual Cases [chapter]

Nirmalie Wiratunga, Ivan Koychev, Stewart Massie
2004 Lecture Notes in Computer Science  
Experiments with four textual data sets show significant improvement in retrieval accuracy whenever generalised features are used.  ...  The results further suggest that boosted decision stumps with generalised features are a promising combination.  ...  Acknowledgements We thank Susan Craw, Rob Lothian and Dietrich Wettschereck for helpful discussions on this work.  ... 
doi:10.1007/978-3-540-28631-8_58 fatcat:oaibkkmeurdyhmbmez7mmsnmpi

A quantum-inspired multimodal sentiment analysis framework

Yazhou Zhang, Dawei Song, Peng Zhang, Panpan Wang, Jingfei Li, Xiang Li, Benyou Wang
2018 Theoretical Computer Science  
., an image that is associated with a textual description or a set of textual labels).  ...  The key challenge is rooted in the "semantic gap" between different low-level content features and high-level semantic information.  ...  Single textual model: we use word embeddings, for which the dimensionality is set to 100 [73], to represent all textual documents, and train a RF classifier or an SVM classifier (whose parameters use the  ... 
doi:10.1016/j.tcs.2018.04.029 fatcat:rpnlxnvps5fklcqozncurg3tyq

Compound Document Analysis by Fusing Evidence Across Media

Spiros Nikolopoulos, Christina Lakka, Ioannis Kompatsiaris, Christos Varytimidis, Konstantinos Rapantzikos, Yannis Avrithis
2009 2009 Seventh International Workshop on Content-Based Multimedia Indexing  
Experiments performed on a set of 54 compound documents showed that the proposed scheme is able to exploit the existing cross media relations and achieve performance improvements.  ...  It is essentially a late-fusion mechanism that operates on top of single-media extractors' output and its main novelty relies on using the evidence extracted from heterogeneous media sources to perform  ...  The global classifier was trained on a set of 3500 images that was manually annotated, while the region classifiers were trained on a dataset of 690 images of car interiors that were also manually annotated  ... 
doi:10.1109/cbmi.2009.35 dblp:conf/cbmi/NikolopoulosLKVRA09 fatcat:ysg3uw5zuzbn3ejop3ljh7jymq

Recent Trends in Deep Learning Based Open-Domain Textual Question Answering Systems

Zhen Huang, Shiyi Xu, Minghao Hu, Xinyi Wang, Jinyan Qiu, Yongquan Fu, Yuncai Zhao, Yuxing Peng, Changjian Wang
2020 IEEE Access  
To address this issue, we present a thorough survey to explicitly give the task scope of open-domain textual QA, overview recent key advancements on deep learning based open-domain textual QA, illustrate  ...  However, a comprehensive review of existing approaches and recent trends is lacking in this field.  ...  On the other hand, a number of works need to search and filter the paragraphs from multiple documents in open-domain textual QA settings.  ... 
doi:10.1109/access.2020.2988903 fatcat:po4euxfronf3pob52qc2wcgrre

Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity [article]

Sheshera Mysore, Arman Cohan, Tom Hope
2022 arXiv   pre-print
To train our model, we exploit a naturally-occurring source of supervision: sentences in the full-text of papers that cite multiple papers together (co-citations).  ...  We present a new scientific document similarity model based on matching fine-grained aspects of texts.  ...  We train BERT E on sets of co-citation contexts referencing the same set of papers (i.e. E) in a contrastive learning setup with random in-batch negative samples.  ... 
arXiv:2111.08366v3 fatcat:zltoorxljvbk5e2h3epvpv3kou

Domain Agnostic Few-Shot Learning For Document Intelligence [article]

Jaya Krishna Mandivarapu, Eric Bunch, Glenn Fung
2021 arXiv   pre-print
Few-shot learning aims to generalize to novel classes with only a few samples with class labels.  ...  Here the domain shift is significant, going from natural images to the semi-structured documents of interest.  ...  Meta-training a few-shot model on document data, 2. Combining visual and textual feature channels via canonical correlation, and 3. Domain adaptation of models trained on image data.  ... 
arXiv:2111.00007v1 fatcat:jkx3lekbbrgx7aixbhd53hglm4

LeSSA: A Unified Framework based on Lexicons and Semi-Supervised Learning Approaches for Textual Sentiment Classification

Jawad Khan, Young-Koo Lee
2019 Applied Sciences  
(b) training classification models based on a high-quality training dataset generated by using k-means clustering, active learning, self-learning, and co-training algorithms.  ...  Reliable training data is vital to learn a sentiment classifier for textual sentiment classification, but due to domain heterogeneity, manual construction of reliable labeled sentiment corpora is a  ...  In order to obtain the most representative initial training set, first we cluster the documents into k-clusters, and then, a single document from each cluster closest to the respective centroids based  ... 
doi:10.3390/app9245562 fatcat:adzlvshbmbfklew457auwrh7ue

A Multi-Classifier Based Guideline Sentence Classification System

Mi Hwa Song, Sung Hyun Kim, Dong Kyun Park, Young Ho Lee
2011 Healthcare Informatics Research  
The additional sub-filtering using a combination of multi-classifiers was found to be more effective than a single conventional Term Frequency-Inverse Document Frequency (TF-IDF)-based search system in  ...  A transformation function is also used that extracts a predefined set of structural feature vectors determined by analyzing the sentential instance in terms of the underlying syntactic structures and phrase-level  ...  The lexical feature, which is Boolean by nature, cannot explain the co-occurring and repetitive template-based features like [should + be + VBN].  ... 
doi:10.4258/hir.2011.17.4.224 pmid:22259724 pmcid:PMC3259557 fatcat:a5k2t66embhmfjdfxwlkhe5nqm

Tagging and Tag Recommendation [chapter]

Fabiano Belém, Jussara Almeida, Marcos Gonçalves
2019 Text Mining - Analysis, Programming and Application [Working Title]  
Consisting of freely chosen keywords assigned to objects by users, tags represent a simpler, cheaper, and a more natural way of organizing content than a fixed taxonomy with a controlled vocabulary.  ...  Moreover, recent studies have demonstrated that among other textual features such as title, description, and user comments, tags are the most effective to support information retrieval (IR) services such  ...  textual feature (tags, in this case), instead of the full set of terms associated with the objects in the training dataset D.  ... 
doi:10.5772/intechopen.82242 fatcat:rn5xcdkpkjaffm32tal7cehx5u

An approach based on classifier combination for online handwritten text and non-text classification in Devanagari script

Rajib Ghosh, Saurav Shanu, Sugandha Ranjan, Khusboo Kumari
2019 Sadhana (Bangalore)  
The efficiency of the present system has been measured on a self-generated dataset and it provides promising results.  ...  The features are then studied separately in classification platforms based on Support Vector Machine (SVM) and Hidden Markov Model (HMM).  ...  Introduction It is very natural for human beings to write a document consisting of textual and non-textual information.  ... 
doi:10.1007/s12046-019-1159-0 fatcat:zbq2zh4mzbd7dbpadttr7plufe

Towards Automatic Service Level Agreements Information Extraction

Lucia De Marco, Filomena Ferrucci, M-Tahar Kechadi, Gennaro Napoli, Pasquale Salza
2016 Proceedings of the 6th International Conference on Cloud Computing and Services Science  
Information systems and computing capabilities are delivered through the Internet in the form of services; they are regulated by a Service Level Agreement (SLA) contract co-signed by a generic Application  ...  Some work in the literature about these facilities relies on a structured language representation of SLAs in order to make them machine-readable.  ...  In every run, the training set is composed of 35 SLA documents and the test set of 1.  ... 
doi:10.5220/0005873100590066 dblp:conf/closer/MarcoFKNS16 fatcat:62zwqlizm5f2pmxjgzycypvjku

Selecting text features for gene name classification

Goran Nenadić, Simon Rice, Irena Spasić, Sophia Ananiadou, Benjamin Stapley
2003 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine  
Classification features range from words, lemmas and stems, to automatically extracted terms. Also, simple co-occurrences of genes within documents are considered.  ...  The preliminary experiments performed on a set of 3,000 S. cerevisiae gene names and 53,000 Medline abstracts have shown that using domain-specific terms can improve the performance compared to the standard  ...  These documents have been treated as a single virtual document pertinent to the given gene. All words co-occurring with a given gene in any of the abstracts were used as its features.  ... 
doi:10.3115/1118958.1118974 dblp:conf/bionlp/NenadicRSAS03 fatcat:ms2kbsxh55cohonxyvwbmbhsri