Finding themes in Medline documents - probabilistic similarity search

H. Shatkay, W.J. Wilbur
2000 Proceedings IEEE Advances in Digital Libraries 2000 ADL-00  
We present a new theme-based probabilistic approach for finding documents relevant to a given query document, and summarizing their contents.  ...  Retrieval of documents similar to a user-provided example document is a promising query paradigm towards meeting this goal.  ...  Acknowledgments We thank Carolina Nadel, M.D., for her help in the interpretation of the AIDS-related results, and Luis Ortiz for his advice on EM issues.  ... 
doi:10.1109/adl.2000.848381 dblp:conf/adl/ShatkayW00 fatcat:hlbqw3nmo5hupofxrjygq25j6q

Thematic clustering of text documents using an EM-based approach

Sun Kim, W Wilbur
2012 Journal of Biomedical Semantics  
The main focus of the thematic clustering algorithm is to find a text description, i.e., keywords, of the subjects that occur in a document set. In this regard, finding clusters  ...  This can provide condensed text information from similar documents in a large repository.  ...  In our analysis, we find that terms in News-Similar-3 are not distinctive enough to identify clusters.  ... 
doi:10.1186/2041-1480-3-s3-s6 pmid:23046528 pmcid:PMC3465205 fatcat:kfgdezmzibhtdno6tbiytl6q3e

A Hybrid Document Features Extraction with Clustering based Classification Framework on Large Document Sets

S Anjali Devi, S Siva
2020 International Journal of Advanced Computer Science and Applications  
In this work, a hybrid document clustering similarity index is optimized to find the essential key document clusters based on the contextual keywords.  ...  In the proposed work, a hybrid GloVe feature selection model is proposed to improve the contextual similarity of the keywords in the large document corpus.  ...  In the contextual similarity measure, the similarity between the GloVe features is evaluated to find the contextual phrases in the biomedical or any textual document sets.  ... 
doi:10.14569/ijacsa.2020.0110748 fatcat:n4syo3e2mfdjtakvg6jmz2qswe

Identifying important concepts from medical documents

Quanzhi Li, Yi-Fang Brook Wu
2006 Journal of Biomedical Informatics  
The latter assigns weights to extracted noun phrases for a medical document based on how important they are to that document and how domain specific they are in the medical domain.  ...  The experimental results show that our noun phrase extractor is effective in identifying noun phrases from medical documents, so is the keyphrase extractor in identifying important medical conceptual terms  ...  Then they search the UMLS Metathesaurus to find the matches between the identified concepts and the Metathesaurus entries. The matched concepts are used as indexing terms.  ... 
doi:10.1016/j.jbi.2006.02.001 pmid:16545986 fatcat:mysdq64h55hylduaihta3lgcni

A Document Clustering and Ranking System for Exploring MEDLINE Citations

Y. Lin, W. Li, K. Chen, Y. Liu
2007 JAMIA Journal of the American Medical Informatics Association  
Design: A text mining system framework for automatic document clustering and ranking organized MEDLINE citations following simple PubMed queries.  ...  The system grouped the retrieved citations, ranked the citations in each cluster, and generated a set of keywords and MeSH terms to describe the common theme of each cluster.  ...  related papers and another researcher interested in finding the latest cancer treatments might issue a MEDLINE query "Breast cancer."  ... 
doi:10.1197/jamia.m2215 pmid:17600104 pmcid:PMC1975797 fatcat:zw5l34xjz5fajkgqfhxdiuzgqm

Substring selection for biomedical document classification

B. Han, Z. Obradovic, Z.-Z. Hu, C. H. Wu, S. Vucetic
2006 Bioinformatics  
Motivation: Attribute selection is a critical step in development of document classification systems.  ...  This can lead to accuracy reduction, especially when the number of labeled documents is small.  ...  ACKNOWLEDGEMENTS This project is funded, in part, under a grant with the Pennsylvania Department of Health.  ... 
doi:10.1093/bioinformatics/btl350 pmid:16837530 fatcat:lyfgf7dbr5hjrgf57xerityekm

Substring selection for biomedical document classification

Slobodan Vucetic
2006 Proceedings of the 1st international workshop on Text mining in bioinformatics - TMBIO '06  
Motivation: Attribute selection is a critical step in development of document classification systems.  ...  This can lead to accuracy reduction, especially when the number of labeled documents is small.  ...  ACKNOWLEDGEMENTS This project is funded, in part, under a grant with the Pennsylvania Department of Health.  ... 
doi:10.1145/1183535.1183537 fatcat:c2xgm2f4u5dy3ia6ga2ls3uvse

A functionality taxonomy for document search engines [article]

Rik D.T. Janssen, Henderik A. Proper
2021 arXiv   pre-print
In this paper a functionality taxonomy for document search engines is proposed.  ...  We use the word 'search engine' in the broadest sense possible, including library and web based (meta) search engines.  ...  The taxonomy in this paper may also be viewed as the starting point of an architecture for an open and standardized search infrastructure.  ... 
arXiv:2105.12989v1 fatcat:5y5axtbayrdl3cm6nzerry6pwm

An Overview of Biomolecular Event Extraction from Scientific Documents

Jorge A. Vanegas, Sérgio Matos, Fabio González, José L. Oliveira
2015 Computational and Mathematical Methods in Medicine  
Many strategies have been proposed in the last decade, originating from different research areas such as natural language processing, machine learning, and statistics.  ...  However, automatic event extraction is a challenging task due to the ambiguity and diversity of natural language and higher-level linguistic phenomena, such as speculations and negations, which occur in  ...  tuned for extracting information from similar documents but could become unusable in a slightly different domain.  ... 
doi:10.1155/2015/571381 pmid:26587051 pmcid:PMC4637451 fatcat:7vrwycaycvb3nbgbub4v57y2xa

Text mining without document context

Eric SanJuan, Fidelia Ibekwe-SanJuan
2006 Information Processing & Management  
We consider a challenging clustering task: the clustering of multi-word terms without document co-occurrence information in order to form coherent groups of topics.  ...  Our clustering algorithm, named CPCL, is implemented in the TermWatch system. We compared CPCL to other existing clustering algorithms, namely hierarchical and partitioning (k-means, k-medoids).  ...  The GENIA project consists of 2,000 abstracts downloaded from the MEDLINE database using the search keywords: Human, Blood Cells, and Transcription Factors.  ... 
doi:10.1016/j.ipm.2006.03.017 fatcat:nawx3ve44bfwzi23azu74ncm5y

Survey of Scientific Document Summarization Techniques

Sheena Kurian K, Sheena Mathew
2020 Computer Science  
The number of scientific or research papers published every year is growing at an exponential rate, which has led to intensive research in scientific document summarization.  ...  The different methods commonly used in automatic text summarization are discussed in this paper with their pros and cons.  ...  Clustering-based Approaches Similar sentences are clustered together to identify important themes in a document; representative sentences are then selected from each of these to form a summary [14].  ... 
doi:10.7494/csci.2020.21.2.3356 fatcat:mbklxkl5dfaq7ax452p3bptluq

Robust Argumentative Zoning for Sensemaking in Scholarly Documents [chapter]

Simone Teufel, Min-Yen Kan
2011 Lecture Notes in Computer Science  
We perform an in-depth study of our system both with clean and noisy inputs.  ...  We also give preliminary results from in situ acceptability testing when the classifier is embedded within a digital library reading environment.  ...  [16] use a similar Maximum Entropy approach to AZ which uses unigrams, bigrams and Viterbi search over the category history as its main features.  ... 
doi:10.1007/978-3-642-23160-5_10 fatcat:el2ksu6gbrblll6mldahfdtax4

Effective focused retrieval by exploiting query context and document structure

Rianne Kaptein
2012 SIGIR Forum  
., 2003) is similar to probabilistic latent semantic indexing, but the topic distribution is assumed to have a Dirichlet prior resulting in a better mixture of topics in a document.  ...  A similar approach where also Google is used to find entity homepage as well as to return candidate documents to extract entities from is applied in (Wang et al., 2010).  ...  We can also use the structure of Wikipedia to find entities on the general Web by following external links and by searching for entities found in Wikipedia in a general  ... 
doi:10.1145/2093346.2093366 fatcat:f6n2nplok5f7pd3ijcrghrkg2y

Extracting information from textual documents in the electronic health record: a review of recent research

S M Meystre, G K Savova, K C Kipper-Schuler, J F Hurdle
2008 IMIA Yearbook of Medical Informatics  
We examine recent published research on the extraction of information from textual documents in the Electronic Health Record (EHR).  ...  174 publications were selected and are discussed in this review in terms of methods used, pre-processing of textual documents, contextual features detection and analysis, extraction of information in general  ...  Chapman for her help in reviewing this paper.  ... 
pmid:18660887 fatcat:ckd5m65lefarfcgtvzumblyd5u

Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research

G. K. Savova, K. C. Kipper-Schuler, J. F. Hurdle, S. M. Meystre
2008 IMIA Yearbook of Medical Informatics  
Objectives We examine recent published research on the extraction of information from textual documents in the Electronic Health Record (EHR).  ...  Results 174 publications were selected and are discussed in this review in terms of methods used, pre-processing of textual documents, contextual features detection and analysis, extraction of information  ...  Chapman for her help in reviewing this paper.  ... 
doi:10.1055/s-0038-1638592 fatcat:pwgvfjuubvcedm46ubc36leg7m