1,302 Hits in 6.4 sec

Text Mining and Protein Annotations

Martin Krallinger, Rainer Malik, Alfonso Valencia
2006 Genome Informatics Series  
We introduce the Protein description sentence (Prodisen) corpus, a useful resource for the automatic identification and construction of text-based protein and gene description records using information  ...  Basic guidelines and criteria relevant for the construction of a text corpus of functional descriptions of genes and proteins are proposed.  ...  Type 1: bag of word approach after stop word removal; type 2: bag of word after stop word removal, stemming and case conversion; type 3: word-POS pairs, type 4: only stemmed words previously labeled as  ... 
doi:10.11234/gi1990.17.2_121 fatcat:bjmblz4kmnezzcsatcd2lmzdci

Mining Protein Interactions from Text Using Convolution Kernels [chapter]

Ramanathan Narayanan, Sanchit Misra, Simon Lin, Alok Choudhary
2010 Lecture Notes in Computer Science  
As the sizes of biomedical literature databases increase, there is an urgent need to develop intelligent systems that automatically discover Protein-Protein interactions from text.  ...  In this paper, we describe a scalable hierarchical Support Vector Machine (SVM) based framework to efficiently mine protein interactions with high precision.  ...  Contextual features: For boundary identification, we use neighboring words and the POS of neighboring words as contextual features.  ... 
doi:10.1007/978-3-642-14640-4_9 fatcat:7eo7uky5ivee5htr5rgvbdy5ka

Text mining and protein annotations: the construction and use of protein description sentences

Martin Krallinger, Rainer Malik, Alfonso Valencia
2006 Genome Informatics Series  
We introduce the Protein description sentence (Prodisen) corpus, a useful resource for the automatic identification and construction of text-based protein and gene description records using information  ...  Basic guidelines and criteria relevant for the construction of a text corpus of functional descriptions of genes and proteins are proposed.  ...  Type 1: bag of word approach after stop word removal; type 2: bag of word after stop word removal, stemming and case conversion; type 3: word-POS pairs, type 4: only stemmed words previously labeled  ... 
pmid:17503385 fatcat:6uuve2sddjfhrgbfxeqzqgikaq

Introducing meta-services for biomedical information extraction

Florian Leitner, Martin Krallinger, Carlos Rodriguez-Penagos, Jörg Hakenberg, Conrad Plake, Cheng-Ju Kuo, Chun-Nan Hsu, Richard Tsai, Hsi-Chuan Hung, William W Lau, Calvin A Johnson, Rune Sætre (+20 others)
2008 Genome Biology  
Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML).  ...  This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts.  ...  Acknowledgements We should like to thank a number of bioinformaticians who have discussed with us the need for this type of system for their own research, in particular Jaak Vilo and Ewan Birney.  ... 
doi:10.1186/gb-2008-9-s2-s6 pmid:18834497 pmcid:PMC2559990 fatcat:vxtilyugbza73lep66bvclyjem

BioCreative II Workshop Proceedings

Lynette, Martin, Alfonso
2007 Zenodo  
(IAS); Assessment of the Second BioCreative PPI task: Automatic Extraction of Protein-Protein Interactions; Annotating molecular interactions in the MINT database; IntAct - Serving the text-mining  ...  Gene Mention Task; Overview of BioCreative II Gene Normalization; Evaluating the Detection and Ranking of Protein Interaction relevant Articles: the BioCreative Challenge Interaction Article Sub-task  ...  Acknowledgements We would like to thank Santiago Schnell for graciously providing us with additional proteomics related articles not containing protein-protein interaction information.  ... 
doi:10.5281/zenodo.4274543 fatcat:3sa3fvgngffjrblxzgswof42tq

Protein Named Entity Identification Based on Probabilistic Features Derived from GENIA Corpus and Medical Text on the Web

Sagara Sumathipala, Koichi Yamada, Muneyuki Unehara, Izumi Suzuki
2015 International Journal of Fuzzy Logic and Intelligent Systems  
In this paper, we explore the use of abstracts of biomedical literature in MEDLINE for protein name identification and present the results of the conducted experiments.  ...  Protein named entity identification is one of the most essential and fundamental predecessor for extracting information about protein-protein interactions from biomedical literature.  ...  The conditional probability that W appears in a MEDLINE abstract with the word "Protein", or equivalently the likelihood that the word "Protein" appears in a MEDLINE abstract with W is given by: P ML (  ... 
doi:10.5391/ijfis.2015.15.2.111 fatcat:bjl3pfodv5aptdbrbmf57lpb3y

Automating curation using a natural language processing pipeline

Beatrice Alex, Claire Grover, Barry Haddow, Mijail Kabadjov, Ewan Klein, Michael Matthews, Richard Tobin, Xinglong Wang
2008 Genome Biology  
as detection and normalization of interacting protein pairs, are still challenging for NLP systems.  ...  Results: Our system was among the highest performing on the interaction subtasks, and competitive performance on the gene mention task was achieved with minimal development effort.  ...  Acknowledgements The TXM pipeline on which this system is based was developed as part of a joint project with Cognia EU [33], supported by the Text Mining Programme of ITI Life Sciences Scotland [34  ... 
doi:10.1186/gb-2008-9-s2-s10 pmid:18834488 pmcid:PMC2559981 fatcat:asmlwci6kze55j3m64gpicdzui

Natural language processing in text mining for structural modeling of protein complexes

Varsha D. Badal, Petras J. Kundrotas, Ilya A. Vakser
2018 BMC Bioinformatics  
The results showed that the keyword dictionaries designed for identification of protein interactions are not adequate for the TM prediction of the binding mode.  ...  Structural modeling of protein-protein interactions produces a large number of putative configurations of the protein complexes.  ...  protein interactions.  ... 
doi:10.1186/s12859-018-2079-4 pmid:29506465 pmcid:PMC5838950 fatcat:pc6vxv43uffn7nxo4c2pcpt7zq

PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites

D. Cheng, C. Knox, N. Young, P. Stothard, S. Damaraju, D. S. Wishart
2008 Nucleic Acids Research  
PolySearch's performance has been assessed in tasks such as gene synonym identification, protein-protein interaction identification and disease gene identification using a variety of manually assembled  ...  PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases.  ...  Funding for this project was provided by the Protein Engineering Network of Centres of Excellence (PENCE), The Alberta Cancer Foundation, NSERC and Genome Prairie (a division of Genome Canada).  ... 
doi:10.1093/nar/gkn296 pmid:18487273 pmcid:PMC2447794 fatcat:4nqjvxjlejdmjlrwrnuesj3voe

Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions

Shashank Agarwal, Feifan Liu, Hong Yu
2011 BMC Bioinformatics  
Protein-protein interaction (PPI) is an important biomedical phenomenon.  ...  One performs binary classification to determine whether the given article is PPI relevant or not, named "Simple Classifier", and the other one maps the PPI relevant articles with corresponding interaction  ...  This article has been published as part of BMC Bioinformatics Volume 12 Supplement 8, 2011: The Third BioCreative - Critical Assessment of Information Extraction in Biology Challenge.  ... 
doi:10.1186/1471-2105-12-s8-s10 pmid:22151701 pmcid:PMC3269933 fatcat:x7id4qy3ybbehal3h7afazhx4y

Text Mining for Protein Docking

Varsha D. Badal, Petras J. Kundrotas, Ilya A. Vakser, Nir Ben-Tal
2015 PLoS Computational Biology  
The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach.  ...  Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information.  ...  There are different TM tools for identification of interacting proteins from biological literature and databases [25].  ... 
doi:10.1371/journal.pcbi.1004630 pmid:26650466 pmcid:PMC4674139 fatcat:5usgubv7uzho7iapj4ha26ge7q

BioPPISVMExtractor: A protein–protein interaction extractor for biomedical literature using SVM and rich feature sets

Zhihao Yang, Hongfei Lin, Yanpeng Li
2010 Journal of Biomedical Informatics  
Protein-protein interactions play a key role in various aspects of the structural and functional organization of the cell.  ...  However, the amount of biomedical literature regarding protein interactions is increasing rapidly and it is difficult for interaction database curators to detect and curate protein interaction information  ...  Acknowledgments This work is supported by grant from the Natural Science Foundation of China (No. 60373095 and 60673039) and the National High Tech Research and Development Plan of China (2006AA01Z151)  ... 
doi:10.1016/j.jbi.2009.08.013 pmid:19706337 fatcat:rwb6ukl535dozaxtp3asbfrjna

Boosting precision and recall of dictionary-based protein name recognition

Yoshimasa Tsuruoka, Jun'ichi Tsujii
2003 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine -  
Dictionary-based protein name recognition is the first step for practical information extraction from biomedical documents because it provides ID information of recognized terms unlike machine learning  ...  Experimental results using the GENIA corpus show that the filtering using a naive Bayes classifier greatly improves precision with slight loss of recall, resulting in a much better F-score.  ...  W middle : the other words of the term without positional information (bag-of-words).  ... 
doi:10.3115/1118958.1118964 dblp:conf/bionlp/TsuruokaT03 fatcat:pvfq2b7f5jckvd6xc3b3th4m4a

Multimodal deep representation learning for protein interaction identification and protein family classification

Da Zhang, Mansur Kabuka
2019 BMC Bioinformatics  
In our paper, we construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study the PPI predictions.  ...  However, compared to the protein sequences obtainable from various species and organisms, the number of revealed protein-protein interactions is relatively limited.  ...  About this supplement This article has been published as part of BMC Bioinformatics Volume 20 Supplement  ... 
doi:10.1186/s12859-019-3084-y pmid:31787089 pmcid:PMC6886253 fatcat:pt5nwss7u5h7restbb6ci4r2r4

Named Entity Recognition and Relation Detection for Biomedical Information Extraction

Nadeesha Perera, Matthias Dehmer, Frank Emmert-Streib
2020 Frontiers in Cell and Developmental Biology  
In this paper, we review practices for Named Entity Recognition (NER) and Relation Detection (RD), allowing, e.g., to identify interactions between proteins and drugs or genes and diseases.  ...  Since there is currently no automatic archiving of the obtained results, much of this information remains buried in textual details not readily available for further usage or analysis.  ...  For instance, identifying the interactions of proteins allows the construction of protein-protein interaction networks.  ... 
doi:10.3389/fcell.2020.00673 pmid:32984300 pmcid:PMC7485218 fatcat:khclwjfykjfi3jktvrbuliwidm
Showing results 1 to 15 out of 1,302 results