3,512 Hits in 9.1 sec

Toward Computer-Assisted Text Curation: Classification Is Easy (Choosing Training Data Can Be Hard...) [chapter]

Robert Denroche, Ramana Madupu, Shibu Yooseph, Granger Sutton, Hagit Shatkay
2010 Lecture Notes in Computer Science  
We trained two classifiers using small datasets labeled by CHAR curators, and another classifier based on a much larger dataset using annotations from public databases.  ...  We describe the datasets, the classification method, and discuss the unexpected results.  ...  While the actual evidence is most likely to be found in the full text, it is important to note that the coarser task of just determining relevance can be performed (in most cases) using the title and the  ... 
doi:10.1007/978-3-642-13131-8_5 fatcat:xpbkebekqrbcravknfndiyr5ka

Curating Social Media Data [article]

Kushal Vaghani
2020 arXiv   pre-print
A key requirement is to curate the raw data before it is fed into analytics pipelines. This curation process transforms the raw data into contextualized data and knowledge.  ...  We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data and prepare it for reliable analytics.  ...  This data can be classified as open data, since it is available publicly and can be queried [75] .  ... 
arXiv:2002.09202v1 fatcat:5w2coglezfc4jlnu4h6oh23oqm

Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases [article]

Gerhard Weikum, Luna Dong, Simon Razniewski, Fabian Suchanek
2021 arXiv   pre-print
This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics  ...  This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases.  ...  It is a great pleasure and honor to have such wonderful colleagues in our research community.  ... 
arXiv:2009.11564v2 fatcat:vh2lqfmhhbcwpf6dcsej3hhvgy

OUP accepted manuscript

2018 Database: The Journal of Biological Databases and Curation  
being proposed to bring some level of automation in the process of ontology acquisition from unstructured text.  ...  Exponential increase in unstructured data on the web has made automated acquisition of ontology from unstructured text a most prominent research area.  ...  The acquired textual data is exploited as a source of input to a fine text classification model, which is trained by utilizing various standard machine learning methodologies.  ... 
doi:10.1093/database/bay101 fatcat:krqzvilqpndufj36fj4wsin32i

System Architecture and Intelligent Data Curation of Virtual Museum for Ancient History

Desislava Paneva-Marinova, Stoikov Stoikov, Lilia Pavlova, Detelin Luchev
2019 SPIIRAS Proceedings (Труды СПИИРАН)  
This paper proposes a solution for intelligent data curation that can be implemented in a virtual museum in order to provide opportunity to observe the valuable historical specimens in a proper way.  ...  These systems suffer from the lack of tools for intelligent data curation with the capacity to validate data from different sources and to add value to data.  ...  It is easy for a well-qualified human to determine if such a cluster refers to the same entity, but it is hard for a machine to conduct this judgement.  ... 
doi:10.15622/sp.18.2.444-470 fatcat:b5jalohgrfgnlemq4zmxuw5fba

Curated databases and their role in clinical bioinformatics

C.C. Englbrecht, M. Han, M.T. Mader, A. Osanger, K.F.X. Mayer
2004 IMIA Yearbook of Medical Informatics  
Similar studies and sometimes even completely distinct studies can, for example, improve the classification accuracy due to an increase in the training data set's size.  ...  This demanding task can only be fulfilled with the help of excellent, curated databases.  ... 
doi:10.1055/s-0038-1638188 fatcat:ziahknl6kjdu7jz5oabyvtvbui

Curating and contextualizing Twitter stories to assist with social newsgathering

Arkaitz Zubiaga, Heng Ji, Kevin Knight
2013 Proceedings of the 2013 international conference on Intelligent user interfaces - IUI '13  
While journalism is evolving toward a rather open-minded participatory paradigm, social media presents overwhelming streams of data that make it difficult to identify the information of a journalist's  ...  This tool was built with the aim of assisting journalists both with gathering and with researching news stories as users comment on them.  ...  Acknowledgments Thanks to Daniel Marcu of SDL for assistance with translations, and to Nicholas Diakopoulos for discussion.  ... 
doi:10.1145/2449396.2449424 dblp:conf/iui/ZubiagaJK13 fatcat:sxdwxgr6cjfodgbdfzv4b6innm

A beginner's guide to manual curation of transposable elements

Clement Goubert, Rory J. Craig, Agustin F. Bilat, Valentina Peona, Aaron A. Vogan, Anna V. Protasio
2022 Mobile DNA  
Results: Our manuscript attempts to fill this gap by providing a set of detailed computer protocols, software recommendations and video tutorials for those aiming to manually curate TEs.  ...  This know-how is often passed on from mentor-to-mentee within research groups, making it difficult for those outside the field to access this highly specialised skill.  ...  We hope that these guidelines can offer a starting point for those interested in investigating TE biology.  ... 
doi:10.1186/s13100-021-00259-7 pmid:35354491 pmcid:PMC8969392 fatcat:6bkxngp6yvbr7p46qwvfrhbeke

Measurement Recorder: developing a useful tool for making species descriptions that produces computable phenotypes

Hong Cui, Limin Zhang, Bruce Ford, Hsin-Liang Cheng, James A Macklin, Anton Reznicek, Julian Starr
2020 Database: The Journal of Biological Databases and Curation  
Results suggest that participants can use Measurement Recorder without training and they find it easy to use after limited practice. Participants also appreciate the semantic enhancement features.  ...  This postpublication curation process is not only slow and costly, it is also burdened with significant intercurator variation (including curator-author variation), due to different interpretations of  ...  While any structured data is machine-actionable, by 'computable', we mean data that are unambiguously defined, can be algorithmically compared, and can be used in computational analyses in a meaningful  ... 
doi:10.1093/database/baaa079 pmid:33216896 pmcid:PMC7678789 fatcat:5vwrfvjovrelzjr3llmijbie2m

Text mining resources for the life sciences

Piotr Przybyła, Matthew Shardlow, Sophie Aubin, Robert Bossy, Richard Eckart de Castilho, Stelios Piperidis, John McNaught, Sophia Ananiadou
2016 Database: The Journal of Biological Databases and Curation  
management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools.  ...  Abstract: Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature.  ...  Funding This work is jointly supported by the EC/H2020 project: an Open Mining INfrastructure for TExt and Data (OpenMinTeD) Grant ID: 654021 and the BBSRC project: Enriching Metabolic PATHwaY models with  ... 
doi:10.1093/database/baw145 pmid:27888231 pmcid:PMC5199186 fatcat:nb5eby565jdcrho63yporkcjaa

Analysis of CASP8 targets, predictions and assessment methods

S. Shi, J. Pei, R. I. Sadreyev, L. N. Kinch, I. Majumdar, J. Tong, H. Cheng, B.-H. Kim, N. V. Grishin
2009 Database: The Journal of Biological Databases and Curation  
Finally, these data can serve as grounds to develop and analyze methods for assessing prediction quality. Here we present results of our analysis in these areas.  ...  expert analysis, including invaluable analysis prior to target structure release; and (ii) develop an assessment methodology tailored towards current challenges in the field.  ...  These major clusters can be used for evaluation, as they demonstrate the data set splitting naturally into 'hard' and 'easy'.  ... 
doi:10.1093/database/bap003 pmid:20157476 pmcid:PMC2794793 fatcat:5isyzutujrg2bikw6ffgpt64rq

Advanced Curation of Astromaterials for Planetary Science

Francis M. McCubbin, Christopher D. K. Herd, Toru Yada, Aurore Hutzler, Michael J. Calaway, Judith H. Allton, Cari M. Corrigan, Marc D. Fries, Andrea D. Harrington, Timothy J. McCoy, Julie L. Mitchell, Aaron B. Regberg (+5 others)
2019 Space Science Reviews  
the degree of precision that can be expected of those answers.  ...  Astromaterials acquisition and curation practices have direct consequences on the contamination levels of astromaterials and hence the types of questions that can be answered about our solar system and  ...  This work is dedicated to the women and men that have worked tirelessly to make sample return missions possible, to the countless astromaterials curation personnel that have processed and cared for astromaterials  ... 
doi:10.1007/s11214-019-0615-9 fatcat:hm4qeez4n5gljelby2klat7vny

FlyClockbase: Importance of Biological Model Curation for Analyzing Variability in the Circadian Clock of Drosophila melanogaster by Integrating Time Series from 25 Years of Research [article]

Katherine S. Scheuer, Bret Hanlon, Jerdon W. Dresel, Erik D. Nolan, John C. Davis, Laurence Loewe
2017 bioRxiv   pre-print
We developed a trans-disciplinary workflow, which demonstrates the importance of developing compilers for VBIRs with a more biology-friendly logic that is likely to greatly simplify biological model curation  ...  We found that very few computational models test their quality directly against experimentally observed time series scattered in the literature.  ...  primary data is lost from hard drives).  ... 
doi:10.1101/099192 fatcat:446gqznfofdo3pyuzgvst7bolm

Overview of the Ninth Annual Meeting of the BioLINK SIG at ISMB: Linking Literature, Information and Knowledge for Biology [chapter]

Christian Blaschke, Lynette Hirschman, Hagit Shatkay, Alfonso Valencia
2010 Lecture Notes in Computer Science  
The broad area of biomedical text mining is concerned with using methods from natural language processing, information extraction, information retrieval and summarization to automate knowledge discovery  ...  from biomedical text.  ...  Toward Computer-Assisted Text Curation: Classification is Easy (Choosing Training Data can be Hard…) by Robert Denroche, Ramana Madupu, Shibu Yooseph, Granger Sutton and Hagit Shatkay.  ... 
doi:10.1007/978-3-642-13131-8_1 fatcat:42nm5vxcdnafjkogcz4wcoasoa

An Overview of BioCreative II.5

F Leitner, S A Mardis, M Krallinger, G Cesareni, L A Hirschman, A Valencia
2010 IEEE/ACM Transactions on Computational Biology & Bioinformatics  
The BioCreative II.5 challenge evaluated automatic annotations from 15 text mining teams based on a gold standard created by reconciling annotations from curators, authors, and automated systems.  ...  For article classification, the best AUC iP/R was 0.70; for interacting proteins, the best system achieved good macroaveraged recall (0.73) and interpolated area under the precision/recall curve (0.58)  ...  The first was trained using the runs provided on the training data, and tested against the test data gold standard. We computed the F-measure at ¼ 10 at a cutoff of 30.  ... 
doi:10.1109/tcbb.2010.61 pmid:20704011 fatcat:em2jhe322fb2hb65qwewzhfiee