Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction

Yufang Hou, Charles Jochim, Martin Gleize, Francesca Bonin, Debasis Ganguly
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
In this paper we build two datasets and develop a framework (TDMS-IE) aimed at automatically extracting task, dataset, metric and score from NLP papers, towards the automatic construction of leaderboards  ...  While the fast-paced inception of novel tasks and new datasets helps foster active research in a community towards interesting directions, keeping track of the abundance of research activity in different  ...  Dataset Construction We create two datasets for testing our approach for task, dataset, metric, and score (TDMS) identification.  ... 
doi:10.18653/v1/p19-1513 dblp:conf/acl/HouJGBG19 fatcat:h2mpyifv5vanpa7cyjo2bhjvpm

Automated Mining of Leaderboards for Empirical AI Research [article]

Salomon Kabongo, Jennifer D'Souza, Sören Auer
2021 arXiv   pre-print
Our analysis reveals an optimal approach that significantly outperforms existing baselines for the task with evaluation scores above 90% in F1.  ...  Specifically, we investigate the problem of automated Leaderboard construction using state-of-the-art transformer models, viz. Bert, SciBert, and XLNet.  ...  Acknowledgements This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003) and by the European Research Council for  ... 
arXiv:2109.13089v1 fatcat:7wc55fqho5cwjblnoxsg46i4wq

Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development [article]

Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik
2021 arXiv   pre-print
However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets.  ...  To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines.  ...  For each dataset, we provide a data description and statistics, together with the recommended dataset splits and evaluation metrics and units in the case of numeric labels.  ... 
arXiv:2102.09548v2 fatcat:i5f5vrbaxnehhmhqiuwkkx2s6y

Metric-Type Identification for Multi-Level Header Numerical Tables in Scientific Papers [article]

Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Manabu Okumura, Hiroya Takamura
2021 arXiv   pre-print
We introduce a new information extraction task, metric-type identification from multi-level header numerical tables, and provide a dataset extracted from scientific papers consisting of header tables,  ...  Numerical tables are widely used to present experimental results in scientific papers. For table understanding, a metric-type is essential to discriminate numbers in the tables.  ...  We thank the anonymous reviewers for helpful discussion of this work and comments on previous drafts of the paper.  ... 
arXiv:2102.00819v1 fatcat:5psekqlys5c7hayhifvrpy55t4

DataPerf: Benchmarks for Data-Centric AI Development [article]

Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera (+24 others)
2022 arXiv   pre-print
Machine learning (ML) research has generally focused on models, while the most prominent datasets have been employed for everyday ML tasks without regard for the breadth, difficulty, and faithfulness of  ...  To solve this problem, we present DataPerf, a benchmark package for evaluating ML datasets and dataset-working algorithms.  ...  DataPerf is a scientific instrument to systematically measure the quality of training and test datasets on a variety of ML tasks and to measure the quality of algorithms for constructing such datasets.  ... 
arXiv:2207.10062v1 fatcat:n7rahesyprfcji4ml6webxr7q4

Metric-Type Identification for Multilevel Header Numerical Tables in Scientific Papers

Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Manabu Okumura, Hiroya Takamura
2021 Journal of Natural Language Processing  
Herein, we introduce a new information extraction task, i.e., metric-type identification from multilevel header numerical tables, and provide a dataset extracted from scientific papers comprising header  ...  Numerical tables are widely used to present experimental results in scientific papers. For table understanding, a metric-type is essential to discriminate numbers in the tables.  ...  We thank the anonymous reviewers for their discussions pertaining to this study and their comments on previous drafts of the paper.  ... 
doi:10.5715/jnlp.28.1247 fatcat:nh3uebxhgndqrddyjrhuojb3z4

Computer Science Named Entity Recognition in the Open Research Knowledge Graph [article]

Jennifer D'Souza, Sören Auer
2022 arXiv   pre-print
This work proposes a standardized task by defining a set of seven contribution-centric scholarly entities for CS NER viz., research problem, solution, resource, language, tool, method, and dataset.  ...  Currently, progress on CS NER -- the focus of this work -- is hampered in part by its recency and the lack of a standardized annotation aim for scientific entities/terms.  ...  They release a public download dump of crowdsourced leaderboards in scholarly articles on research problems in AI annotated w.r.t. task, dataset, metric, score, and method entities.  ... 
arXiv:2203.14579v1 fatcat:fzq37ng56zhovomjnorusvfg3i

Assessment of network module identification across complex diseases

Sarvenaz Choobdar, The DREAM Module Identification Challenge Consortium, Mehmet E. Ahsen, Jake Crawford, Mattia Tomasoni, Tao Fang, David Lamparter, Junyuan Lin, Benjamin Hescott, Xiaozhe Hu, Johnathan Mercer, Ted Natoli (+11 others)
2019 Nature Methods  
Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies.  ...  This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.  ...  The computations were performed at the Vital-IT Center for high-performance computing of the SIB Swiss Institute of Bioinformatics.  ... 
doi:10.1038/s41592-019-0509-5 pmid:31471613 pmcid:PMC6719725 fatcat:rledffj7c5hnnaqamxleb2dnpy

UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction [article]

Andrei-Marius Avram, Dumitru-Clementin Cercel, Costin-Gabriel Chiru
2020 arXiv   pre-print
Our best performing model evaluated on the DeftEval dataset obtains the 32nd place for the first subtask and the 37th place for the second subtask.  ...  We also explore a multi-task architecture that was trained to jointly predict the outputs for the second and the third subtasks.  ...  Table 2 reports the evaluation metrics, Macro-Precision, Recall, and F1-scores, respectively, on both development and test datasets.  ... 
arXiv:2009.05603v2 fatcat:qtiyvmikejbktemuospr5h462e

Representing Numbers in NLP: a Survey and a Vision [article]

Avijit Thawani, Jay Pujara, Pedro A. Szekely, Filip Ilievski
2021 arXiv   pre-print
We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP, comprised of design trade-offs and a unified evaluation.  ...  We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods.  ...  An analogous leaderboard could be constructed to evaluate models on numeric reasoning tasks, again categorized according to the skills evaluated, e.g., exact vs approximate granularity, or abstract vs  ... 
arXiv:2103.13136v1 fatcat:qxw7wi6dbzbtpgcoalh7j7w36i

D2S: Document-to-Slide Generation Via Query-Based Text Summarization [article]

Edward Sun, Yufang Hou, Dakuo Wang, Yunfeng Zhang, Nancy X.R. Wang
2021 arXiv   pre-print
Our evaluation suggests that long-form QA outperforms state-of-the-art summarization baselines on both automated ROUGE metrics and qualitative human evaluation.  ...  Presentations are critical for communication in all areas of our lives, yet the creation of slide decks is often tedious and time-consuming.  ...  We also thank our friends and colleagues who participated in our human evaluation study.  ... 
arXiv:2105.03664v1 fatcat:7zs3jzyev5hfvnw3x34rc5h4oa

From Crowd to Community

Neal Reeves, Ramine Tinati, Sergej Zerr, Max G. Van Kleek, Elena Simperl
2017 Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW '17  
Online citizen science projects have been increasingly used in a variety of disciplines and contexts to enable large-scale scientific research.  ...  Our findings contribute to the ongoing discussion on citizen science design and the relationship between community and microtask design for achieving successful outcomes.  ...  metrics and evaluation methods utilised within the literature.  ... 
doi:10.1145/2998181.2998302 fatcat:myfv4eqyu5abde2lnkq4ljzcnu

Layout Aware Semantic Element Extraction for Sustainable Science & Technology Decision Support

Hyuntae Kim, Jongyun Choi, Soyoung Park, Yuchul Jung
2022 Sustainability  
Moreover, to construct a scientific knowledge graph consisting of multiple S&T documents, we newly defined an extensible Semantic Elements Knowledge Graph (SEKG) structure.  ...  In addition, to illustrate the potential power of our SEKG, we provide two promising application scenarios, such as a scientific knowledge guide across multiple S&T documents and questions and answering  ...  Conflicts of Interest: The authors declare no conflict of interest. Sustainability 2022, 14, 2802  ... 
doi:10.3390/su14052802 fatcat:eew4bb5q55ccpavk6yxroogsgq

ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents

Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Lawrence Cavedon (+3 others)
2021 Frontiers in Research Metrics and Analytics  
The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), was introduced to support the development of advanced  ...  The ChEMU 2020 lab received 37 team registrations and 46 runs. Overall, the performance of submissions for these tasks exceeded our expectations, with the top systems outperforming strong baselines.  ...  AUTHOR CONTRIBUTIONS JH: managing day-to-day activities of ChEMU lab, evaluation of shared task results, baseline design, and paper writing.  ... 
doi:10.3389/frma.2021.654438 pmid:33870071 pmcid:PMC8028406 fatcat:w4vhhufqyfhshafwp4cjcobzdu

Advancing computational biology and bioinformatics research through open innovation competitions

Andrea Blasco, Michael G. Endres, Rinat A. Sergeev, Anup Jonchhe, N. J. Maximilian Macaluso, Rajiv Narayan, Ted Natoli, Jin H. Paik, Bryan Briney, Chunlei Wu, Andrew I. Su, Aravind Subramanian (+2 others)
2019 PLoS ONE  
Open data science and algorithm development competitions offer a unique avenue for rapid discovery of better computational strategies.  ...  Performance gains are evaluated quantitatively using realistic, albeit sanitized, data sets.  ...  They also must construct metrics to evaluate solutions that provide live feedback to participants.  ... 
doi:10.1371/journal.pone.0222165 pmid:31560691 pmcid:PMC6764653 fatcat:xchse6qxirbshka2flivs77yye