Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
In this paper we build two datasets and develop a framework (TDMS-IE) aimed at automatically extracting task, dataset, metric and score from NLP papers, towards the automatic construction of leaderboards ...
While the fast-paced inception of novel tasks and new datasets helps foster active research in a community towards interesting directions, keeping track of the abundance of research activity in different ...
Dataset Construction We create two datasets for testing our approach for task, dataset, metric, and score (TDMS) identification. ...
doi:10.18653/v1/p19-1513
dblp:conf/acl/HouJGBG19
fatcat:h2mpyifv5vanpa7cyjo2bhjvpm
Automated Mining of Leaderboards for Empirical AI Research
[article]
2021
arXiv
pre-print
Our analysis reveals an optimal approach that significantly outperforms existing baselines for the task with evaluation scores above 90% in F1. ...
Specifically, we investigate the problem of automated Leaderboard construction using state-of-the-art transformer models, viz. BERT, SciBERT, and XLNet. ...
Acknowledgements This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003) and by the European Research Council for ...
arXiv:2109.13089v1
fatcat:7wc55fqho5cwjblnoxsg46i4wq
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development
[article]
2021
arXiv
pre-print
However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. ...
To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. ...
For each dataset, we provide a data description and statistics, together with the recommended dataset splits and evaluation metrics and units in the case of numeric labels. ...
arXiv:2102.09548v2
fatcat:i5f5vrbaxnehhmhqiuwkkx2s6y
Metric-Type Identification for Multi-Level Header Numerical Tables in Scientific Papers
[article]
2021
arXiv
pre-print
We introduce a new information extraction task, metric-type identification from multi-level header numerical tables, and provide a dataset extracted from scientific papers consisting of header tables, ...
Numerical tables are widely used to present experimental results in scientific papers. For table understanding, a metric-type is essential to discriminate numbers in the tables. ...
We thank the anonymous reviewers for helpful discussion of this work and comments on previous drafts of the paper. ...
arXiv:2102.00819v1
fatcat:5psekqlys5c7hayhifvrpy55t4
DataPerf: Benchmarks for Data-Centric AI Development
[article]
2022
arXiv
pre-print
Machine learning (ML) research has generally focused on models, while the most prominent datasets have been employed for everyday ML tasks without regard for the breadth, difficulty, and faithfulness of ...
To solve this problem, we present DataPerf, a benchmark package for evaluating ML datasets and dataset-working algorithms. ...
DataPerf is a scientific instrument to systematically measure the quality of training and test datasets on a variety of ML tasks and to measure the quality of algorithms for constructing such datasets. ...
arXiv:2207.10062v1
fatcat:n7rahesyprfcji4ml6webxr7q4
Metric-Type Identification for Multilevel Header Numerical Tables in Scientific Papers
2021
Journal of Natural Language Processing
Herein, we introduce a new information extraction task, i.e., metric-type identification from multilevel header numerical tables, and provide a dataset extracted from scientific papers comprising header ...
Numerical tables are widely used to present experimental results in scientific papers. For table understanding, a metric-type is essential to discriminate numbers in the tables. ...
We thank the anonymous reviewers for their discussions pertaining to this study and their comments on previous drafts of the paper. ...
doi:10.5715/jnlp.28.1247
fatcat:nh3uebxhgndqrddyjrhuojb3z4
Computer Science Named Entity Recognition in the Open Research Knowledge Graph
[article]
2022
arXiv
pre-print
This work proposes a standardized task by defining a set of seven contribution-centric scholarly entities for CS NER viz., research problem, solution, resource, language, tool, method, and dataset. ...
Currently, progress on CS NER -- the focus of this work -- is hampered in part by its recency and the lack of a standardized annotation aim for scientific entities/terms. ...
They release a public download dump of crowdsourced leaderboards in scholarly articles on research problems in AI annotated w.r.t. task, dataset, metric, score, and method entities. ...
arXiv:2203.14579v1
fatcat:fzq37ng56zhovomjnorusvfg3i
Assessment of network module identification across complex diseases
2019
Nature Methods
Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. ...
This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology. ...
The computations were performed at the Vital-IT (http://www.vital-it.ch) Center for high-performance computing of the SIB Swiss Institute of Bioinformatics. ...
doi:10.1038/s41592-019-0509-5
pmid:31471613
pmcid:PMC6719725
fatcat:rledffj7c5hnnaqamxleb2dnpy
UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction
[article]
2020
arXiv
pre-print
Our best performing model evaluated on the DeftEval dataset obtains the 32nd place for the first subtask and the 37th place for the second subtask. ...
We also explore a multi-task architecture that was trained to jointly predict the outputs for the second and the third subtasks. ...
Table 2 reports the evaluation metrics, Macro-Precision, Recall, and F1-scores, respectively, on both development and test datasets. ...
arXiv:2009.05603v2
fatcat:qtiyvmikejbktemuospr5h462e
Representing Numbers in NLP: a Survey and a Vision
[article]
2021
arXiv
pre-print
We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP, comprised of design trade-offs and a unified evaluation. ...
We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. ...
An analogous leaderboard could be constructed to evaluate models on numeric reasoning tasks, again categorized according to the skills evaluated, e.g., exact vs approximate granularity, or abstract vs ...
arXiv:2103.13136v1
fatcat:qxw7wi6dbzbtpgcoalh7j7w36i
D2S: Document-to-Slide Generation Via Query-Based Text Summarization
[article]
2021
arXiv
pre-print
Our evaluation suggests that long-form QA outperforms state-of-the-art summarization baselines on both automated ROUGE metrics and qualitative human evaluation. ...
Presentations are critical for communication in all areas of our lives, yet the creation of slide decks is often tedious and time-consuming. ...
We also thank our friends and colleagues who participated in our human evaluation study. ...
arXiv:2105.03664v1
fatcat:7zs3jzyev5hfvnw3x34rc5h4oa
From Crowd to Community
2017
Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW '17
Online citizen science projects have been increasingly used in a variety of disciplines and contexts to enable large-scale scientific research. ...
Our findings contribute to the ongoing discussion on citizen science design and the relationship between community and microtask design for achieving successful outcomes. ...
metrics and evaluation methods utilised within the literature. ...
doi:10.1145/2998181.2998302
fatcat:myfv4eqyu5abde2lnkq4ljzcnu
Layout Aware Semantic Element Extraction for Sustainable Science & Technology Decision Support
2022
Sustainability
Moreover, to construct a scientific knowledge graph consisting of multiple S&T documents, we newly defined an extensible Semantic Elements Knowledge Graph (SEKG) structure. ...
In addition, to illustrate the potential power of our SEKG, we provide two promising application scenarios, such as a scientific knowledge guide across multiple S&T documents and questions and answering ...
Conflicts of Interest: The authors declare no conflict of interest. Sustainability 2022, 14, 2802 ...
doi:10.3390/su14052802
fatcat:eew4bb5q55ccpavk6yxroogsgq
ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents
2021
Frontiers in Research Metrics and Analytics
The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), was introduced to support the development of advanced ...
The ChEMU 2020 lab received 37 team registrations and 46 runs. Overall, the performance of submissions for these tasks exceeded our expectations, with the top systems outperforming strong baselines. ...
AUTHOR CONTRIBUTIONS JH: managing day-to-day activities of ChEMU lab, evaluation of shared task results, baseline design, and paper writing. ...
doi:10.3389/frma.2021.654438
pmid:33870071
pmcid:PMC8028406
fatcat:w4vhhufqyfhshafwp4cjcobzdu
Advancing computational biology and bioinformatics research through open innovation competitions
2019
PLoS ONE
Open data science and algorithm development competitions offer a unique avenue for rapid discovery of better computational strategies. ...
Performance gains are evaluated quantitatively using realistic, albeit sanitized, data sets. ...
They also must construct metrics to evaluate solutions that provide live feedback to participants. ...
doi:10.1371/journal.pone.0222165
pmid:31560691
pmcid:PMC6764653
fatcat:xchse6qxirbshka2flivs77yye