Zero-Shot and Few-Shot Classification of Biomedical Articles in Context of the COVID-19 Pandemic [article]

Simon Lupart, Benoit Favre, Vassilina Nikoulina, Salah Ait-Mokhtar
2022
MeSH (Medical Subject Headings) is a large thesaurus created by the National Library of Medicine and used for fine-grained indexing of publications in the biomedical domain. In the context of the COVID-19 pandemic, MeSH descriptors have emerged in relation to articles published on the corresponding topic. Zero-shot classification is an adequate response for timely labeling of the stream of papers with MeSH categories. In this work, we hypothesise that the rich semantic information available in MeSH has the potential to improve BioBERT representations and make them more suitable for zero-shot/few-shot tasks. We frame the problem as determining whether MeSH term definitions, concatenated with paper abstracts, are valid instances or not, and leverage multi-task learning to induce the MeSH hierarchy in the representations through a seq2seq task. Results establish a baseline on the MedLine and LitCovid datasets, and probing shows that the resulting representations convey the hierarchical relations present in MeSH.
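The pairing described in the abstract (a MeSH term definition concatenated with a paper abstract, judged valid or not) can be illustrated with a minimal sketch. This is not the authors' implementation: the `PairInstance` type, the `build_instances` helper, the `[SEP]` separator string, and all term names, definitions, and abstract text below are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairInstance:
    """One (definition + abstract) candidate with a binary validity label."""
    text: str
    label: int  # 1 if the term applies to the abstract, else 0


def build_instances(abstract, definitions, gold_terms, sep=" [SEP] "):
    """Pair every candidate term definition with the abstract.

    `sep` is an assumed separator; a real pipeline would normally let the
    tokenizer of the chosen model insert its own separator token.
    """
    instances = []
    for term, definition in definitions.items():
        text = f"{term}: {definition}{sep}{abstract}"
        instances.append(PairInstance(text=text, label=int(term in gold_terms)))
    return instances


# Hypothetical placeholder data, not taken from MeSH or from any dataset.
definitions = {
    "Hypothetical Term A": "placeholder definition of term A",
    "Hypothetical Term B": "placeholder definition of term B",
}
instances = build_instances(
    abstract="Placeholder abstract text.",
    definitions=definitions,
    gold_terms={"Hypothetical Term A"},
)
```

Under this framing a term never seen during training can still be scored, because the classifier reads the term's definition rather than a fixed label index.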
doi:10.48550/arxiv.2201.03017 fatcat:jxi7ne52h5gsrduazoxbp47qbm