Can Machines Learn to Comprehend Scientific Literature?

Donghyeon Park, Yonghwa Choi, Daehan Kim, Minhwan Yu, Seongsoon Kim, Jaewoo Kang
2019 IEEE Access  
To measure the ability of a machine to understand professional-level scientific articles, we construct a scientific question answering task called PaperQA. The PaperQA task is based on more than 80,000 "fill-in-the-blank" type questions on articles from reputed scientific journals such as Nature and Science. We perform fine-grained linguistic analysis and evaluation to compare PaperQA with other conventional question answering (QA) tasks on general literature (e.g., books, news articles, and Wikipedia texts). The results indicate that PaperQA is the most difficult QA task for both humans (laypeople) and machines (deep-learning models). Moreover, although humans generally outperform machines in conventional QA tasks, we found that advanced deep-learning models outperform humans by 3%-13% on average in the PaperQA task. The PaperQA dataset used in this paper is publicly available at http://dmis.korea.ac.kr/downloads?id=PaperQA.
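As a rough illustration of the cloze-style ("fill-in-the-blank") format the abstract describes, the sketch below shows one plausible way such an example could be represented in Python. The field names and sample values are illustrative assumptions only, not the actual PaperQA schema.

```python
# Hypothetical representation of a cloze-style QA example; field names
# and values are illustrative, not taken from the PaperQA dataset.
from dataclasses import dataclass
from typing import List

@dataclass
class ClozeQuestion:
    context: str           # passage drawn from a scientific article
    question: str          # sentence with the answer replaced by a blank
    answer: str            # the masked term to be predicted
    candidates: List[str]  # candidate answers, if the task provides them

example = ClozeQuestion(
    context="(abstract text of a scientific article)",
    question="Advanced _____ models outperform humans on this task.",
    answer="deep-learning",
    candidates=["deep-learning", "rule-based", "statistical"],
)

if __name__ == "__main__":
    # Reconstruct the original sentence by filling in the blank.
    print(example.question.replace("_____", example.answer))
```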
doi:10.1109/access.2019.2891666