PSU at CLEF-2020 ARQMath Track: Unsupervised Re-ranking using Pretraining

Shaurya Rohatgi, Jian Wu, C. Lee Giles
2020 Conference and Labs of the Evaluation Forum  
This paper describes our submission to the ARQMath track at CLEF 2020. Our primary run for the main task, Task 1 (Question Answering), uses a two-stage retrieval technique: the first stage fuses traditional BM25 scoring with TF-IDF cosine-similarity retrieval, and the second stage is a finer re-ranking step using contextualized embeddings. For the re-ranking we use a pre-trained RoBERTa-base model (110 million parameters) to make the language model more math-aware.
... Our approach achieves a higher NDCG score than the baseline, while our MAP and P@10 scores are competitive, performing better than the best submission (MathDowsers) for text-dependent and text+formula-dependent topics.
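
The two-stage pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the BM25 and TF-IDF scores are fused, so the min-max normalized sum used here is an assumption, as are the BM25 parameters `k1` and `b`, the whitespace tokenizer, and all function names. `toy_embed` is a hypothetical character-trigram stand-in for the contextualized RoBERTa embeddings, kept only so the example runs without external libraries.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; an assumption, the paper's preprocessing is not given here.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    # Okapi BM25 score of every document against the query (k1, b are assumed defaults).
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tfidf_cosine_scores(query, docs):
    # TF-IDF vectors with smoothed IDF, scored by cosine similarity to the query.
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    df = Counter(term for t in toks for term in set(t))

    def vec(tokens):
        return {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in Counter(tokens).items()}

    q = vec(tokenize(query))
    return [cosine(q, vec(t)) for t in toks]


def min_max(xs):
    # Rescale scores to [0, 1] so the two retrievers are comparable before fusion.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def first_stage(query, docs, k):
    # Assumed fusion rule: sum of min-max normalized BM25 and TF-IDF cosine scores.
    fused = [a + c for a, c in zip(min_max(bm25_scores(query, docs)),
                                   min_max(tfidf_cosine_scores(query, docs)))]
    return sorted(range(len(docs)), key=lambda i: -fused[i])[:k]


def toy_embed(text):
    # Hypothetical stand-in for a contextual encoder: character-trigram counts.
    text = text.lower()
    return dict(Counter(text[i:i + 3] for i in range(len(text) - 2)))


def rerank(query, docs, candidates, embed=toy_embed):
    # Second stage: reorder the first-stage candidates by embedding similarity.
    q = embed(query)
    return sorted(candidates, key=lambda i: -cosine(q, embed(docs[i])))


docs = [
    "how to solve a quadratic equation using the quadratic formula",
    "recipe for baking sourdough bread at home",
    "derivative of a polynomial function explained",
]
query = "solve quadratic equation"
candidates = first_stage(query, docs, k=2)
reranked = rerank(query, docs, candidates)
```

In a real system, `embed` would be replaced by a function returning vectors from a math-aware language model, with a dense cosine similarity in place of the dict-based one used here.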
dblp:conf/clef/Rohatgi0G20 fatcat:76djkwmfejbeva5e7j2y74irve