A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
Work to automate the identification of related articles in corpora of academic research content is described. Pairs of related articles are recognised on the basis of the phrases they contain, using a similarity measure that emphasises the importance of phrase overlap. Phrases are weighted according to their significance, evaluated in terms of statistical under- or over-representation relative to corpus-level frequency, and the significance scores of n-grams with higher n values are boosted. The
doi:10.33774/coe-2020-7zd6k-v2 fatcat:hqojz6ororfb3jedup6glpanby
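The pipeline outlined in the abstract (phrase extraction, significance weighting against corpus-level frequency, a boost for longer n-grams, and an overlap-based similarity) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the log-ratio significance score, the decision to keep only over-represented phrases, the `boost` factor of 1.5 per extra word, the weighted-Jaccard similarity, and all function names.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Count every n-gram (as a tuple of tokens) for n = 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def phrase_weights(doc_counts, corpus_counts, boost=1.5):
    """Weight each phrase by how over-represented it is in the document.

    Assumption: significance is the log of (document rate / corpus rate),
    and only over-represented phrases (positive score) are kept.
    Assumption: each extra word in the n-gram multiplies the score by `boost`.
    The corpus counts must include the document itself, so rates are never zero.
    """
    doc_total = sum(doc_counts.values())
    corpus_total = sum(corpus_counts.values())
    weights = {}
    for phrase, count in doc_counts.items():
        doc_rate = count / doc_total
        corpus_rate = corpus_counts[phrase] / corpus_total
        score = math.log(doc_rate / corpus_rate)
        if score > 0:
            weights[phrase] = score * boost ** (len(phrase) - 1)
    return weights


def similarity(weights_a, weights_b):
    """Weighted Jaccard overlap of two phrase-weight maps (an assumed measure)."""
    all_phrases = set(weights_a) | set(weights_b)
    shared = sum(min(weights_a.get(p, 0.0), weights_b.get(p, 0.0)) for p in all_phrases)
    total = sum(max(weights_a.get(p, 0.0), weights_b.get(p, 0.0)) for p in all_phrases)
    return shared / total if total else 0.0


# Toy corpus (hypothetical text, not taken from the work described).
docs = {
    "a": "phrase overlap similarity measure for research articles".split(),
    "b": "a similarity measure based on phrase overlap".split(),
    "c": "protein folding dynamics in yeast cells".split(),
}
doc_counts = {name: ngrams(tokens) for name, tokens in docs.items()}
corpus_counts = sum(doc_counts.values(), Counter())
weights = {name: phrase_weights(c, corpus_counts) for name, c in doc_counts.items()}

print(similarity(weights["a"], weights["b"]))  # shares "phrase overlap", "similarity measure"
print(similarity(weights["a"], weights["c"]))  # no shared phrases
```

In this toy setup, documents "a" and "b" share the bigrams "phrase overlap" and "similarity measure", which contribute more than their constituent unigrams because of the boost; "a" and "c" share nothing and score zero. A real corpus would need proper tokenisation and a significance test suited to its size.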