Statistical analyses of digital collections: using a large corpus of systematic reviews to study non-citations

Tove Faber Frandsen, Jeppe Nicolaisen
2017 Libellarium: Journal for the Research of Writing, Books, and Cultural Heritage Institutions  
Statistical analysis of digital material makes it possible to detect patterns in big data that would otherwise go unnoticed. This paper seeks to exemplify this by statistically analysing a large corpus of references in systematic reviews. The aim of the analysis is to study the phenomenon of non-citation: situations where just one (or some) of a pool of otherwise equally citable documents are cited. The study is based on more than 120,000
[...] more than 120,000 cited studies, and a total of more than 1.6 million non-cited studies. The number of cited studies is found to be much smaller than the number of non-cited studies. The cited and non-cited studies are also found to differ in age. Very recent studies tend to be non-cited, whereas the cited studies are rarely of recent age (e.g. within the same year). The greatest differences are found within the first 10 years. After 10 years the cited and non-cited studies tend to be more similar in terms of age. Separating the data set into different sub-disciplines reveals that the sub-disciplines vary in terms of the age of cited vs. non-cited references. Some fields may be expanding, so the number of published studies is growing; consequently, both cited and non-cited studies in those fields tend to be younger. Other fields may progress more slowly and use a greater proportion of the older literature within the field. These field differences manifest themselves in the average age of references.
Tove Faber Frandsen, Jeppe Nicolaisen, Statistical analyses of digital collections: using a large corpus of systematic reviews to study non-citations, Libellarium, IX, 2 (2016): 81-94.
doi:10.15291/libellarium.v9i2.253 fatcat:vkuoof3o2rhdvef7ogqhjnjj7e