Augmenting mathematical formulae for more effective querying & efficient presentation [article]

Moritz Schubotz, Technische Universität Berlin, Volker Markl
2017-07-27
Mathematical Information Retrieval (MIR) is a research area that focuses on the Information Need (IN) of the Science, Technology, Engineering and Mathematics (STEM) domain. Unlike traditional Information Retrieval (IR) research, which extracts information from textual data sources, MIR takes mathematical formulae into account as well. This thesis makes three main contributions: 1. It analyses the strengths and weaknesses of current MIR systems and establishes a new MIR task for future ... ; 2. Based on the analysis, it augments mathematical notation as a foundation for future MIR systems to better fit the IN of the STEM domain; and 3. It presents a solution for how large web publishers can efficiently present mathematics to satisfy the INs of each individual visitor. With regard to the evaluation of MIR systems, it analyses the first international MIR task and proposes the Math Wikipedia Task (WMC). In contrast to other tasks, which evaluate the overall performance of MIR systems based on an IN that is described by a combination of textual keywords and formulae, WMC was designed to gain insights into the math-specific aspects of MIR systems. In addition, this thesis investigates how different factors of similarity measures for mathematical expressions influence the effectiveness of MIR results. Based on these evaluations, this thesis proposes rethinking the fundamentals of MIR systems. MIR systems should elevate the internal representation of mathematics and use a semantic rather than syntactic representation for the retrieval algorithms. This approach simplifies MIR research by defining three orthogonal MIR research challenges: (1) Augmentation; (2) Querying; and (3) Efficient Execution. As the augmentation target, this thesis proposes the concept of context-free formulae, visualized by the idea of a Formula Home Page (FHP). By visiting an FHP, a mathematically literate person can fully understand the formula semantics without context or additional resources. As a first step towards unsupe [...]
doi:10.14279/depositonce-6034 fatcat:mvb7hxd6mjcrjaje6ik3jpujdm
