Principled Approach to Natural Language Generation
Yevgeniy Puzikov
2021
The research field of Natural Language Generation offers practitioners a wide range of techniques for producing texts from a variety of data types. These techniques find their way into various real-world applications and help many people automate time-consuming text production tasks in many areas. At the moment, the design and evaluation of text generation approaches is largely empirical. Many systems are developed to solve one particular task and work on a single data type, which makes it hard to compare the approach to any other technique and critically evaluate its performance. Some systems employ complex machine learning algorithms to learn rich data representations and perform joint modeling of the steps involved in the process of text generation. Such approaches offer an attractive trade-off between development costs and output quality, but often lack transparency when it comes to reasoning about the behavior of the system. The number of proposed approaches grows constantly, but the methodology lags behind and sometimes fails to yield a better understanding of which approaches work and why. In this thesis we present our view on the task of text production from a methodological point of view. We analyze the existing scientific literature and examine common text generation approaches and the established evaluation protocols. We further propose a principled view on the problem: we break it into components, examine their interaction and develop a set of recommendations intended to assist in the design or analysis of a study. We then conduct a range of experiments to test this framework in several text generation tasks. First, we show that task specification analysis sometimes allows one to solve the problem at hand with very simple techniques, without resorting to the complex machinery of advanced statistical learning methods. We further demonstrate the potential of the developed framework to find discrepancies in the established evaluation protoco [...]
doi:10.26083/tuprints-00019115
fatcat:nymcjpnsmjcgtmvmdbyhjgo6ce