Multi-view representation learning for natural language processing applications [article]

Nikolaos Papasarantopoulos, University of Edinburgh, Shay Cohen, Stephen Renals
The pervasion of machine learning in a vast number of applications has given rise to an increasing demand for the effective processing of complex, diverse and variable datasets. One representative case of data diversity can be found in multi-view datasets, which contain input originating from more than one source or having multiple aspects or facets. Examples include, but are not restricted to, multimodal datasets, where data may consist of audio, image and/or text. The nature of multi-view [...] sets calls for special treatment in terms of representation. A subsequent fundamental problem is that of combining information from potentially incoherent sources, a problem commonly referred to as view fusion. Quite often, the heuristic solution of early fusion is applied to this problem: aggregating representations from different views using a simple function (concatenation, summation or mean pooling). However, early fusion can cause overfitting when training samples are small, and it may also result in specific statistical properties of each view being lost in the learning process. Representation learning, the set of ideas and algorithms devised to learn meaningful representations for machine learning problems, has recently grown into a vibrant research field that encompasses multi-view setups. A plethora of multi-view representation learning methods has been proposed in the literature, a large portion of them based on the idea of maximising the correlation between available views. Commonly, such techniques are evaluated on synthetic datasets or strictly defined benchmark setups, a role that, within Natural Language Processing, is often assumed by the multimodal sentiment analysis problem. This thesis argues that more complex downstream applications could benefit from such representations and describes a multi-view contemplation of a range of tasks, from static, two-view, unimodal to dynamic, three-view, trimodal applications. Setting out to explore the limits of the seeming applicability of multi-view [...]
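
The early fusion heuristic named in the abstract (concatenation, summation or mean pooling of per-view representations) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `fuse_concat`, `fuse_sum` and `fuse_mean`, the plain-list vector type, and the toy "text" and "audio" vectors are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def _check_same_length(views: Sequence[Vector]) -> None:
    """Summation and mean pooling need every view to have the same size."""
    if not views:
        raise ValueError("at least one view is required")
    if len({len(view) for view in views}) != 1:
        raise ValueError("all views must share one dimensionality")


def fuse_concat(views: Sequence[Vector]) -> Vector:
    """Early fusion by concatenation; views may differ in size."""
    return [x for view in views for x in view]


def fuse_sum(views: Sequence[Vector]) -> Vector:
    """Early fusion by element-wise summation."""
    _check_same_length(views)
    return [sum(column) for column in zip(*views)]


def fuse_mean(views: Sequence[Vector]) -> Vector:
    """Early fusion by element-wise mean pooling."""
    _check_same_length(views)
    return [sum(column) / len(views) for column in zip(*views)]


# Hypothetical two-view example: one vector per view for a single item.
text_view = [1.0, 2.0]
audio_view = [3.0, 4.0]

concatenated = fuse_concat([text_view, audio_view])  # [1.0, 2.0, 3.0, 4.0]
summed = fuse_sum([text_view, audio_view])           # [4.0, 6.0]
averaged = fuse_mean([text_view, audio_view])        # [2.0, 3.0]
```

All three functions treat the views symmetrically and apply a fixed operation chosen in advance rather than learned from the data, which is the sense in which the abstract calls early fusion a heuristic.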
doi:10.7488/era/267 fatcat:f7tjkwf6prc3zb74oh653zav5a