On incorporating inductive biases into deep neural networks [article]

Sameera Ramasinghe, The Australian National University
A machine learning (ML) algorithm can be interpreted as a system that learns to capture patterns in data distributions. Before the modern deep learning era, structured representations and strong inductive bias, used in emulation of the human brain, were prevalent in building ML models, partly because computational resources were expensive and data was scarce. In contrast, armed with increasingly cheap hardware and abundant data, deep learning has made [...] progress during the past decade, showcasing remarkable performance on a diverse set of ML tasks. Unlike classical ML models, deep learning seeks to minimize structured representations and inductive bias when learning, implicitly favoring the flexibility of learning over manual intervention. Despite this impressive performance, attention is being drawn towards strengthening the (relatively) weaker areas of deep models, such as learning with limited resources, robustness, minimal overhead to realize simple relationships, and the ability to generalize learned representations beyond the training conditions, which were (arguably) the forte of classical ML. Consequently, a hybrid trend is emerging that aims to blend structured representations and substantial inductive bias into deep models, with the hope of improving them. Based on this motivation, this thesis investigates methods to improve the performance of deep models using inductive bias and structured representations across multiple problem domains. To this end, we inject a priori knowledge into deep models in the form of enhanced feature extraction techniques, geometrical priors, engineered features, and optimization constraints. In particular, we show that by leveraging prior knowledge about the task at hand and the structure of the data, the performance of deep learning models can be significantly improved. We begin by exploring equivariant representation learning. In general, real-world observations are prone to fundamental transformations (e.g., [...]
doi:10.25911/ph29-x543 fatcat:m5rkvjyy65hunkbp56qvgkpgqi