Ten Questions for a Theory of Vision

Marco Gori
2022 Frontiers in Computer Science  
By and large, the remarkable progress in visual object recognition in the last few years has been fueled by the availability of huge amounts of labelled data paired with powerful, bespoke computational resources. This has opened the doors to the massive use of deep learning, which has led to remarkable improvements on new challenging benchmarks. While acknowledging this point of view, in this paper I claim that the time has come to begin working towards a deeper understanding of visual computational processes that, instead of being regarded as applications of general purpose machine learning algorithms, are likely to require tailored learning schemes. A major claim of this paper is that current approaches to object recognition face a problem that is significantly more difficult than the one offered by nature. This is because current learning algorithms work on images in isolation and neglect the crucial role of temporal coherence. Starting from this remark, this paper raises ten questions concerning visual computational processes that might contribute to better solutions to a number of challenging computer vision tasks. While this paper is far from being able to provide answers to those questions, it contains some insights that might stimulate an in-depth rethinking of object perception, while suggesting research directions in the control of object-directed action.
doi:10.3389/fcomp.2021.701248 fatcat:zzxmdu35ufhe3lg7ski6uontka