3-D Visual Object Classification with Hierarchical Radial Basis Function Networks [chapter]

F. Schwenker, H. A. Kestler
2001 Studies in Fuzziness and Soft Computing  
In this chapter we present a 3-D visual object recognition system for an autonomous mobile robot. This object recognition system performs the following three tasks: Object localisation in the camera images, feature extraction, and classification of the extracted feature vectors with hierarchical radial basis function (RBF) networks. In primitives-based approaches the 3-D objects are modelled using a small set of 3-D volumetric primitives (cubes, cylinders, cones, etc) in a CAD-like model. In
more » ... recognition phase the most important step is to identify the primitives that are visible in the camera image. This approach is derived from the recognition-by-components theory developped by Biederman in [18, 19] . It seems that this approach is reasonable for CAD applications, but has its limitations for the recognition of free-form objects, for example in face recognition. Psychophysical results achieved during the last years have shown that humans are able to learn to recognize 3-D objects from different characteristic 2-D views. In these view-based approaches a set of 2-D views of each object is stored or learned in order to build an internal object representation of the 3-D object. In the recognition phase of such a viewbased system a single 2-D view of an object is compared to the learnt 2-D views. This processing step is related to methods like template matching and nearest neighbor classification. One of the main tasks in these viewbased approaches is the selection of characteristic object views. The objects have to be recorded from various viewpoints, in different poses and with different illumination in order to build a recognition system which is robust under all such transformations. Artificial neural network models can be used to learn to recognize 3-D objects on the basis of a small set of 2-D camera images which are recorded from distinct view points [4] . Based on a training set of feature vectors, the network learns a discrimination function in the highdimensional feature space. For this kind of classification task supervised network training procedures must be utilized. Often synthetic images or well prepared data sets ignoring problems which are present at lower processing levels have been used in order to simplify the recognition problem, e.g. the 3-D objects are always in the center of the camera images. We attempt to solve a more realistic problem and use camera images recorded from real 3-D objects for training and testing the recognition system. In the recognition phase scenes with multiple 3-D objects may be presented to the recognition system. The recognition of a 3-D object consisted of the following three subtasks which will be discussed throughout this chapter: 1. Localization of objects in the camera image. In this processing step the entire camera image is segmented into
doi:10.1007/978-3-7908-1826-0_8 fatcat:cjdb6b7kijgqbldf3bwslw7l3i