A Multi-scale CNN for Affordance Segmentation in RGB Images [chapter]

Anirban Roy, Sinisa Todorovic
2016 Lecture Notes in Computer Science  
Given a single RGB image, our goal is to label every pixel with an affordance type. By affordance, we mean an object's capability to readily support a certain human action, without requiring precursor actions. We focus on segmenting the following five affordance types in indoor scenes: 'walkable', 'sittable', 'lyable', 'reachable', and 'movable'. Our approach uses a deep architecture, consisting of a number of multiscale convolutional neural networks, for extracting mid-level visual cues and combining them toward affordance segmentation. The mid-level cues include a depth map, surface normals, and a segmentation of four surface types, namely floor, structure, furniture, and props. For evaluation, we augmented the NYUv2 dataset with new ground-truth annotations of the five affordance types. We are not aware of prior work that starts from pixels, infers mid-level cues, and combines them in a feed-forward fashion to predict dense affordance maps from a single RGB image.
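The pipeline the abstract describes — per-pixel mid-level cues (depth, surface normals, a 4-way surface segmentation) fused feed-forward into a dense 5-way affordance map — can be sketched schematically. The sketch below is illustrative only: the placeholder cue extractors stand in for the paper's multiscale CNN branches, and all function names, weight shapes, and values are assumptions, not the authors' implementation.

```python
import numpy as np

AFFORDANCES = ["walkable", "sittable", "lyable", "reachable", "movable"]

def midlevel_cues(rgb):
    """Placeholder for the multiscale CNN branches in the paper.

    Returns per-pixel cue channels: depth (1), surface normals (3),
    and a 4-way surface segmentation (floor/structure/furniture/props).
    """
    h, w, _ = rgb.shape
    depth = rgb.mean(axis=2, keepdims=True)             # fake depth map, (h, w, 1)
    normals = np.ones((h, w, 3)) / np.sqrt(3.0)         # fake unit normals, (h, w, 3)
    surfaces = np.tile(np.eye(4)[0], (h, w, 1))         # fake one-hot surfaces, (h, w, 4)
    return np.concatenate([depth, normals, surfaces], axis=2)  # (h, w, 8)

def affordance_map(rgb, weights, bias):
    """Fuse the mid-level cues feed-forward into dense affordance labels."""
    feats = midlevel_cues(rgb)        # (h, w, 8) per-pixel cue vector
    scores = feats @ weights + bias   # (h, w, 5) per-pixel affordance logits
    return scores.argmax(axis=2)      # (h, w) label map over AFFORDANCES

# Toy usage on a random 16x16 "image" with random fusion weights.
rng = np.random.default_rng(0)
img = rng.random((16, 16, 3))
W = rng.standard_normal((8, len(AFFORDANCES)))
b = np.zeros(len(AFFORDANCES))
labels = affordance_map(img, W, b)
print(labels.shape)  # one affordance label per pixel
```

In the actual paper the fusion step is itself learned end-to-end rather than a fixed per-pixel linear classifier; the point here is only the feed-forward composition of pixels → mid-level cues → dense affordance map.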
doi:10.1007/978-3-319-46493-0_12