Integrating Randomization and Discrimination for Classifying Human-Object Interaction Activities [chapter]

Aditya Khosla, Bangpeng Yao, Li Fei-Fei
2014 Human-Centered Social Media Analytics  
Psychologists have shown that the ability of humans to perform basic-level categorization (e.g. cars vs. dogs; kitchen vs. highway) develops well before their ability to perform subordinate-level categorization, or fine-grained visual categorization (e.g. distinguishing dog breeds such as Golden Retrievers vs. Labradors) [18]. It is interesting to observe that computer vision research has followed a similar trajectory. Basic-level object and scene recognition has seen great progress [15, 21, ..., 31], while fine-grained categorization has received little attention. Unlike basic-level recognition, fine-grained categorization can be difficult even for humans [32]. Thus, an automated visual system for this task could be valuable in many applications. Action recognition in still images can be regarded as a fine-grained classification problem [17], as the action classes differ only by human pose or type of human-object interaction. Unlike traditional object or scene recognition problems, where different classes can be distinguished by different parts or coarse spatial layout [16, 21, 15], more detailed visual distinctions need to be explored for fine-grained image classification. The bounding boxes in Figure 1 demarcate the dis-
doi:10.1007/978-3-319-05491-9_5 fatcat:2rmnayfpgjdp7envy6vsj4pgqa