Sketch-a-Net that Beats Humans

Qian Yu, Yongxin Yang, Yi-Zhe Song, Tao Xiang, Timothy Hospedales
2015 Proceedings of the British Machine Vision Conference 2015
Sketches are very intuitive to humans and have long been used as an effective communicative tool. With the proliferation of touchscreens, sketching has become a much easier undertaking for many: we can sketch on phones, tablets and even watches. However, recognising free-hand sketches (e.g. asking a person to draw a car without any instance of a car as reference) is an extremely challenging task. This is due to a number of reasons: (i) sketches are highly iconic and abstract, e.g., human figures can be depicted as stickmen; (ii) due to the free-hand nature, the same object can be drawn with hugely varied levels of detail/abstraction, e.g., a human figure sketch can be either a stickman or a portrait with fine details depending on the drawer; (iii) sketches lack visual cues, i.e., they consist of black and white lines instead of coloured pixels. A recent large-scale study on 20,000 free-hand sketches across 250 categories of daily objects puts human sketch recognition accuracy at 73.1% [2], suggesting that the task is challenging even for humans. Prior work on sketch recognition generally follows the conventional image classification paradigm, that is, extracting hand-crafted features from sketch images and then feeding them to a classifier. Most hand-crafted features traditionally used for photos (such as HOG, SIFT and shape context) have been employed, often coupled with Bag-of-Words (BoW) to yield a final feature representation that can then be classified. However, existing hand-crafted features designed for photos do not account for the unique abstract and sparse nature of sketches. Furthermore, they ignore a key unique characteristic of sketches, that is, a sketch is essentially an ordered list of strokes; sketches are thus sequential in nature (see Fig. 1). In contrast with photos that consist of pixels sampled all at once, a sketch is the result of an online drawing process. It has long been recognised in psychology that such sequential ordering is a strong cue in human sketch recognition, a phenomenon that is also confirmed by recent studies in the computer vision literature [7]. However, none of the
doi:10.5244/c.29.7 dblp:conf/bmvc/YuYSXH15 fatcat:ocn4m62sqbhcda6ywob7oqmyje