A reconfigurable accelerator for neuromorphic object recognition

Jagdish Sabarad, Srinidhi Kestur, Mi Sun Park, Dharav Dantara, Vijaykrishnan Narayanan, Yang Chen, Deepak Khosla
2012 17th Asia and South Pacific Design Automation Conference  
A significant challenge in creating machines with artificial vision is designing systems which can process visual information as efficiently as the human brain. Recent advances in neuroscience have enabled researchers to develop computational models of auditory, visual and learning perceptions in the human brain. Among these models, the two widely accepted algorithms that model the process of attention and recognition in the mammalian visual pathway are the saliency-based model for visual attention and the HMAX model for object recognition. One of the major burdens of these biologically plausible models is their massive computational demands. Real-time implementation of these biologically inspired vision algorithms, while challenging, can have a diverse and profound impact in applications like autonomous vehicle navigation, surveillance, robotics, and face, text and gesture recognition. To mimic true biological systems, implementations of these algorithms must not only meet real-time performance goals, but also fit stringent power budgets and small form factors. Previous attempts to parallelize the HMAX model on multi-core processors have been unable to provide real-time performance due to limited parallelism and high computational complexity. Researchers have leveraged graphics processors due to their ease of programmability and high parallelism. However, their excessive power consumption hinders deployment in embedded or low-power systems. The focus of this work is on the design and architecture of a reconfigurable hardware accelerator for the time-consuming S2-C2 stage of the HMAX model. The accelerator leverages spatial parallelism and dedicated wide data buses with on-chip memories to provide an energy-efficient solution that enables adoption into embedded systems. This work presents a systolic array-based architecture which includes a run-time reconfigurable convolution engine that can perform multiple variable-sized convolutions in parallel. An automation flow is described for this accelerator which can generate optimal hardware configurations for a given algorithmic specification and also perform run-time configuration and execution seamlessly. Experimental results on Virtex-6 FPGA platforms show 5X to 11X speedups and 14X to 33X higher performance-per-Watt over a CNS-based implementation on a Tesla GPU.
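For readers unfamiliar with the S2-C2 stage, the following is a minimal software sketch of the computation the accelerator targets. It is not taken from the paper. It assumes the commonly described HMAX formulation, in which each S2 response is a Gaussian radial basis function of the squared distance between a C1 patch and a stored prototype, and each C2 feature is the global maximum of one prototype's S2 map. The function names, the `beta` parameter, and the single 2-D C1 map (no orientations or scale bands) are simplifications introduced here for illustration only.

```python
import math

def s2_response(c1, proto, beta=1.0):
    """Slide one square prototype over a 2-D C1 map.

    Returns the map of RBF responses exp(-beta * squared_distance)
    for every position where the prototype fits entirely inside c1.
    The RBF form is an assumption based on the usual HMAX description.
    """
    n = len(proto)
    rows, cols = len(c1), len(c1[0])
    out = []
    for r in range(rows - n + 1):
        row = []
        for c in range(cols - n + 1):
            # Squared Euclidean distance between the patch and the prototype.
            d2 = sum((c1[r + i][c + j] - proto[i][j]) ** 2
                     for i in range(n) for j in range(n))
            row.append(math.exp(-beta * d2))
        out.append(row)
    return out

def c2_features(c1, prototypes, beta=1.0):
    """One C2 value per prototype: the global maximum of its S2 map.

    Prototypes may have different sizes, which mirrors the
    variable-sized windows mentioned in the abstract.
    """
    return [max(max(row) for row in s2_response(c1, p, beta))
            for p in prototypes]

# Hypothetical toy input: a 4x4 C1 map and two prototypes cut from it.
c1_map = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8],
          [0.9, 1.0, 1.1, 1.2],
          [1.3, 1.4, 1.5, 1.6]]
proto_2x2 = [row[1:3] for row in c1_map[1:3]]
proto_3x3 = [row[0:3] for row in c1_map[0:3]]
features = c2_features(c1_map, [proto_2x2, proto_3x3])
```

Because both toy prototypes are cut directly from the C1 map, each has a position with zero distance, so both C2 values reach the RBF maximum of 1.0. The nested loops over positions and prototypes are independent of one another, which is the kind of spatial parallelism the abstract says the hardware exploits.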
doi:10.1109/aspdac.2012.6165067 dblp:conf/aspdac/SabaradKPDNCK12 fatcat:7lc6m2ttzjgonjxkwdewb65q7i