Saliency Maps-Based Convolutional Neural Networks for Facial Expression Recognition
Facial expression recognition (FER) is an important research topic in affective computing and plays a key role in many applications of human life. As the most common expression feature extraction method, the convolutional neural network (CNN) has a main limitation: because it lacks visual attention guidance, it captures background noise together with expression information, which lowers recognition accuracy. To simulate the attention mechanism of the human visual system, a salient feature extraction model is proposed, comprising a dilated inception module, a Difference of Gaussians (DoG) module, and a multi-indicator saliency prediction module. This model effectively captures key facial information by enlarging the receptive field, acquiring multiscale features, and simulating human vision. In addition, a novel FER method for a single person is proposed: using saliency maps as prior knowledge together with the multilayer deep features of the CNN, it improves recognition accuracy by obtaining more targeted and more complete deep expression information. Experimental results for saliency prediction, action unit (AU) detection, and smile intensity estimation on the CAT2000, CK+, and BP4D databases show that the proposed method improves FER performance and is more effective than existing approaches.

INDEX TERMS Facial expression recognition, saliency maps, dilated convolution, prior knowledge, convolutional neural network.
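As a rough illustration of the Difference of Gaussians operation named above, the sketch below implements DoG in plain NumPy: an image is blurred at two scales and the coarser blur is subtracted from the finer one, yielding a band-pass response that highlights edges and blob-like salient structure. The kernel radii and the σ values (1.0 and 2.0) are illustrative assumptions; the abstract does not specify the parameters of the paper's DoG module.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # 1-D Gaussian kernel, truncated at ~3 sigma and normalized to sum to 1
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # Separable Gaussian blur: convolve each row, then each column
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out

def difference_of_gaussians(img, sigma1=1.0, sigma2=2.0):
    # DoG: fine blur minus coarse blur; flat regions cancel to ~0,
    # while edges and small bright structures produce a strong response
    return blur(img, sigma1) - blur(img, sigma2)
```

On a uniform image the interior response is zero, while an isolated bright point yields a positive center response, which is why DoG is a cheap proxy for the center-surround behavior of early human vision.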