Detection of eye contact with deep neural networks is as accurate as human experts [post]

Eunji Chong, Elysha Clark-Whitney, Audrey Southerland, Elizabeth Stubbs, Chanel Miller, Eliana L. Ajodan, Melanie R. Silverman, Catherine Lord, Agata Rozga, Rebecca Merrill Jones, James M. Rehg
2020 unpublished
Eye contact is among the most fundamental means of social communication, used by humans from the first months of life. Quantifying eye contact is valuable in various scenarios as part of the analysis of social roles, communication skills, and medical screening. Estimating a subject's looking direction from video is a challenging task, but eye contact can be effectively captured by a wearable point-of-view camera, which provides a unique viewpoint as a result of its configuration. While ... s of eye contact from this viewpoint can be hand coded, such a process tends to be laborious and subjective. In this work, we developed the first deep neural network model to automatically detect eye contact in egocentric video with accuracy equivalent to that of human experts. We trained a deep convolutional neural network using a dataset of 4,339,879 annotated images from 103 subjects with diverse demographic backgrounds, 57 of whom have a diagnosis of Autism Spectrum Disorder. The network achieves an overall precision of 0.936 and recall of 0.943 on 18 set-aside validation subjects, and its performance is on par with that of 10 trained human coders, who have a mean precision of 0.918 and recall of 0.946. This result passes class equivalence tests in Cohen's kappa scores (equivalence boundary of 0.025, p < .005), demonstrating that a deep learning model can produce automated coding with a level of reliability comparable to human coders. The presented method will be instrumental in analyzing gaze behavior in naturalistic social settings by serving as a scalable, objective, and accessible tool for clinicians and researchers.
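The precision, recall, and Cohen's kappa figures quoted in the abstract are standard agreement measures for binary per-frame labels (eye contact vs. no eye contact). The following is a minimal sketch of how such measures can be computed; it is not the authors' code, the function names and the toy label sequences are illustrative assumptions, and the equivalence test itself is not covered.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 = eye contact."""
    if len(pred) != len(truth):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def precision_recall(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two binary raters."""
    tp, fp, fn, tn = confusion_counts(a, b)
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("cannot compute kappa on empty sequences")
    observed = (tp + tn) / n
    # Expected agreement if the two raters labeled independently.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical labels, not data from the study).
reference_labels = [1, 1, 1, 0, 0, 0, 1, 0]
model_labels = [1, 1, 0, 0, 0, 1, 1, 0]

toy_precision, toy_recall = precision_recall(model_labels, reference_labels)
toy_kappa = cohens_kappa(model_labels, reference_labels)
print(toy_precision, toy_recall, toy_kappa)
```

On the toy labels, the model and the reference agree on 6 of 8 frames, and kappa discounts the agreement that two independent raters with the same label frequencies would reach by chance.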
doi:10.31219/osf.io/5a6m7 fatcat:k6icomqqczbndb26dh67vug77y