Social content and emotional valence modulate gaze fixations in dynamic scenes

Marius Rubo, Matthias Gamer
Scientific Reports, 2018
Previous research has shown that low-level visual features (i.e., low-level visual saliency) as well as socially relevant information predict gaze allocation under free viewing conditions. However, these studies mainly used static and highly controlled stimulus material, revealing little about the robustness of attentional processes across diverse situations. Moreover, the influence of affective stimulus characteristics on visual exploration patterns remains poorly understood. Participants in the present study freely viewed a set of naturalistic, contextually rich video clips from a variety of settings that were capable of eliciting different moods. Using recordings of eye movements, we quantified to what degree social information, emotional valence and low-level visual features influenced gaze allocation, using generalized linear mixed models. We found substantial and similarly large regression weights for low-level saliency and social information, affirming the importance of both predictor classes under ecologically more valid dynamic stimulation conditions. Differences in predictor strength between individuals were large and highly stable across videos. Additionally, low-level saliency was less important for fixation selection in videos containing persons than in videos without persons, and less important in videos perceived as negative. We discuss the generalizability of these findings and the feasibility of applying this research paradigm to patient groups.

Like most vertebrates, humans perceive only a small part of their visual field at high acuity and therefore repeatedly move their eyes in order to construct a sufficiently high-resolution representation of their environment1. Controlling gaze while retrieving and filtering relevant signals from the environment is a central task of the attentional system2. In the past, various lines of research have addressed the mechanisms driving such attentional control. As sociability is one of humans' key features3, a large body of research has assessed how we gather social information in order to infer other persons' intentions and feelings. For instance, it was shown that socially relevant features such as human heads and eyes4,5, the gaze direction of depicted people6, people who are talking7 and people with high social status8 attract attention during free viewing of images or dynamic scenes.
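The abstract describes relating predictor maps (social content, saliency) to fixated locations with generalized linear mixed models. As a loose illustration only (this is not the authors' pipeline: it simulates data and fits a plain fixed-effects logistic model, omitting the per-participant random effects that make a GLMM; all variable names and numbers are invented), the basic logic of weighing two predictor classes against each other can be sketched as:

```python
import numpy as np

def fit_logistic(X, y, n_iter=15):
    """Fixed-effects logistic regression via Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted fixation probability
        H = X.T @ (X * (p * (1 - p))[:, None])  # observed information matrix
        w += np.linalg.solve(H, X.T @ (y - p))  # Newton step
    return w

# Simulated data: each row is a candidate screen location in one video frame.
rng = np.random.default_rng(0)
n = 5000
saliency = rng.random(n)            # low-level saliency in [0, 1]
social = rng.integers(0, 2, n)      # 1 if the location depicts a person
true_logit = -2.0 + 2.0 * saliency + 2.0 * social
fixated = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

w = fit_logistic(np.column_stack([saliency, social]), fixated)
# w[1] and w[2] recover similarly large weights for the two predictor classes
```

In the real analysis, a mixed model additionally estimates how much these weights vary between participants, which is what licenses the paper's claim that individual differences in predictor strength are large and stable.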
However, non-social cues like text9,10 and the center of the screen11-13 can also serve as predictors of gaze behavior. Another line of research has focused on the predictive value of low-level image features such as contrast, color, edge density and, for dynamic scenes, motion. A range of algorithms exists to extract these features from images and videos and condense them into a single low-level saliency value between 0 and 1 for each pixel, resulting in topographic low-level saliency maps14. Low-level saliency has been shown to explain fixation patterns for a variety of naturalistic and abstract images15,16 as well as naturalistic videos12,17,18, and has been argued to be a biologically plausible model of early visual processing19.

The influence of social stimuli and visual low-level saliency on eye movements has only recently been studied within the same datasets, and rarely in direct juxtaposition. During face perception, facial regions diagnostic for emotional expressions were shown to receive enhanced attention irrespective of their physical low-level saliency20. Birmingham and colleagues found social areas in an image to be a better predictor of fixation behavior than low-level saliency21,22. Other studies found faces to outperform low-level saliency in gaze prediction for dynamic scenes showing conversations between persons7, and documented higher predictive power for faces than for low-level saliency in adult participants watching a comic clip, although faces were not controlled for low-level saliency in this particular analysis23. Several studies reported an improvement of low-level saliency-based models by including faces as predictors9,24,25. Xu and colleagues included a variety of predictors at pixel level (color,
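The idea behind the topographic saliency maps cited above (one value in [0, 1] per pixel) can be illustrated with a deliberately crude, hypothetical stand-in. Published models such as Itti and Koch's combine color, intensity, orientation and, for video, motion channels across multiple spatial scales; the toy version below uses only local intensity contrast:

```python
import numpy as np

def crude_saliency(img):
    """Toy saliency map from local intensity contrast, normalized to [0, 1].

    A hypothetical stand-in only: full models combine several feature
    channels (color, orientation, motion) over multiple spatial scales.
    """
    padded = np.pad(img, 1, mode="edge")         # pad for a 3x3 box filter
    local_mean = sum(
        padded[i:i + img.shape[0], j:j + img.shape[1]]
        for i in range(3) for j in range(3)
    ) / 9.0
    contrast = np.abs(img - local_mean)          # center-surround difference
    span = contrast.max() - contrast.min()
    return contrast / span if span > 0 else np.zeros_like(contrast)

frame = np.zeros((32, 32))
frame[12:20, 12:20] = 1.0        # a bright patch on a dark background
smap = crude_saliency(frame)     # highest at the patch border, 0 in flat areas
```

Fixation models of the kind reviewed here then ask how well the values of such a map at fixated locations exceed those at non-fixated control locations.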
doi:10.1038/s41598-018-22127-w pmid:29491440 pmcid:PMC5830578