
Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet [article]

Matthias Kümmerer and Lucas Theis and Matthias Bethge
2015 arXiv   pre-print
To train our network, we build on recent work on modeling saliency as point processes.  ...  However, the enormous amount of training data necessary to train these networks makes them difficult to apply directly to saliency prediction.  ...  [Figure 1: example saliency maps. Panels: image with fixations, Deep Gaze without center bias, Deep Gaze with center bias; the top row shows example images from the dataset by Judd et al. (2009).]  ... 
arXiv:1411.1045v4 fatcat:u4vc26rubnhllcoee75spnptmu
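The entry above frames saliency as a fixation point process, optionally combined with a center bias. A minimal sketch of turning a feature map into a normalized fixation log-density with an optional Gaussian center-bias prior (the sigma parameter and normalization choice are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def log_density(feature_map, center_bias_sigma=None):
    """Turn a raw saliency feature map into a fixation log-density
    that sums to 1 over pixels, optionally combined with a Gaussian
    center-bias prior (sigma in pixels; illustrative)."""
    h, w = feature_map.shape
    log_p = feature_map.astype(float)
    if center_bias_sigma is not None:
        ys, xs = np.mgrid[0:h, 0:w]
        # additive log-prior = multiplicative Gaussian prior on the density
        log_p = log_p - (((ys - h / 2) ** 2 + (xs - w / 2) ** 2)
                         / (2 * center_bias_sigma ** 2))
    # log-sum-exp normalization so exp(log_p) is a valid density
    m = log_p.max()
    return log_p - (m + np.log(np.exp(log_p - m).sum()))
```

Working in log-density space is what lets fixation models be scored directly with point-process log-likelihood.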

SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolutional Networks [article]

Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, Masahiro Murakawa, Ryosuke Nakamura
2022 arXiv   pre-print
Extensive experimental results show that our SalFBNet, with fewer parameters, achieves competitive results on public saliency detection benchmarks, which demonstrates the effectiveness of the proposed feedback  ...  We first use the proposed feedback model to learn a saliency distribution from pseudo-ground-truth. Afterwards, we fine-tune the feedback model on existing eye-fixation datasets.  ...  , where s_i^j denotes the predicted probability map of the i-th image under the j-th pre-trained model-annotator.  ... 
arXiv:2112.03731v2 fatcat:5w4k3s6fvramfkacrr44okb5c4
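The fragment above references predicted probability maps s_i^j produced by several pre-trained model-annotators. One simple way such maps could be fused into a pseudo-ground-truth distribution is averaging and renormalizing; a sketch under that assumption, not necessarily the paper's exact scheme:

```python
import numpy as np

def pseudo_saliency(maps):
    """Fuse per-annotator probability maps s_i^j (stacked along axis 0,
    one slice per model-annotator) into a single pseudo-ground-truth
    map, renormalized to sum to 1."""
    avg = np.mean(maps, axis=0)
    return avg / avg.sum()
```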

How is Gaze Influenced by Image Transformations? Dataset and Model [article]

Zhaohui Che and Ali Borji and Guangtao Zhai and Xiongkuo Min and Guodong Guo and Patrick Le Callet
2019 arXiv   pre-print
We find that label-preserving DATs with negligible impact on human gaze boost saliency prediction, whereas some other DATs that severely impact human gaze degrade performance.  ...  Third, we utilize the new data over transformed images, called data augmentation transformations (DATs), to train deep saliency models.  ...  We can see that for both normal and distorted test sets, valid augmented data boosts the deep models' performance.  ...  synthetic saliency maps from human gaze maps.  ... 
arXiv:1905.06803v3 fatcat:uy6pkdrw7negrbt7gd6uvccptm

DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling [article]

Akis Linardos, Matthias Kümmerer, Ori Press, Matthias Bethge
2021 arXiv   pre-print
By replacing the VGG19 backbone of DeepGaze II with ResNet50 features, we improve performance on saliency prediction from 78% to 85%.  ...  However, as we continue to test better ImageNet models as backbones (such as EfficientNetB5), we observe no additional improvement on saliency prediction.  ...  Among past works that focused on a principled transfer-learning scheme for saliency prediction is [14], which trained a saliency model on deep features from three CNNs (AlexNet, GoogLeNet  ... 
arXiv:2105.12441v3 fatcat:srjm65isxnbt3imgri2vvcfdjm
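The entry above describes the DeepGaze-style transfer scheme: a frozen ImageNet backbone whose features feed a small trainable readout. A minimal PyTorch sketch of that pattern (the backbone argument, channel counts, and readout shape are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class SaliencyReadout(nn.Module):
    """Frozen feature backbone + small trainable 1x1-conv readout.
    `backbone` is any module mapping images to (N, C, H', W') features."""

    def __init__(self, backbone, feat_channels):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # only the readout is trained
        self.readout = nn.Sequential(
            nn.Conv2d(feat_channels, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),
        )

    def forward(self, images):
        with torch.no_grad():
            feats = self.backbone(images)
        logits = self.readout(feats)            # (N, 1, H', W')
        n, _, h, w = logits.shape
        # normalize to a per-image fixation density in log space
        log_density = torch.log_softmax(logits.view(n, -1), dim=1)
        return log_density.view(n, 1, h, w)
```

Because only the 1x1 readout is trained, swapping the backbone (VGG19 vs. ResNet50) changes the features but leaves the transfer scheme untouched, which is what makes the backbone comparison in the abstract meaningful.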

Attention Flow: End-to-End Joint Attention Estimation [article]

Ömer Sümer, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci
2020 arXiv   pre-print
We compare the effect of saliency maps and attention mechanisms and report quantitative and qualitative results on the detection and localization of joint attention in the VideoCoAtt dataset, which contains  ...  Joint attention is the shared gaze behaviour of two or more individuals on an object or an area of interest and has a wide range of applications such as human-computer interaction, educational assessment  ...  The first three were chosen as representatives of classical computational saliency methods, whereas Deep Gaze 2 is a data-driven approach that depends on pre-trained feature representations on image classification  ... 
arXiv:2001.03960v1 fatcat:fulzfcwxovaprhrvliprt6eo7y

Attention Flow: End-to-End Joint Attention Estimation

Omer Sumer, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci
2020 2020 IEEE Winter Conference on Applications of Computer Vision (WACV)  
We compare the effect of saliency maps and attention mechanisms and report quantitative and qualitative results on the detection and localization of joint attention in the VideoCoAtt dataset, which contains  ...  features and improve joint attention localization.  ...  The first three were chosen as representatives of classical computational saliency methods, whereas Deep Gaze 2 is a data-driven approach that depends on pre-trained feature representations on image classification  ... 
doi:10.1109/wacv45572.2020.9093515 dblp:conf/wacv/SumerGTK20 fatcat:ysny4fk2rrcvfct5c4d4gevkpu

A Simple and efficient deep Scanpath Prediction [article]

Mohamed Amine Kerkouri, Aladine Chetouani
2021 arXiv   pre-print
We examine how well these models can predict scanpaths on two datasets.  ...  A visual scanpath is the sequence of fixation points that the human gaze travels while observing an image, and predicting it helps model the visual attention of an image.  ...  "Deep Gaze I: Boosting saliency prediction with feature maps trained on ImageNet." "Shifts in selective visual attention: towards the underlying neural circuitry."  ... 
arXiv:2112.04610v1 fatcat:xc7qlk4odjc6fnjvg7dgsqxzbu

Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition [article]

Swathikiran Sudhakaran, Oswald Lanz
2018 arXiv   pre-print
We learn highly specialized attention maps for each frame using class-specific activations from a CNN pre-trained for generic image recognition, and use them for spatio-temporal encoding of the video with  ...  Based on this, we develop a spatial attention mechanism that enables the network to attend to regions containing objects correlated with the activity under consideration.  ...  The first image shows the original frame, the second shows the attention map generated with ResNet-34 trained on ImageNet, and the last shows the attention map obtained using the network trained for  ... 
arXiv:1807.11794v1 fatcat:sqentueowrba5oesvn2wgxxsky

Visual Attention in Multi-Label Image Classification

Yan Luo, Ming Jiang, Qi Zhao
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
Specifically, we propose a dual-stream neural network that consists of two sub-networks: one is a conventional classification model, and the other is a saliency prediction model trained with human fixations  ...  Features computed with the two sub-networks are trained separately and then fine-tuned jointly using a multiple cross-entropy loss.  ...  It is trained on the SALICON [8] training set and predicts saliency maps of MS COCO validation images.  ... 
doi:10.1109/cvprw.2019.00110 dblp:conf/cvpr/LuoJZ19 fatcat:75xq6un35vad3dw5o4odosfxaa

StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization

Xiangteng He, Yuxin Peng, Junjie Zhao
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
Compared with ten state-of-the-art methods on the CUB-200-2011 dataset, our StackDRL approach achieves the best categorization accuracy.  ...  To address the "which" and "how many" problems adaptively and intelligently, this paper proposes a stacked deep reinforcement learning approach (StackDRL).  ...  The feature vector v_t ∈ R^d is the output of one layer of the CNN model, which is pre-trained on the ImageNet dataset [Deng et al., 2009].  ... 
doi:10.24963/ijcai.2018/103 dblp:conf/ijcai/HePZ18 fatcat:yjlj4azwafgxzcllwugpfjatzu

Hallucinating Statistical Moment and Subspace Descriptors from Object and Saliency Detectors for Action Recognition [article]

Lei Wang, Piotr Koniusz
2020 arXiv   pre-print
In this paper, we build on a deep translational action recognition network which takes RGB frames as input to learn to predict both action concepts and auxiliary supervisory feature descriptors, e.g., Optical  ...  Another descriptor encodes spatio-angular gradient distributions of saliency maps and intensity patterns.  ...  Each saliency frame is then described as a feature vector υ† = [υ/‖υ‖₂; I:/‖I:‖₁] ∈ R^{d†}, where I: denotes a vectorized low-resolution saliency map.  ... 
arXiv:2001.04627v1 fatcat:vleulkyzmjfnlftogwbrn2pvla

Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features

Hamed R. Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu
2017 Neurocomputing  
This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and an ensemble of Extreme Learning Machines (ELMs).  ...  Motivated by such observations, we develop a framework that estimates the saliency of a given image using an ensemble of extreme learners, each trained on an image similar to the input image.  ...  The authors would like to thank the MIT saliency benchmark team, particularly Zoya Bylinskii, for their quick response to our benchmark request.  ... 
doi:10.1016/j.neucom.2017.03.018 fatcat:mmq47eo4ebdgjpxxduxbgl2tqy
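The entry above rests on Extreme Learning Machines, which keep a random hidden layer fixed and solve only the output weights in closed form. A minimal single-learner sketch (the saliency-specific features and the similarity-based ensembling described in the abstract are omitted; all names are illustrative):

```python
import numpy as np

class ELM:
    """Minimal Extreme Learning Machine: fixed random hidden layer,
    output weights obtained by least squares (no backprop)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # random, untrained hidden-layer weights and biases
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)        # random nonlinear features
        # closed-form output weights via least squares
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```

The framework in the paper would average the predictions of several such learners, each fitted on an image similar to the input; that step is a straightforward mean over `predict` outputs.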

Beyond Universal Saliency: Personalized Saliency Prediction with Multi-task CNN

Yanyu Xu, Nianyi Li, Junru Wu, Jingyi Yu, Shenghua Gao
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
In this paper, we first show that such heterogeneity is common and critical for reliable saliency prediction. Our study also produces the first database of personalized saliency maps (PSMs).  ...  We model PSM based on universal saliency map (USM) shared by different participants and adopt a multi-task CNN framework to estimate the discrepancy between PSM and USM.  ...  The first two methods are based on handcrafted features, and the latter two are based on deep learning techniques. We use their code provided online to generate USMs.  ... 
doi:10.24963/ijcai.2017/543 dblp:conf/ijcai/XuLWYG17 fatcat:6aw6om2k4rc2pczilbnycqvtly

ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath While Tracking Instruments in Robotic Surgery [article]

Mobarakol Islam, Vibashan VS, Chwee Ming Lim, Hongliang Ren
2021 arXiv   pre-print
We generate task-aware saliency maps and scanpaths of the instruments on the dataset of the MICCAI 2017 robotic instrument segmentation challenge.  ...  Incorporating cognitive ability to automate camera control enables the surgeon to concentrate more on dealing with surgical instruments.  ...  We can infer that our model predicts instrument segmentation with fewer false positives and the best saliency and task-oriented scanpath. [Fig. 7: refined fixation map (I_t).]  ... 
arXiv:2112.08189v1 fatcat:ck7rypz2lnfbplc4z3kejcjzae

SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks

Xun Huang, Chengyao Shen, Xavier Boix, Qi Zhao
2015 2015 IEEE International Conference on Computer Vision (ICCV)  
This paper presents a focused study to narrow the semantic gap with an architecture based on Deep Neural Network (DNN).  ...  We compare our method with 14 saliency models on 6 public eye tracking benchmark datasets.  ...  [24] initially transferred features directly from a DNN for object recognition, and showed promising results with a model that they called Deep Gaze.  ... 
doi:10.1109/iccv.2015.38 dblp:conf/iccv/HuangSBZ15 fatcat:gvg2yphivjh7vdrrt4fmdsboyq