Counting with Adaptive Auxiliary Learning [article]

Yanda Meng, Joshua Bridge, Meng Wei, Yitian Zhao, Yihong Qiao, Xiaoyun Yang, Xiaowei Huang, Yalin Zheng
2022 arXiv   pre-print
Unlike existing auxiliary task learning based methods, we develop an attention-enhanced adaptively shared backbone network to enable both task-shared and task-tailored features learning in an end-to-end  ...  The whole framework pays special attention to the objects' spatial locations and varied density levels, informed by object (or crowd) segmentation and density level segmentation auxiliary tasks.  ...  [21] employed an attention mask to refine the density map for adapting to different density levels. Furthermore, Zhang et al.  ... 
arXiv:2203.04061v1 fatcat:rii7aigvijc3hhbljlicyr2swa

Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction from High-Angle Video [article]

Tianya T. Zhang Ph.D., Peter J. Jin Ph.D., Han Zhou, Benedetto Piccoli, Ph.D
2022 arXiv   pre-print
In this paper, we developed a Spatial-Temporal Deep Embedding (STDE) model that imposes parity constraints at both pixel and instance levels to generate instance-aware embeddings for vehicle stripe segmentation  ...  Spatial-temporal Map (STMap)-based methods have shown great potential to process high-angle videos for vehicle trajectory reconstruction, which can meet the needs of various data-driven modeling and imitation  ...  The initial backbone is replaced with Inception blocks to take full advantage of the flexible design of the Inception network.  ... 
arXiv:2209.08417v1 fatcat:qconn6bpq5e5rmj7hbivqcomza

Re-Identification in Urban Scenarios: A Review of Tools and Methods

Hugo S. Oliveira, José J. M. Machado, João Manuel R. S. Tavares
2021 Applied Sciences  
networks has attracted particular attention in computer vision and pattern recognition communities.  ...  With the widespread use of surveillance image cameras and enhanced awareness of public security, objects, and persons Re-Identification (ReID), the task of recognizing objects in non-overlapping camera  ...  FCN-rLSTM enables refined feature representation and a new end-to-end trainable mapping from pixels to vehicle count.  ... 
doi:10.3390/app112210809 fatcat:7eruemifjfaw5bi3dfnpuc6ttm

Congested Crowd Counting via Adaptive Multi-Scale Context Learning

Yani Zhang, Huailin Zhao, Zuodong Duan, Liangjun Huang, Jiahao Deng, Qing Zhang
2021 Sensors  
In this paper, we propose a novel congested crowd counting network for crowd density estimation, i.e., the Adaptive Multi-scale Context Aggregation Network (MSCANet).  ...  Employing multiple MSCAs in a cascaded manner, the MSCANet can deeply utilize the spatial context information and modulate preliminary features into more distinguishing and scale-sensitive features, which  ...  [63] proposed a Maximum Excess over Pixels loss to learn spatial-aware crowd features.  ... 
doi:10.3390/s21113777 pmid:34072408 fatcat:sben5kwjqnbsbmdiqvg3lcbl5y

Vehicle Instance Segmentation From Aerial Image and Video Using a Multitask Learning Residual Fully Convolutional Network

Lichao Mou, Xiao Xiang Zhu
2018 IEEE Transactions on Geoscience and Remote Sensing  
In contrast, vehicle detection and semantic segmentation each only concern one of the two. We propose to tackle this problem with a semantic boundary-aware multi-task learning network.  ...  Then, based on this network architecture, we propose a unified multi-task learning network that can simultaneously learn two complementary tasks, namely, segmenting vehicle regions and detecting semantic  ...  This is rooted in the loss of spatial details caused by max-pooling layers (downsampling) along with the feature abstraction.  ... 
doi:10.1109/tgrs.2018.2841808 fatcat:qrk42qlhsrglriju33enxtefum

An Efficient Module for Instance Segmentation Based on Multi-Level Features and Attention Mechanisms

Yingchun Sun, Wang Gao, Shuguo Pan, Tao Zhao, Yahui Peng
2021 Applied Sciences  
Firstly, we adopt a convolutional block attention module (CBAM) into feature extraction, and sequentially generate attention maps which focus on instance-related features along the channel and spatial  ...  In order to solve the problem, an attention-based feature pyramid module (AFPM) is proposed, which integrates the attention mechanism on the basis of a multi-level feature pyramid network to efficiently  ...  In the SOLOv2 network, the total loss function consists of category loss and mask loss.  ... 
doi:10.3390/app11030968 fatcat:fv3w5ss735dxth3waxbv322qia

Spatio-Contextual Deep Network Based Multimodal Pedestrian Detection For Autonomous Driving [article]

Kinjal Dasgupta, Arindam Das, Sudip Das, Ujjwal Bhattacharya, Senthil Yogamani
2022 arXiv   pre-print
The output of the last feature fusion unit of MuFEm is subsequently passed to two CRFs for their spatial refinement.  ...  Further enhancement of the features is achieved by applying channel-wise attention and extraction of contextual information with the help of four RNNs traversing in four different directions.  ...  Mask and Predict [51] is a specific strategy in curriculum learning where the pedestrian box is progressively masked, and the network is expected to predict the boxes with the visible and masked regions  ... 
arXiv:2105.12713v3 fatcat:2x3qtaupo5euvio2wrjv4dvppu

Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [article]

Seokju Lee, Francois Rameau, Fei Pan, In So Kweon
2021 arXiv   pre-print
First, we propose a two-stage projection pipeline to explicitly disentangle the camera ego-motion and the object motions with dynamics attention module, called DAM.  ...  Specifically, we design an integrated motion model that estimates the motion of the camera and object in the first and second warping stages, respectively, controlled by the attention module through a  ...  To be specific, we first squeeze the channel dimension with two 1 × 1 conv layers and generate a spatial attention map via softmax along the spatial dimension.  ... 
arXiv:2110.06853v1 fatcat:tiigihspmjbltlgbefpqxq7nsi

Dual Convolutional LSTM Network for Referring Image Segmentation

Linwei Ye, Zhi Liu, Yang Wang
2020 IEEE transactions on multimedia  
Our model consists of an encoder network and a decoder network, where ConvLSTM is used in both encoder and decoder networks to capture spatial and sequential information.  ...  The decoder network integrates the features generated by the encoder network at multiple levels as its input and produces the final precise segmentation mask.  ...  networks with LSTM for vehicle counting [30] .  ... 
doi:10.1109/tmm.2020.2971171 fatcat:yrrgbixnnvgbjncngwpcpwj4qu

2021 Index IEEE Transactions on Multimedia Vol. 23

2021 IEEE transactions on multimedia  
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages.  ...  ., +, TMM 2021 1343-1353 Learning Crisp Boundaries Using Deep Refinement Network and Adaptive Weighting Loss.  ... 
doi:10.1109/tmm.2022.3141947 fatcat:lil2nf3vd5ehbfgtslulu7y3lq

A Survey of Vehicle Re-identification Based on Deep Learning

Hongbo Wang, Jiaying Hou, Na Chen
2019 IEEE Access  
learning, methods based on unsupervised learning, and methods based on attention mechanism.  ...  With the rapid development of deep learning, vehicle re-identification technologies have made significant progress in recent years.  ...  [110] proposed a Spatial and Channel Attention Network (SCAN) based on DCNN, the attention model contained a spatial attention branch and a channel attention branch, the two branches adjusted the weights  ... 
doi:10.1109/access.2019.2956172 fatcat:gxzry6py4bhrjnb2qxqtu4r27u

Surround-View Cameras based Holistic Visual Perception for Automated Driving [article]

Varun Ravi Kumar
2022 arXiv   pre-print
Near-field visual perception in the context of self-driving cars can perceive the environment in a range of 0-10 meters and 360° coverage around the vehicle.  ...  Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions.  ... 
arXiv:2206.05542v1 fatcat:cdpn6afpvvf7hnsvry7cqbjq3u

2021 Index IEEE Transactions on Image Processing Vol. 30

2021 IEEE Transactions on Image Processing  
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages.  ...  ., +, TIP 2021 8686-8701 Stimuli-Aware Visual Emotion Analysis. Yang, J., +, TIP 2021 7432-7445 Encoding Image Inpainting by End-to-End Cascaded Refinement With Mask Awareness.  ... 
doi:10.1109/tip.2022.3142569 fatcat:z26yhwuecbgrnb2czhwjlf73qu

Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering [article]

Yang Liu, Guanbin Li, Liang Lin
2022 arXiv   pre-print
based on the dominant visual evidence and the correct question intention.  ...  To discover the fine-grained interactions between linguistic semantics and spatial-temporal representations, we build a novel Spatial-Temporal Transformer (STT) that builds the multi-modal co-occurrence  ...  [38] , [39] released a large-scale VideoQA dataset named TGIF-QA and proposed a dual-LSTM based method with both spatial and temporal attention.  ... 
arXiv:2207.12647v2 fatcat:rkwil7hyx5dytfcsiwunapg5qq

FCAU-Net for the Semantic Segmentation of Fine-Resolution Remotely Sensed Images

Xuerui Niu, Qiaolin Zeng, Xiaobo Luo, Liangfu Chen
2022 Remote Sensing  
In this paper, we incorporate a coordinate attention (CA) mechanism, adopt an asymmetric convolution block (ACB), and design a refinement fusion block (RFB), forming a network named the fusion coordinate and asymmetry-based U-Net (FCAU-Net).  ... 
doi:10.3390/rs14010215 fatcat:gfnv5kbdk5a4tepwcwd7tmiihe