
SDGMNet: Statistic-based Dynamic Gradient Modulation for Local Descriptor Learning [article]

Jiayi Ma, Yuxin Deng
2021 arXiv   pre-print
In this paper, we propose a dynamic gradient modulation, named SDGMNet, to improve triplet loss for local descriptor learning.  ...  of proportional Siamese pairs that are believed to reach the optimum; power adjustment balances the total weights of negative pairs and positive pairs.  ...  This kind of hardness can be similarly defined with relative distance. These principles for gradient modulation are so-called hard example mining (HEM). We briefly illustrate them in Fig. 1.  ... 
arXiv:2106.04434v2 fatcat:twkaqhn5tfcmnawhwitcokoe7u

Siamese Object Tracking for Unmanned Aerial Vehicle: A Review and Comprehensive Analysis [article]

Changhong Fu, Kunhan Lu, Guangze Zheng, Junjie Ye, Ziang Cao, Bowen Li, Geng Lu
2022 arXiv   pre-print
As an emerging force in the revolutionary trend of deep learning, Siamese networks shine in UAV-based object tracking with their promising balance of accuracy, robustness, and speed.  ...  In the end, prospects for the development of Siamese tracking for UAV-based intelligent transportation systems are deeply discussed.  ...  ACKNOWLEDGEMENT This work is supported by the National Natural Science Foundation of China (No. 62173249) and the Natural Science Foundation of Shanghai (No. 20ZR1460100).  ... 
arXiv:2205.04281v2 fatcat:kaujdfb7ivdtxeiz44lu36oqum

VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows [article]

Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, Yaowei Wang, Yonghong Tian, Feng Wu
2022 arXiv   pre-print
Therefore, the two sensors can cooperate with each other to achieve more reliable object tracking.  ...  In this work, we propose a large-scale Visible-Event benchmark (termed VisEvent) due to the lack of a realistic and scaled dataset for this task.  ...  After that, the deep learning trackers, especially the Siamese network based trackers began to occupy the top positions of various benchmarks.  ... 
arXiv:2108.05015v3 fatcat:kf2zzppdyjd6ro7m67aimmyjku

Visual Search at Alibaba

Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren, Rong Jin
2018 Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD '18  
Also, we propose a deep CNN model for joint detection and feature learning by mining user click behavior.  ...  We take advantage of large image collection of Alibaba and state-of-the-art deep learning techniques to perform visual search at scale.  ...  To learn more effective and efficient representation, some works are designed for hard sample mining, which focuses on batch of samples that are considered hard.  ... 
doi:10.1145/3219819.3219820 dblp:conf/kdd/ZhangPZZZRJ18 fatcat:caa42i3xmzfxhiddso5sjq6czy

A Survey on Deep Learning Technique for Video Segmentation [article]

Wenguan Wang, Tianfei Zhou, Fatih Porikli, David Crandall, Luc Van Gool
2021 arXiv   pre-print
Recently, due to the renaissance of connectionism in computer vision, there has been an influx of deep learning based approaches for video segmentation that have delivered compelling performance.  ...  Finally, we point out a set of unsolved open issues in this field, and suggest possible opportunities for further research.  ...  For example, convolutional recurrent neural networks (RNNs) were used to learn spatial and temporal visual patterns jointly [84], [95].  ... 
arXiv:2107.01153v3 fatcat:nry4yjhq7zhtzbfh53wf7ie3um

Re-Identification in Urban Scenarios: A Review of Tools and Methods

Hugo S. Oliveira, José J. M. Machado, João Manuel R. S. Tavares
2021 Applied Sciences  
With the advent of Deep Neural Networks (DNN), there have been many proposals for different network architectures achieving high-performance levels.  ...  With the aim of identifying the most promising methods for ReID for future robust implementations, a review study is presented, mainly focusing on the person and multi-object ReID and auxiliary methods  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/app112210809 fatcat:7eruemifjfaw5bi3dfnpuc6ttm

All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers [article]

Carmelo Scribano, Davide Sapienza, Giorgia Franchini, Micaela Verucchi, Marko Bertogna
2021 arXiv   pre-print
For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings.  ...  The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case.  ...  Acknowledgments This work has been partially supported by the CINECA grant number HP10BSTS2W and by POR-FSE 2014-2020 funds of Emilia-Romagna region (Deliberazione di Giunta Regionale n. 255-30/03/2020  ... 
arXiv:2106.10153v1 fatcat:3dovt6ygkrcjffsq3ixcxaruau

Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events [article]

Weiyao Lin, Huabin Liu, Shizhan Liu, Yuxi Li, Rui Qian, Tao Wang, Ning Xu, Hongkai Xiong, Guo-Jun Qi, Nicu Sebe
2021 arXiv   pre-print
Along with the development of modern smart cities, human-centric video analysis has been encountering the challenge of analyzing diverse and complex events in real scenes.  ...  It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest numbers of trajectories lasting for longer time (with an average  ...  The top-3 sequences with the highest CrowdIndex could be naturally regarded as relatively hard examples in the test set.  ... 
arXiv:2005.04490v5 fatcat:4yjayreakney3bztfnxjrc22ru

A Cross-camera Multi-face Tracking System Based on Double Triplet Networks

Guoyin Ren, Xiaoqi Lu, Yuhao Li
2021 IEEE Access  
Double Triplet Networks (DTN) designed in this study is used to learn the depth features of human face.  ...  DTN is trained on LFW data set, and the model trained can improve its recognition accuracy to 99.51% by Margin Sample Mining Loss (MSML) and Focal Loss hard sample equalization.  ...  It can be used to search for missing persons, including the elderly and children, and to track down suspects.  ... 
doi:10.1109/access.2021.3061572 fatcat:2zi3ga3ilvdp5lrnwqew65r6ai

Recent Advances of Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective

Wu Liu, Tao Mei
2022 ACM Computing Surveys  
Recently, benefiting from the deep learning technologies, a significant amount of research efforts have advanced the monocular human pose estimation both in 2D and 3D areas.  ...  Finally, we discuss the challenges and give deep thinking of promising directions for future research.  ...  For example, for pose estimation, the deep learning network is designed as a pose encoder followed by a pose decoder module.  ... 
doi:10.1145/3524497 fatcat:4pbvntngrnfp7lqhcpjmy7p2fq

Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective [article]

Wu Liu, Qian Bao, Yu Sun, Tao Mei
2021 arXiv   pre-print
Finally, we discuss the challenges and give deep thinking of promising directions for future research.  ...  We believe this survey will provide the readers with a deep and insightful understanding of monocular human pose estimation.  ...  It also proposes an online hard keypoints mining (OHKM) loss to deal with hard keypoints. CPN achieves the 1st place in the COCO 2017 keypoint challenge.  ... 
arXiv:2104.11536v1 fatcat:tdag2jq2vjdrjekwukm5nu7l6a

Visual and Object Geo-localization: A Comprehensive Survey [article]

Daniel Wilson, Xiaohan Zhang, Waqas Sultani, Safwan Wshah
2021 arXiv   pre-print
As massive datasets of GPS tagged media have rapidly become available due to smartphones and the internet, and deep learning has risen to enhance the performance capabilities of machine learning models  ...  , the fields of visual and object geo-localization have emerged due to its significant impact on a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance  ...  With the development of CNNs [76] and siamese [72, 77, 73] networks, deep siamese-like networks  ...  Unlike most of the methods described in this section which  ... 
arXiv:2112.15202v1 fatcat:ipwas72ro5ho5fjiakm6de7ji4

VoxCeleb: Large-scale Speaker Verification in the Wild

Arsha Nagrani, Joon Son Chung, Weidi Xie, Andrew Zisserman
2019 Computer Speech and Language  
Our pipeline involves obtaining videos from YouTube; performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN), and confirming the identity of the speaker  ...  Second, we develop and compare different CNN architectures with various aggregation methods and training loss functions that can effectively recognise identities from voice under various conditions.  ...  Fellowship in Machine Perception, Speech Technology and Computer Vision.  ... 
doi:10.1016/j.csl.2019.101027 fatcat:ih2gshb7pfhgdlnsx7hj3s7oka

Hard-Aware Deeply Cascaded Embedding [article]

Yuhui Yuan, Kuiyuan Yang, Chao Zhang
2017 arXiv   pre-print
Riding on the waves of deep neural networks, deep metric learning has also achieved promising results in various tasks using triplet network or Siamese network.  ...  This motivates us to ensemble a set of models with different complexities in cascaded manner and mine hard examples adaptively, a sample is judged by a series of models with increasing complexities and  ...  Only top 50 percent examples with larger loss are chosen as hard examples to update the model.  ... 
arXiv:1611.05720v2 fatcat:mgpxrhxksff4loe24pich7njmy

Fashion Meets Computer Vision: A Survey [article]

Wen-Huang Cheng, Sijie Song, Chieh-Yun Chen, Shintami Chusnul Hidayati, Jiaying Liu
2021 arXiv   pre-print
Fashion is the way we present ourselves to the world and has become one of the world's largest industries.  ...  For each task, the benchmark datasets and the evaluation protocols are summarized. Furthermore, we highlight promising directions for future research.  ...  For the same goal of learning the deep feature representation, Wang et al. [188] adopted a Siamese network that contained two copies of the Inception-6 network with shared weights.  ... 
arXiv:2003.13988v2 fatcat:ajzvyn4ck5gqxk5ht5u3mrdmba