66 Hits in 1.8 sec

GAN Inversion: A Survey [article]

Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang
2022 arXiv   pre-print
GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, for the image to be faithfully reconstructed from the inverted code by the generator.  ...  As an emerging technique to bridge the real and fake image domains, GAN inversion plays an essential role in enabling the pretrained GAN models such as StyleGAN and BigGAN to be used for real image editing  ...  Similar to GAN inversion, some tasks also aim to learn the inverse mapping of GAN models.  ... 
arXiv:2101.05278v5 fatcat:ff3evb2nv5ezzaxju2cucbizde

ActFormer: A GAN Transformer Framework towards General Action-Conditioned 3D Human Motion Generation [article]

Ziyang Song, Dongliang Wang, Nan Jiang, Zhicheng Fang, Chenjing Ding, Weihao Gan, Wei Wu
2022 arXiv   pre-print
Our approach consists of a powerful Action-conditioned motion transFormer (ActFormer) under a GAN training scheme, equipped with a Gaussian Process latent prior.  ...  We present a GAN Transformer framework for general action-conditioned 3D human motion generation, including not only single-person actions but also multi-person interactive actions.  ...  Transformer in GANs The success of Transformers [46] in visual recognition tasks inspires its application in generation tasks. Trans-GAN [26] is the first pure Transformer-based GAN architecture.  ... 
arXiv:2203.07706v1 fatcat:cvwttdq7jzgnpjmc45mwjuvtw4

Learning Statistical Texture for Semantic Segmentation [article]

Lanyun Zhu, Deyi Ji, Shiping Zhu, Weihao Gan, Wei Wu, Junjie Yan
2021 arXiv   pre-print
Existing semantic segmentation works mainly focus on learning the contextual information in high-level semantic features with CNNs. In order to maintain a precise boundary, low-level texture features are directly skip-connected into the deeper layers. Nevertheless, texture features are not only about local structure, but also include global statistical knowledge of the input image. In this paper, we fully take advantages of the low-level texture features and propose a novel Statistical Texture
more » ... earning Network (STLNet) for semantic segmentation. For the first time, STLNet analyzes the distribution of low level information and efficiently utilizes them for the task. Specifically, a novel Quantization and Counting Operator (QCO) is designed to describe the texture information in a statistical manner. Based on QCO, two modules are introduced: (1) Texture Enhance Module (TEM), to capture texture-related information and enhance the texture details; (2) Pyramid Texture Feature Extraction Module (PTFEM), to effectively extract the statistical texture features from multiple scales. Through extensive experiments, we show that the proposed STLNet achieves state-of-the-art performance on three semantic segmentation benchmarks: Cityscapes, PASCAL Context and ADE20K.
arXiv:2103.04133v1 fatcat:75vtx6zblvghdjywbl3b7kshmm

Hierarchical Feature Embedding for Attribute Recognition [article]

Jie Yang, Jiarou Fan, Yiru Wang, Yige Wang, Weihao Gan, Lin Liu, Wei Wu
2020 arXiv   pre-print
Attribute recognition is a crucial but challenging task due to viewpoint changes, illumination variations and appearance diversities, etc. Most of previous work only consider the attribute-level feature embedding, which might perform poorly in complicated heterogeneous conditions. To address this problem, we propose a hierarchical feature embedding (HFE) framework, which learns a fine-grained feature embedding by combining attribute and ID information. In HFE, we maintain the inter-class and
more » ... ra-class feature embedding simultaneously. Not only samples with the same attribute but also samples with the same ID are gathered more closely, which could restrict the feature embedding of visually hard samples with regard to attributes and improve the robustness to variant conditions. We establish this hierarchical structure by utilizing HFE loss consisted of attribute-level and ID-level constraints. We also introduce an absolute boundary regularization and a dynamic loss weight as supplementary components to help build up the feature embedding. Experiments show that our method achieves the state-of-the-art results on two pedestrian attribute datasets and a facial attribute dataset.
arXiv:2005.11576v1 fatcat:ritcfns4pnd5hfc7yq5ct5uyke

Class-wise Dynamic Graph Convolution for Semantic Segmentation [article]

Hanzhe Hu, Deyi Ji, Weihao Gan, Shuai Bai, Wei Wu, Junjie Yan
2020 arXiv   pre-print
Recent works have made great progress in semantic segmentation by exploiting contextual information in a local or global manner with dilated convolutions, pyramid pooling or self-attention mechanism. In order to avoid potential misleading contextual information aggregation in previous works, we propose a class-wise dynamic graph convolution (CDGC) module to adaptively propagate information. The graph reasoning is performed among pixels in the same class. Based on the proposed CDGC module, we
more » ... ther introduce the Class-wise Dynamic Graph Convolution Network(CDGCNet), which consists of two main parts including the CDGC module and a basic segmentation network, forming a coarse-to-fine paradigm. Specifically, the CDGC module takes the coarse segmentation result as class mask to extract node features for graph construction and performs dynamic graph convolutions on the constructed graph to learn the feature aggregation and weight allocation. Then the refined feature and the original feature are fused to get the final prediction. We conduct extensive experiments on three popular semantic segmentation benchmarks including Cityscapes, PASCAL VOC 2012 and COCO Stuff, and achieve state-of-the-art performance on all three benchmarks.
arXiv:2007.09690v1 fatcat:2nkmvl73dzcarnlp6kogcwvevu

Dynamic Curriculum Learning for Imbalanced Data Classification [article]

Yiru Wang, Weihao Gan, Jie Yang, Wei Wu, Junjie Yan
2019 arXiv   pre-print
Human attribute analysis is a challenging task in the field of computer vision, since the data is largely imbalance-distributed. Common techniques such as re-sampling and cost-sensitive learning require prior-knowledge to train the system. To address this problem, we propose a unified framework called Dynamic Curriculum Learning (DCL) to online adaptively adjust the sampling strategy and loss learning in single batch, which resulting in better generalization and discrimination. Inspired by the
more » ... urriculum learning, DCL consists of two level curriculum schedulers: (1) sampling scheduler not only manages the data distribution from imbalanced to balanced but also from easy to hard; (2) loss scheduler controls the learning importance between classification and metric learning loss. Learning from these two schedulers, we demonstrate our DCL framework with the new state-of-the-art performance on the widely used face attribute dataset CelebA and pedestrian attribute dataset RAP.
arXiv:1901.06783v2 fatcat:7mdcol5jjvhtzdnzf5ls7fvwdm

SAMOT: Switcher-Aware Multi-Object Tracking and Still Another MOT Measure [article]

Weitao Feng, Zhihao Hu, Baopu Li, Weihao Gan, Wei Wu, Wanli Ouyang
2020 arXiv   pre-print
Multi-Object Tracking (MOT) is a popular topic in computer vision. However, identity issue, i.e., an object is wrongly associated with another object of a different identity, still remains to be a challenging problem. To address it, switchers, i.e., confusing targets thatmay cause identity issues, should be focused. Based on this motivation,this paper proposes a novel switcher-aware framework for multi-object tracking, which consists of Spatial Conflict Graph model (SCG) and Switcher-Aware
more » ... iation (SAA). The SCG eliminates spatial switch-ers within one frame by building a conflict graph and working out the optimal subgraph. The SAA utilizes additional information from potential temporal switcher across frames, enabling more accurate data association. Besides, we propose a new MOT evaluation measure, Still Another IDF score (SAIDF), aiming to focus more on identity issues.This new measure may overcome some problems of the previous measures and provide a better insight for identity issues in MOT. Finally,the proposed framework is tested under both the traditional measures and the new measure we proposed. Extensive experiments show that ourmethod achieves competitive results on all measure.
arXiv:2009.10338v1 fatcat:hnkopwfqgjcbjh2byzujjytjpq

STM: SpatioTemporal and Motion Encoding for Action Recognition [article]

Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan
2019 arXiv   pre-print
Spatiotemporal and motion features are two complementary and crucial information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion features. In this work, we aim to efficiently encode these two features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to present the spatiotemporal features and a Channel-wise
more » ... tion Module (CMM) to efficiently encode motion features. We then replace original residual blocks in the ResNet architecture with STM blcoks to form a simple yet effective STM network by introducing very limited extra computation cost. Extensive experiments demonstrate that the proposed STM network outperforms the state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51) with the help of encoding spatiotemporal and motion features together.
arXiv:1908.02486v2 fatcat:fkdgd3czuveknh2ywenxzexxee

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation [article]

Mengzhe He, Yali Wang, Jiaxi Wu, Yiru Wang, Hanqing Li, Bo Li, Weihao Gan, Wei Wu, Yu Qiao
2022 arXiv   pre-print
Cross domain object detection is a realistic and challenging task in the wild. It suffers from performance degradation due to large shift of data distributions and lack of instance-level annotations in the target domain. Existing approaches mainly focus on either of these two difficulties, even though they are closely coupled in cross domain object detection. To solve this problem, we propose a novel Target-perceived Dual-branch Distillation (TDD) framework. By integrating detection branches of
more » ... both source and target domains in a unified teacher-student learning scheme, it can reduce domain shift and generate reliable supervision effectively. In particular, we first introduce a distinct Target Proposal Perceiver between two domains. It can adaptively enhance source detector to perceive objects in a target image, by leveraging target proposal contexts from iterative cross-attention. Afterwards, we design a concise Dual Branch Self Distillation strategy for model training, which can progressively integrate complementary object knowledge from different domains via self-distillation in two branches. Finally, we conduct extensive experiments on a number of widely-used scenarios in cross domain object detection. The results show that our TDD significantly outperforms the state-of-the-art methods on all the benchmarks. Our code and model will be available at
arXiv:2205.01291v1 fatcat:abd7bluierdkrnqfmpxhyzdmau

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition [article]

Haisheng Su, Jing Su, Dongliang Wang, Weihao Gan, Wei Wu, Mengmeng Wang, Junjie Yan, Yu Qiao
2020 arXiv   pre-print
(Lin, Gan, and Han 2018) shift part of the channels along the temporal dimension to enable the temporal information transmission between neighboring frames. Jiang el al.  ... 
arXiv:2009.06902v1 fatcat:ekr4p5r3hvgu3mfszugamlmq6m

Dynamic Curriculum Learning for Imbalanced Data Classification

Yiru Wang, Weihao Gan, Jie Yang, Wei Wu, Junjie Yan
2019 2019 IEEE/CVF International Conference on Computer Vision (ICCV)  
Human attribute analysis is a challenging task in the field of computer vision. One of the significant difficulties is brought from largely imbalance-distributed data. Conventional techniques such as re-sampling and cost-sensitive learning require prior-knowledge to train the system. To address this problem, we propose a unified framework called Dynamic Curriculum Learning (DCL) to adaptively adjust the sampling strategy and loss weight in each batch, which results in better ability of
more » ... ation and discrimination. Inspired by curriculum learning, DCL consists of two-level curriculum schedulers: (1) sampling scheduler which manages the data distribution not only from imbalance to balance but also from easy to hard; (2) loss scheduler which controls the learning importance between classification and metric learning loss. With these two schedulers, we achieve state-of-the-art performance on the widely used face attribute dataset CelebA and pedestrian attribute dataset RAP.
doi:10.1109/iccv.2019.00512 dblp:conf/iccv/WangGYWY19 fatcat:kzthfmteorerdkbsg4h2ogtl24

Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection [article]

Jiaxi Wu, Jiaxin Chen, Mengzhe He, Yiru Wang, Bo Li, Bingqi Ma, Weihao Gan, Wei Wu, Yali Wang, Di Huang
2022 arXiv   pre-print
Domain adaptive object detection (DAOD) is a promising way to alleviate performance drop of detectors in new scenes. Albeit great effort made in single source domain adaptation, a more generalized task with multiple source domains remains not being well explored, due to knowledge degradation during their combination. To address this issue, we propose a novel approach, namely target-relevant knowledge preservation (TRKP), to unsupervised multi-source DAOD. Specifically, TRKP adopts the
more » ... udent framework, where the multi-head teacher network is built to extract knowledge from labeled source domains and guide the student network to learn detectors in unlabeled target domain. The teacher network is further equipped with an adversarial multi-source disentanglement (AMSD) module to preserve source domain-specific knowledge and simultaneously perform cross-domain alignment. Besides, a holistic target-relevant mining (HTRM) scheme is developed to re-weight the source images according to the source-target relevance. By this means, the teacher network is enforced to capture target-relevant knowledge, thus benefiting decreasing domain shift when mentoring object detection in the target domain. Extensive experiments are conducted on various widely used benchmarks with new state-of-the-art scores reported, highlighting the effectiveness.
arXiv:2204.07964v1 fatcat:f5wogrhjargn7fbg45wdboco4q

Challenges on Large Scale Surveillance Video Analysis

Weitao Feng, Deyi Ji, Yiru Wang, Shuorong Chang, Hansheng Ren, Weihao Gan
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
Large scale surveillance video analysis is one of the most important components in the future artificial intelligent city. It is a very challenging but practical system, consists of multiple functionalities such as object detection, tracking, identification and behavior analysis. In this paper, we try to address three tasks hosted in NVIDIA AI City Challenge contest. First, a system that transforming the image coordinate to world coordinate has been proposed, which is useful to estimate the
more » ... cle speed on the road. Second, anomalies like car crash event and stalled vehicles can be found by the proposed anomaly detector framework . Third, multiple camera vehicle re-identification problem has been investigated and a matching algorithm is explained. All these tasks are based on our proposed online single camera multiple object tracking (MOT) system, which has been evaluated on the widely used MOT16 challenge benchmark. We show that it achieves the best performance compared to the stateof-the-art methods. Besides of MOT, we evaluate the proposed vehicle re-identification model on VeRi-776 dataset and it outperforms all other methods with a large margin.
doi:10.1109/cvprw.2018.00017 dblp:conf/cvpr/FengJWCRG18 fatcat:k6rgabsprvdbzpzsnkkubee2mi

Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021 [article]

Haisheng Su, Peiqin Zhuang, Yukun Li, Dongliang Wang, Weihao Gan, Wei Wu, Yu Qiao
2021 arXiv   pre-print
This technical report presents an overview of our solution used in the submission to 2021 HACS Temporal Action Localization Challenge on both Supervised Learning Track and Weakly-Supervised Learning Track. Temporal Action Localization (TAL) requires to not only precisely locate the temporal boundaries of action instances, but also accurately classify the untrimmed videos into specific categories. However, Weakly-Supervised TAL indicates locating the action instances using only video-level class
more » ... labels. In this paper, to train a supervised temporal action localizer, we adopt Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement. As for the WSTAL, a novel framework is proposed to handle the poor quality of CAS generated by simple classification network, which can only focus on local discriminative parts, rather than locate the entire interval of target actions. Further inspired by the transfer learning method, we also adopt an additional module to transfer the knowledge from trimmed videos (HACS Clips dataset) to untrimmed videos (HACS Segments dataset), aiming at promoting the classification performance on untrimmed videos. Finally, we employ a boundary regression module embedded with Outer-Inner-Contrastive (OIC) loss to automatically predict the boundaries based on the enhanced CAS. Our proposed scheme achieves 39.91 and 29.78 average mAP on the challenge testing set of supervised and weakly-supervised temporal action localization track respectively.
arXiv:2107.12618v1 fatcat:c5xm6eixyjfojobuegdam3duli

Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection [article]

Shoubin Yu, Zhongyin Zhao, Haoshu Fang, Andong Deng, Haisheng Su, Dongliang Wang, Weihao Gan, Cewu Lu, Wei Wu
2021 arXiv   pre-print
Anomaly detection in surveillance videos is challenging and important for ensuring public security. Different from pixel-based anomaly detection methods, pose-based methods utilize highly-structured skeleton data, which decreases the computational burden and also avoids the negative impact of background noise. However, unlike pixel-based methods, which could directly exploit explicit motion features such as optical flow, pose-based methods suffer from the lack of alternative dynamic
more » ... on. In this paper, a novel Motion Embedder (ME) is proposed to provide a pose motion representation from the probability perspective. Furthermore, a novel task-specific Spatial-Temporal Transformer (STT) is deployed for self-supervised pose sequence reconstruction. These two modules are then integrated into a unified framework for pose regularity learning, which is referred to as Motion Prior Regularity Learner (MoPRL). MoPRL achieves the state-of-the-art performance by an average improvement of 4.7% AUC on several challenging datasets. Extensive experiments validate the versatility of each proposed module.
arXiv:2112.03649v2 fatcat:wgdridft3zaenls7kts5badtbq
« Previous Showing results 1 — 15 out of 66 results