983 Hits in 1.1 sec

Attributes in Multiple Facial Images [article]

Xudong Liu, Guodong Guo
2018 arXiv pre-print
Facial attribute recognition is conventionally computed from a single image. In practice, however, each subject may have multiple face images. Taking eye size as an example, it should not change across images, yet it may be estimated differently in each one, which negatively affects face recognition. Thus, computing these attributes per subject rather than per image is an important problem. To address it, we train deep models for facial attribute prediction and examine the inconsistency among the attributes computed from individual images. We then develop two approaches to resolve this inconsistency. Experimental results show that the proposed methods can handle facial attribute estimation on either multiple still images or video frames, and can correct incorrectly annotated labels. The experiments are conducted on two large public databases with facial attribute annotations.
arXiv:1805.09203v1 fatcat:3gs3hvjh3nezbis3e3zmlqdscq

Face Detection on Surveillance Images [article]

Mohammad Iqbal Nouyed, Guodong Guo
2019 arXiv pre-print
In the last few decades, much progress has been made in face detection, and numerous researchers have proposed a variety of detection methods. The two well-known benchmarking platforms, FDDB and WIDER FACE, provide quite challenging scenarios for assessing these methods. Their datasets are mostly built from images collected on the public network, i.e., the Internet. A recent face detection and open-set recognition challenge has shown that these same face detection algorithms produce many false alarms on images taken in surveillance scenarios, which shows how difficult the surveillance environment is. Our proposed body-pose-based face detection method was one of the top performers in this competition. In this paper, we perform a comparative performance analysis of several well-known face detection methods, including a few used in that competition, and compare them to our proposed body-pose-based method. Experimental results show that our proposed method, which leverages body information to detect faces, is the most realistic approach in terms of accuracy, false alarms, and average detection time when surveillance scenarios are considered.
arXiv:1910.11121v1 fatcat:df4sevu5dbfpxclohlohjmcr2u

EAN: Event Adaptive Network for Enhanced Action Recognition [article]

Yuan Tian, Yichao Yan, Xiongkuo Min, Guo Lu, Guangtao Zhai, Guodong Guo, Zhiyong Gao
2021 arXiv pre-print
Efficiently modeling spatial-temporal information in videos is crucial for action recognition. To achieve this goal, state-of-the-art methods typically employ convolution operators and dense interaction modules such as non-local blocks. However, these methods cannot accurately fit the diverse events in videos. On the one hand, the adopted convolutions have fixed scales and thus struggle with events of various scales. On the other hand, the dense interaction modeling paradigm achieves only sub-optimal performance, because action-irrelevant parts add noise to the final prediction. In this paper, we propose a unified action recognition framework that investigates the dynamic nature of video content through the following designs. First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we use a Transformer to mine interactions only among a few selected foreground objects, which yields a sparse paradigm. We call the proposed framework the Event Adaptive Network (EAN) because both key designs adapt to the input video content. To exploit short-term motions within local segments, we propose a novel and efficient Latent Motion Code (LMC) module, which further improves the performance of the framework. Extensive experiments on several large-scale video datasets, e.g., Something-Something V1 & V2, Kinetics, and Diving48, verify that our models achieve state-of-the-art or competitive performance at low FLOPs. Code is available at:
arXiv:2107.10771v1 fatcat:ywxhs7sbw5habbaokadhq6lpdy

Self-supervised Video Object Segmentation [article]

Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie
2020 arXiv pre-print
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking). We make the following contributions: (i) we improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching, which resolves the challenge caused by the disappearance and reappearance of objects; (ii) by augmenting the self-supervised approach with an online adaptation module, our method alleviates tracker drift caused by spatial-temporal discontinuities, e.g., occlusions, dis-occlusions, and fast motion; (iii) we explore the efficiency of self-supervised representation learning for dense tracking and, surprisingly, show that a powerful tracking model can be trained with as few as 100 raw video clips (about 11 minutes of video), indicating that low-level statistics are already effective for tracking tasks; (iv) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS, and surpass most methods trained with millions of manual segmentation annotations, further bridging the gap between self-supervised and supervised learning. Code is released to foster further research (
arXiv:2006.12480v1 fatcat:eeivxrmcrbdyfbcuynxmrn4xhm

Leaf senescence: progression, regulation, and application

Yongfeng Guo, Guodong Ren, Kewei Zhang, Zhonghai Li, Ying Miao, Hongwei Guo
2021 Molecular Horticulture
and Gan 2005; Lim et al. 2007; Guo and Gan 2012) (Fig. 3). ... However, each factor does not work independently; the factors mutually promote or inhibit one another (Guo and Gan 2012). ...
doi:10.1186/s43897-021-00006-9 fatcat:tuqr7nfxjrewbbqgwolkdylfky

Face Cyclographs for Recognition [chapter]

Guodong Guo, Charles R. Dyer
2007 Information Sciences 2007
A new representation of faces, called face cyclographs, is introduced for face recognition; it incorporates all views of a rotating face into a single image. The main motivation for this representation comes from recent psychophysical studies showing that humans use continuous image sequences in object recognition. Face cyclographs are created by slicing spatiotemporal face volumes that are constructed automatically from real-time face detection. This representation is a compact, multiperspective, spatiotemporal description. To use face cyclographs for recognition, a dynamic-programming-based algorithm is developed. The motion trajectory image of the eye slice is used to analyze the approximate single-axis motion and to normalize the face cyclographs, and using normalized face cyclographs speeds up the matching process. Experimental results on more than 100 face videos show that this representation efficiently encodes the continuous views of faces.
doi:10.1142/9789812709677_0129 fatcat:xlmakb67kfc5dj5uhsqlo2bfki
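The abstract above describes face cyclographs as slices through a spatiotemporal face volume, with a separate "eye slice" used to study motion. A minimal sketch of the slicing step only, assuming the volume is a NumPy array of shape (frames, height, width), that the face rotates about a vertical axis so a fixed image column is sliced for the cyclograph, and that a fixed row at eye level gives the trajectory image. The function names, the toy volume, and the column/row choices are hypothetical illustrations, not the authors' construction (which also involves face detection, normalization, and dynamic-programming matching).

```python
import numpy as np

def face_cyclograph(volume: np.ndarray, column: int) -> np.ndarray:
    """Stack one fixed image column from every frame side by side.

    volume: (frames, height, width) grayscale face volume (assumed layout).
    Returns a (height, frames) image whose column t holds frame t's pixels at `column`.
    """
    if volume.ndim != 3:
        raise ValueError("expected a (frames, height, width) array")
    return volume[:, :, column].T.copy()

def eye_slice(volume: np.ndarray, eye_row: int) -> np.ndarray:
    """Return the (frames, width) trajectory image of one fixed row, e.g. eye level."""
    if volume.ndim != 3:
        raise ValueError("expected a (frames, height, width) array")
    return volume[:, eye_row, :].copy()

# Toy volume standing in for a cropped face sequence: 4 frames of 3x5 pixels.
vol = np.arange(4 * 3 * 5).reshape(4, 3, 5)
cyc = face_cyclograph(vol, column=2)
eye = eye_slice(vol, eye_row=1)
print(cyc.shape)  # (3, 4)
print(eye.shape)  # (4, 5)
```

In a real pipeline the column would presumably be tied to the detected face center rather than a constant.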

Strong Interaction between Surface Plasmons and Chiral Molecules [article]

Yangzhe Guo, Guodong Zhu, Yurui Fang
2020 arXiv pre-print
In plasmonic chirality, circular dichroism of achiral nanoparticles caused by Coulomb interaction between metal nanoparticles (NPs) and chiral molecules has been studied. Meanwhile, under resonance conditions, dye molecules and metal NPs produce a large Rabi splitting due to strong coupling. If chiral molecules are at the plasmon resonance, what happens to the strong interaction between the plasmon and the molecules once chirality is introduced? In this paper, we investigate a spherical core-shell model and analyze its spectra under excitation by circularly polarized light (CPL). Based on the Coulomb interaction between NPs and chiral molecules, we show how various factors affect the strong coupling. We identify three mechanisms for the interaction between plasmons and chiral molecules: strong coupling (Rabi splitting up to 243 meV), enhanced absorption, and induced transparency. The interaction with CPL is stronger for chiral molecules of the opposite handedness to the CPL than for those of the same handedness, and the line widths of the two peaks are also closer in the opposite-handedness case, which indicates that deeper mechanisms govern Rabi splitting with chirality. This result will be helpful for further research on the interaction between plasmons and chiral molecules.
arXiv:2009.01518v1 fatcat:7wvoj2sa7bev3byhovmp24yjvm

Spectral mesh deformation

Guodong Rong, Yan Cao, Xiaohu Guo
2008 The Visual Computer
In this paper, we present a novel spectral method for mesh deformation based on the manifold harmonics transform. The eigenfunctions of the Laplace-Beltrami operator form orthogonal bases for parameterizing the space of functions defined on a surface. The geometry and motion of the original irregular meshes can be compactly encoded using the low-frequency spectrum of the manifold harmonics. With the spectral method, the size of the linear deformation system can be significantly reduced to achieve interactive computational speed when manipulating large triangle meshes. Our experimental results demonstrate that only a small part of the spectrum is needed to achieve indistinguishable deformations for large triangle meshes. The spectral mesh deformation approach substantially improves computational speed over its spatial counterparts.
doi:10.1007/s00371-008-0260-x fatcat:h2jpcvnrenephg3fwsggt73qom
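The entry above rests on encoding mesh geometry with the low-frequency eigenvectors of a Laplacian. A minimal sketch of that encode/decode step, assuming a uniform graph Laplacian (L = D - A) as a swapped-in simplification of the Laplace-Beltrami operator used for manifold harmonics, and showing only the spectral reduction, not the paper's deformation solver. The helper names and the octahedron example are hypothetical.

```python
import numpy as np

def graph_laplacian(n_vertices, faces):
    """Uniform graph Laplacian L = D - A built from the edges of triangle faces."""
    adj = np.zeros((n_vertices, n_vertices))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    return np.diag(adj.sum(axis=1)) - adj

def spectral_encode(vertices, faces, k):
    """Encode (n, 3) vertex positions with the k lowest-frequency eigenvectors."""
    lap = graph_laplacian(len(vertices), faces)
    # eigh: symmetric input, ascending eigenvalues, orthonormal eigenvector columns.
    _, eigvecs = np.linalg.eigh(lap)
    basis = eigvecs[:, :k]          # (n, k) low-frequency basis
    coeffs = basis.T @ vertices     # (k, 3) spectral coefficients
    return basis, coeffs

def spectral_decode(basis, coeffs):
    """Reconstruct vertex positions from a (possibly truncated) spectrum."""
    return basis @ coeffs

# Octahedron: 6 vertices, 8 triangles.
V = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
              [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
F = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
     (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]

basis_full, c_full = spectral_encode(V, F, k=6)
basis_low, c_low = spectral_encode(V, F, k=2)
print(np.allclose(spectral_decode(basis_full, c_full), V))  # True: the full spectrum is exact
```

With k much smaller than the vertex count, any linear system posed on the k coefficients is far smaller than one posed on all vertex positions, which is the source of the speed-up the abstract reports. On a connected mesh the first eigenvector is constant, so the truncated reconstruction keeps the centroid.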

TransFER: Learning Relation-aware Facial Expression Representations with Transformers [article]

Fanglei Xue, Qiangchang Wang, Guodong Guo
2021 arXiv pre-print
Facial expression recognition (FER) has received increasing interest in computer vision. We propose the TransFER model, which can learn rich relation-aware local representations. It mainly consists of three components: Multi-Attention Dropping (MAD), ViT-FER, and Multi-head Self-Attention Dropping (MSAD). First, local patches play an important role in distinguishing various expressions; however, few existing works can locate discriminative and diverse local patches. This can cause serious problems when some patches are invisible due to pose variations or viewpoint changes. To address this issue, MAD is proposed to randomly drop an attention map, so that models are pushed to explore diverse local patches adaptively. Second, to build rich relations between different local patches, Vision Transformers (ViT) are applied to FER, yielding ViT-FER. Since the global scope is used to reinforce each local patch, a better representation is obtained to boost FER performance. Third, multi-head self-attention allows ViT to jointly attend to features from different information subspaces at different positions. Without explicit guidance, however, multiple self-attentions may extract similar relations. To address this, MSAD is proposed to randomly drop one self-attention module, forcing models to learn rich relations among diverse local patches. Our proposed TransFER model outperforms state-of-the-art methods on several FER benchmarks, demonstrating its effectiveness.
arXiv:2108.11116v1 fatcat:q2ite65cuzgolnmifpbxbjdgge
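The MAD component above is summarized as randomly dropping an attention map during training. A minimal sketch of that idea, assuming the maps arrive as a NumPy stack of shape (num_maps, height, width), that one map is chosen uniformly and zeroed, and that dropping happens with a hypothetical probability `p`; the paper's exact selection rule, probability, and framework are not given here, so this is an illustration rather than the authors' implementation.

```python
import numpy as np

def drop_one_map(attn_maps: np.ndarray, rng: np.random.Generator, p: float = 0.5) -> np.ndarray:
    """Training-time regularizer: with probability p, zero one randomly chosen map.

    attn_maps: (num_maps, height, width) stack of attention maps (assumed layout).
    Returns a new array; the input is left untouched.
    """
    out = attn_maps.copy()
    if rng.random() < p:
        idx = rng.integers(out.shape[0])  # uniform choice among the maps
        out[idx] = 0.0
    return out

rng = np.random.default_rng(0)
maps = np.ones((4, 3, 3))
dropped = drop_one_map(maps, rng, p=1.0)
zeroed = int((dropped.reshape(4, -1).sum(axis=1) == 0).sum())
print(zeroed)  # 1
```

Applied to a stack of per-head self-attention outputs instead of attention maps, the same pattern would give a rough analogue of the MSAD idea, again under the assumptions stated above.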

LAE: Long-tailed Age Estimation [article]

Zenghao Bao, Zichang Tan, Yu Zhu, Jun Wan, Xibo Ma, Zhen Lei, Guodong Guo
2021 arXiv pre-print
Facial age estimation is an important yet very challenging problem in computer vision. To improve its performance, we first formulate a simple standard baseline and then build a much stronger one by collecting tricks in pre-training, data augmentation, model architecture, and so on. Compared with the standard baseline, the stronger one significantly decreases the estimation errors. Moreover, long-tailed recognition is an important topic for facial age datasets, in which samples of the elderly and children are often scarce. To train a balanced age estimator, we propose a two-stage training method named Long-tailed Age Estimation (LAE), which decouples the learning procedure into representation learning and classification. The effectiveness of our approach has been demonstrated on the dataset provided by the organizers of the Guess The Age Contest 2021.
arXiv:2110.12741v1 fatcat:znvfprjsgfdgfahxgg5imdorbe

Bayesian Optimized 1-Bit CNNs [article]

Jiaxin Gu, Junhe Zhao, Xiaolong Jiang, Baochang Zhang, Jianzhuang Liu, Guodong Guo, Rongrong Ji
2019 arXiv pre-print
Deep convolutional neural networks (DCNNs) have dominated recent developments in computer vision, producing various record-breaking models. However, it is still a great challenge to achieve powerful DCNNs in resource-limited environments, such as embedded devices and smartphones. Researchers have realized that 1-bit CNNs can be one feasible solution to this issue; however, their performance remains inferior to that of full-precision DCNNs. In this paper, we propose a novel approach, called Bayesian optimized 1-bit CNNs (denoted as BONNs), which takes advantage of Bayesian learning, a well-established strategy for hard problems, to significantly improve the performance of extreme 1-bit CNNs. We incorporate the prior distributions of full-precision kernels and features into the Bayesian framework to construct 1-bit CNNs in an end-to-end manner, which has not been considered in any previous related method. The Bayesian losses, which have theoretical support, optimize the network simultaneously in both continuous and discrete spaces, aggregating different losses jointly to improve model capacity. Extensive experiments on the ImageNet and CIFAR datasets show that BONNs achieve the best classification performance compared to state-of-the-art 1-bit CNNs.
arXiv:1908.06314v1 fatcat:m6kppdpzmnb3vdoq3u2ksmvf4y

Self-Conditioned Probabilistic Learning of Video Rescaling [article]

Yuan Tian, Guo Lu, Xiongkuo Min, Zhaohui Che, Guangtao Zhai, Guodong Guo, Zhiyong Gao
2021 arXiv pre-print
Bicubic downscaling is a prevalent technique used to reduce the video storage burden or to accelerate downstream processing. However, the inverse upscaling step is non-trivial, and the downscaled video may also degrade the performance of downstream tasks. In this paper, we propose a self-conditioned probabilistic framework for video rescaling that learns the paired downscaling and upscaling procedures simultaneously. During training, we decrease the entropy of the information lost in downscaling by maximizing its probability conditioned on the strong spatial-temporal prior information within the downscaled video. After optimization, the video downscaled by our framework preserves more meaningful information, which benefits both the upscaling step and downstream tasks, e.g., video action recognition. We further extend the framework to a lossy video compression system, for which we propose a gradient estimator for non-differentiable industrial lossy codecs to enable end-to-end training of the whole system. Extensive experimental results demonstrate the superiority of our approach on video rescaling, video compression, and efficient action recognition tasks.
arXiv:2107.11639v2 fatcat:k7g4ewgwhbatzlyk2z26pub2j4

Adversarial Attacks against Deep Saliency Models [article]

Zhaohui Che, Ali Borji, Guangtao Zhai, Suiyi Ling, Guodong Guo, Patrick Le Callet
2019 arXiv pre-print
Currently, a plethora of saliency models based on deep neural networks have led to great breakthroughs in many complex high-level vision tasks (e.g., scene description, object detection). The robustness of these models, however, has not yet been studied. In this paper, we propose, for the first time, a sparse feature-space adversarial attack method against deep saliency models. The proposed attack requires only part of the model information and is able to generate a sparser and more insidious adversarial perturbation than traditional image-space attacks. These adversarial perturbations are so subtle that a human observer cannot notice them, yet the model outputs are drastically altered. This phenomenon raises security threats to deep saliency models in practical applications. We also explore some intriguing properties of the feature-space attack, e.g., 1) hidden layers with bigger receptive fields generate sparser perturbations, 2) deeper hidden layers achieve higher attack success rates, and 3) different loss functions and different attacked layers result in diverse perturbations. Experiments indicate that the proposed method is able to successfully attack different model architectures across various image scenes.
arXiv:1904.01231v1 fatcat:i2rlrymxcbh7hb4koqcq3tadxm

Cogradient Descent for Bilinear Optimization [article]

Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye, David Doermann, Guodong Guo, Rongrong Ji
2020 arXiv pre-print
Conventional learning methods simplify the bilinear model by treating two intrinsically coupled factors independently, which degrades the optimization procedure. One reason lies in insufficient training due to asynchronous gradient descent, which results in vanishing gradients for the coupled variables. In this paper, we introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem, based on a theoretical framework that coordinates the gradients of hidden variables via a projection function. We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent that facilitates the optimization procedure. Our algorithm is applied to problems in which one variable is under a sparsity constraint, which is widely used in the learning paradigm. We validate CoGD on an extensive set of applications, including image reconstruction, inpainting, and network pruning. Experiments show that it improves the state of the art by a significant margin.
arXiv:2006.09142v1 fatcat:h2vvyxczhratlp7jvzictfukti

CATrans: Context and Affinity Transformer for Few-Shot Segmentation [article]

Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo
2022 arXiv pre-print
Few-shot segmentation (FSS) aims to segment novel categories given scarce annotated support images. The crux of FSS is how to aggregate dense correlations between support and query images for query segmentation while remaining robust to large variations in appearance and context. To this end, previous Transformer-based methods explore global consensus on either context similarity or the affinity map between support-query pairs. In this work, we effectively integrate context and affinity information via the proposed Context and Affinity Transformer (CATrans) in a hierarchical architecture. Specifically, the Relation-guided Context Transformer (RCT) propagates context information from support to query images conditioned on more informative support features. Based on the observation that a large feature distinction between support and query pairs hinders context knowledge transfer, the Relation-guided Affinity Transformer (RAT) measures attention-aware affinity as auxiliary information for FSS, in which the self-affinity is responsible for more reliable cross-affinity. Experiments demonstrate the effectiveness of the proposed model, which outperforms state-of-the-art methods.
arXiv:2204.12817v1 fatcat:n3fghwxvpramxjcul74kxe2mfi
Showing results 1 to 15 out of 983.