1,204 Hits in 1.6 sec

Predicting Human Activities Using Stochastic Grammar [article]

Siyuan Qi, Siyuan Huang, Ping Wei, Song-Chun Zhu
2017 arXiv   pre-print
This paper presents a novel method to predict future human activities from partially observed RGB-D videos. Human activity prediction is generally difficult due to its non-Markovian property and the rich context between human and environments. We use a stochastic grammar model to capture the compositional structure of events, integrating human actions, objects, and their affordances. We represent the event by a spatial-temporal And-Or graph (ST-AOG). The ST-AOG is composed of a temporal
more » ... ic grammar defined on sub-activities, and spatial graphs representing sub-activities that consist of human actions, objects, and their affordances. Future sub-activities are predicted using the temporal grammar and Earley parsing algorithm. The corresponding action, object, and affordance labels are then inferred accordingly. Extensive experiments are conducted to show the effectiveness of our model on both semantic event parsing and future activity prediction.
arXiv:1708.00945v1 fatcat:vxhmp54o3bey3iefhxgpbqb4lm

Human-centric Indoor Scene Synthesis Using Stochastic Grammar [article]

Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, Song-Chun Zhu
2018 arXiv   pre-print
Supplementary Material for Human-centric Indoor Scene Synthesis Using Stochastic Grammar Siyuan Qi 1 Yixin Zhu 1 Siyuan Huang 1 Chenfanfu Jiang 2 Song-Chun Zhu 1 1 1 UCLA Center for Vision, Cognition,  ... 
arXiv:1808.08473v1 fatcat:rocsqcw7w5gevgwe5asxvkv3qq

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image [article]

Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu
2018 arXiv   pre-print
., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., et al.: Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015) 53.  ...  In: CVPR Workshop. 1,2 , Siyuan Qi 1,2 , Yixin Zhu 1,2 , Yinxue Xiao 1 , Yuanlu Xu 1,2 , and Song-Chun Zhu 1,University of California, Los Angeles 2 International Center for AI and Robot Autonomy (CARA  ... 
arXiv:1808.02201v1 fatcat:ulqbp66cnfbnxmqnu4np6enxji

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points [article]

Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu
2019 arXiv   pre-print
In Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [38] Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, and Song-Chun Zhu.  ...  In Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [36] Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Nian Wu, and Song-Chun Zhu.  ... 
arXiv:1912.07744v1 fatcat:6v6rh2uiunfwhbsokfnbcws43m

Image Set Querying Based Localization [article]

Lei Deng, Siyuan Huang, Yueqi Duan, Baohua Chen, Jie Zhou
2015 arXiv   pre-print
Conventional single image based localization methods usually fail to localize a querying image when there exist large variations between the querying image and the pre-built scene. To address this, we propose an image-set querying based localization approach. When the localization by a single image fails to work, the system will ask the user to capture more auxiliary images. First, a local 3D model is established for the querying image set. Then, the pose of the querying image set is estimated
more » ... y solving a nonlinear optimization problem, which aims to match the local 3D model against the pre-built scene. Experiments have shown the effectiveness and feasibility of the proposed approach.
arXiv:1509.06016v1 fatcat:4ucsltjevvfpbiy7fozibthkau

ALID: Scalable Dominant Cluster Detection [article]

Lingyang Chu, Shuhui Wang, Siyuan Liu, Qingming Huang, Jian Pei
2014 arXiv   pre-print
Detecting dominant clusters is important in many analytic applications. The state-of-the-art methods find dense subgraphs on the affinity graph as the dominant clusters. However, the time and space complexity of those methods are dominated by the construction of the affinity graph, which is quadratic with respect to the number of data points, and thus impractical on large data sets. To tackle the challenge, in this paper, we apply Evolutionary Game Theory (EGT) and develop a scalable algorithm,
more » ... Approximate Localized Infection Immunization Dynamics (ALID). The major idea is to perform Localized Infection Immunization Dynamics (LID) to find dense subgraph within local range of the affinity graph. LID is further scaled up with guaranteed high efficiency and detection quality by an estimated Region of Interest (ROI) and a carefully designed Candidate Infective Vertex Search method (CIVS). ALID only constructs small local affinity graphs and has a time complexity of O(C(a^*+ δ)n) and a space complexity of O(a^*(a^*+ δ)), where a^* is the size of the largest dominant cluster and C << n and δ << n are small constants. We demonstrate by extensive experiments on both synthetic data and real world data that ALID achieves state-of-the-art detection quality with much lower time and space cost on single machine. We also demonstrate the encouraging parallelization performance of ALID by implementing the Parallel ALID (PALID) on Apache Spark. PALID processes 50 million SIFT data points in 2.29 hours, achieving a speedup ratio of 7.51 with 8 executors.
arXiv:1411.0064v1 fatcat:aonwzh6nbfeojgzhuesnzyksdq

An Enhanced Knowledge Injection Model for Commonsense Generation [article]

Zhihao Fan, Yeyun Gong, Zhongyu Wei, Siyuan Wang, Yameng Huang, Jian Jiao, Xuanjing Huang, Nan Duan, Ruofei Zhang
2020 arXiv   pre-print
Commonsense generation aims at generating plausible everyday scenario description based on a set of provided concepts. Digging the relationship of concepts from scratch is non-trivial, therefore, we retrieve prototypes from external knowledge to assist the understanding of the scenario for better description generation. We integrate two additional modules, namely position indicator and scaling module, into the pretrained encoder-decoder model for prototype modeling to enhance the knowledge
more » ... tion procedure. We conduct experiment on CommonGen benchmark, and experimental results show that our method significantly improves the performance on all the metrics.
arXiv:2012.00366v1 fatcat:pda7ejfhtvdrzgw2llsdiug4la

Enhancer-promoter association determines Sox2 transcription regulation in mouse pluripotent cells [article]

Lei Huang, Qing Li, Qitong Huang, Siyuan Kong, Xiusheng Zhu, Yanling Peng, Yubo Zhang
2019 bioRxiv   pre-print
GO and KEGG enrichment analyses of the DEGs are performed using DAVID 6.8 (Huang da et al., 2009b; Huang da et al., 2009a) and KOBAS 3.0 (Wu et al., 2006; Xie et al., 2011) software, respectively.  ... 
doi:10.1101/590745 fatcat:nh5rkzcqdbfddgzhsff47fxlge

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image [chapter]

Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu
2018 Lecture Notes in Computer Science  
We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model. Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes. The proposed HSG captures three essential and often latent dimensions of the indoor scenes: i) latent human context,
more » ... bing the affordance and the functionality of a room arrangement, ii) geometric constraints over the scene configurations, and iii) physical constraints that guarantee physically plausible parsing and reconstruction. We solve this joint parsing and reconstruction problem in an analysis-by-synthesis fashion, seeking to minimize the differences between the input image and the rendered images generated by our 3D representation, over the space of depth, surface normal, and object segmentation map. The optimal configuration, represented by a parse graph, is inferred using Markov chain Monte Carlo (MCMC), which efficiently traverses through the non-differentiable solution space, jointly optimizing object localization, 3D layout, and hidden human context. Experimental results demonstrate that the proposed algorithm improves the generalization ability and significantly outperforms prior methods on 3D layout estimation, 3D object detection, and holistic scene understanding.
doi:10.1007/978-3-030-01234-2_12 fatcat:n2lyd2g6sve6lartrjexgdal3q

Tracking Every Thing in the Wild [article]

Siyuan Li, Martin Danelljan, Henghui Ding, Thomas E. Huang, Fisher Yu
2022 arXiv   pre-print
Current multi-category Multiple Object Tracking (MOT) metrics use class labels to group tracking results for per-class evaluation. Similarly, MOT methods typically only associate objects with the same class predictions. These two prevalent strategies in MOT implicitly assume that the classification performance is near-perfect. However, this is far from the case in recent large-scale MOT datasets, which contain large numbers of classes with many rare or semantically similar categories.
more » ... the resulting inaccurate classification leads to sub-optimal tracking and inadequate benchmarking of trackers. We address these issues by disentangling classification from tracking. We introduce a new metric, Track Every Thing Accuracy (TETA), breaking tracking measurement into three sub-factors: localization, association, and classification, allowing comprehensive benchmarking of tracking performance even under inaccurate classification. TETA also deals with the challenging incomplete annotation problem in large-scale tracking datasets. We further introduce a Track Every Thing tracker (TETer), that performs association using Class Exemplar Matching (CEM). Our experiments show that TETA evaluates trackers more comprehensively, and TETer achieves significant improvements on the challenging large-scale datasets BDD100K and TAO compared to the state-of-the-art.
arXiv:2207.12978v1 fatcat:bqalvyjyofcjtdlghvegn7irmi

VLGrammar: Grounded Grammar Induction of Vision and Language [article]

Yining Hong, Qing Li, Song-Chun Zhu, Siyuan Huang
2021 arXiv   pre-print
Cognitive grammar suggests that the acquisition of language grammar is grounded within visual structures. While grammar is an essential representation of natural language, it also exists ubiquitously in vision to represent the hierarchical part-whole structure. In this work, we study grounded grammar induction of vision and language in a joint learning framework. Specifically, we present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the
more » ... guage grammar and the image grammar simultaneously. We propose a novel contrastive learning framework to guide the joint learning of both modules. To provide a benchmark for the grounded grammar induction task, we collect a large-scale dataset, PartIt, which contains human-written sentences that describe part-level semantics for 3D objects. Experiments on the PartIt dataset show that VLGrammar outperforms all baselines in image grammar induction and language grammar induction. The learned VLGrammar naturally benefits related downstream tasks. Specifically, it improves the image unsupervised clustering accuracy by 30%, and performs well in image retrieval and text retrieval. Notably, the induced grammar shows superior generalizability by easily generalizing to unseen categories.
arXiv:2103.12975v1 fatcat:mx6q5dm3hrbi7lrtsm3ned7pja

Nonlinear Local Metric Learning for Person Re-identification [article]

Siyuan Huang, Jiwen Lu, Jie Zhou, Anil K. Jain
2015 arXiv   pre-print
Person re-identification aims at matching pedestrians observed from non-overlapping camera views. Feature descriptor and metric learning are two significant problems in person re-identification. A discriminative metric learning method should be capable of exploiting complex nonlinear transformations due to the large variations in feature space. In this paper, we propose a nonlinear local metric learning (NLML) method to improve the state-of-the-art performance of person re-identification on
more » ... ic datasets. Motivated by the fact that local metric learning has been introduced to handle the data which varies locally and deep neural network has presented outstanding capability in exploiting the nonlinearity of samples, we utilize the merits of both local metric learning and deep neural network to learn multiple sets of nonlinear transformations. By enforcing a margin between the distances of positive pedestrian image pairs and distances of negative pairs in the transformed feature subspace, discriminative information can be effectively exploited in the developed neural networks. Our experiments show that the proposed NLML method achieves the state-of-the-art results on the widely used VIPeR, GRID, and CUHK 01 datasets.
arXiv:1511.05169v1 fatcat:bigdznbdoveyzksjajrlvkrkka

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation [article]

Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
2019 arXiv   pre-print
., 2017 , Huang et al., 2018 apply sampling or optimization methods to infer the geometry and semantics of indoor scenes.  ...  We compare the estimation of the proposed model with three previous methods-3DGP [Choi et al., 2013] , IM2CAD [Izadinia et al., 2017] and HoPR [Huang [Choi et al., 2013] 19.2 2.1 0.7 0.6 13.9 HoPR  ... 
arXiv:1810.13049v2 fatcat:x5robvibl5hb3p4v6javs72kfa

Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense [article]

Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu
2019 arXiv   pre-print
A more recent work by Huang et al.  ...  Table 1 . 1 Quantitative Results of 3D Scene Reconstruction Methods Huang et al. [15] Ours Metric 2D IoU (%) 3D IoU (%) Depth (m) 2D IOU (%) 3D IoU (%) Depth (m) PiGraphs 68.6 21.4 - 75.1 24.9 - SUN RGB-D  ... 
arXiv:1909.01507v1 fatcat:svbd33j7hvaz5jbysjwwhqhnoy

ALID

Lingyang Chu, Shuhui Wang, Siyuan Liu, Qingming Huang, Jian Pei
2015 Proceedings of the VLDB Endowment  
Detecting dominant clusters is important in many analytic applications. The state-of-the-art methods find dense subgraphs on the affinity graph as dominant clusters. However, the time and space complexities of those methods are dominated by the construction of affinity graph, which is quadratic with respect to the number of data points, and thus are impractical on large data sets. To tackle the challenge, in this paper, we apply Evolutionary Game Theory (EGT) and develop a scalable algorithm,
more » ... proximate Localized Infection Immunization Dynamics (ALID). The major idea is to perform Localized Infection Immunization Dynamics (LID) to find dense subgraphs within local ranges of the affinity graph. LID is further scaled up with guaranteed high efficiency and detection quality by an estimated Region of Interest (ROI) and a Candidate Infective Vertex Search method (CIVS). ALID only constructs small local affinity graphs and has time complexity O(C(a^* + δ)n) and space complexity O(a^*(a^* + δ)), where a^* is the size of the largest dominant cluster, and C << n and δ << n are small constants. We demonstrate by extensive experiments on both synthetic data and real world data that ALID achieves the state-of-the-art detection quality with much lower time and space cost on single machine. We also demonstrate the encouraging parallelization performance of ALID by implementing the Parallel ALID (PALID) on Apache Spark. PALID processes 50 million SIFT data points in 2.29 hours, achieving a speedup ratio of 7.51 with 8 executors.
doi:10.14778/2757807.2757808 fatcat:evfiqvvid5c43gyeuwdit6ybra