846 Hits in 1.9 sec

Learning of Visual Relations: The Devil is in the Tails [article]

Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos
2021 arXiv   pre-print
Significant effort has recently been devoted to modeling visual relations. This has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed problem, due to the combinatorial nature of joint reasoning about groups of objects. Increasing model complexity is, in general, ill-suited for long-tailed problems due to their tendency to overfit. In this paper, we explore an alternative hypothesis, denoted the Devil is in the Tails. Under this hypothesis, better performance is achieved by keeping the model simple but improving its ability to cope with long-tailed distributions. To test this hypothesis, we devise a new approach for training visual relationship models, inspired by the state-of-the-art long-tailed recognition literature. It is based on an iterative decoupled training scheme, denoted Decoupled Training for Devil in the Tails (DT2). DT2 employs a novel sampling approach, Alternating Class-Balanced Sampling (ACBS), to capture the interplay between the long-tailed entity and predicate distributions of visual relations. Results show that, with an extremely simple architecture, DT2-ACBS significantly outperforms much more complex state-of-the-art methods on scene graph generation tasks. This suggests that the development of sophisticated models must be considered in tandem with the long-tailed nature of the problem.
arXiv:2108.09668v1 fatcat:4vhp4x2ombevbjsm347ccdhx2i
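The sampling scheme named in the abstract, Alternating Class-Balanced Sampling, can be illustrated with a minimal sketch: training epochs alternate between entity-balanced and predicate-balanced sampling, so that the tail of one distribution is not drowned out while balancing the other. The data layout (`entity`/`predicate` keys) is an assumption for illustration, not the authors' DT2 implementation:

```python
import random
from collections import defaultdict

def class_balanced_sample(samples, key, n, rng):
    """Draw n samples with equal probability per class under `key`,
    regardless of how skewed the raw class frequencies are."""
    by_class = defaultdict(list)
    for s in samples:
        by_class[s[key]].append(s)
    classes = sorted(by_class)
    # Pick a class uniformly, then an instance uniformly within it.
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(n)]

def acbs_epochs(samples, n_epochs, n_per_epoch, seed=0):
    """Alternate entity-balanced and predicate-balanced epochs."""
    rng = random.Random(seed)
    batches = []
    for epoch in range(n_epochs):
        key = "entity" if epoch % 2 == 0 else "predicate"
        batches.append(class_balanced_sample(samples, key, n_per_epoch, rng))
    return batches
```

With a 90/10 long-tailed toy set, the rare class is sampled as often as the head class in each balanced epoch, which is the point of the scheme.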

Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [article]

Tz-Ying Wu, Pedro Morgado, Pei Wang, Chih-Hui Ho, Nuno Vasconcelos
2020 arXiv   pre-print
Long-tailed recognition tackles the non-uniformly distributed data that arises naturally in real-world scenarios. While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes. Humans, however, are less affected by this since, when confronted with uncertain examples, they simply opt to provide coarser predictions. Motivated by this, a deep realistic taxonomic classifier (Deep-RTC) is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions. The model has the option to reject classifying samples at different levels of the taxonomy once it cannot guarantee the desired performance. Deep-RTC is implemented with stochastic tree sampling during training, to simulate all possible classification conditions at finer or coarser levels, and a rejection mechanism at inference time. Experiments on the long-tailed versions of four datasets, CIFAR100, AWA2, ImageNet, and iNaturalist, demonstrate that the proposed approach preserves more information on all classes with different popularity levels. Deep-RTC also outperforms state-of-the-art methods from the long-tailed recognition, hierarchical classification, and learning-with-rejection literature under the proposed correctly predicted bits (CPB) metric.
arXiv:2007.09898v1 fatcat:tttr66jedfgwrom46ymt4vzofq
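The rejection mechanism summarized above, predicting as deep in the taxonomy as confidence allows and falling back to a coarser label otherwise, can be sketched as follows. This is an illustrative stand-in, not the paper's Deep-RTC code; the `path_scores` representation and fixed threshold are assumptions:

```python
def hierarchical_predict(path_scores, threshold=0.7):
    """Walk a root-to-leaf taxonomy path and keep the deepest label
    whose confidence clears the threshold; stop (reject the finer
    levels) as soon as confidence drops below it.
    path_scores: list of (label, confidence) ordered coarse to fine."""
    prediction = None
    for label, conf in path_scores:
        if conf < threshold:
            break
        prediction = label
    return prediction  # None means even the root was rejected
```

The coarser-but-confident answer mirrors the "realistic" behavior the abstract attributes to humans on uncertain examples.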

Class-Incremental Learning with Strong Pre-trained Models [article]

Tz-Ying Wu, Gurumurthy Swaminathan, Zhizhong Li, Avinash Ravichandran, Nuno Vasconcelos, Rahul Bhotika, Stefano Soatto
2022 arXiv   pre-print
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes. We hypothesize that a strong base model can provide a good representation for novel classes and that incremental learning can be done with small adaptations. We propose a 2-stage training scheme: i) feature augmentation -- cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion -- combining the base and novel classifiers into a unified classifier. Experiments show that the proposed method significantly outperforms state-of-the-art CIL methods on the large-scale ImageNet dataset (e.g., +10% overall accuracy over the best). We also propose and analyze understudied practical CIL scenarios, such as base-novel overlap with distribution shift. Our proposed method is robust and generalizes to all analyzed CIL settings.
arXiv:2204.03634v1 fatcat:dbk353t6jngwdj2uv5qa2jvuoy
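In its simplest reading, the fusion stage, combining base and novel linear heads that share one backbone feature space into a unified classifier, amounts to stacking the two weight matrices. A hedged numpy sketch (function names are illustrative, not the authors' API; the paper's fusion also involves further training):

```python
import numpy as np

def fuse_classifiers(W_base, b_base, W_novel, b_novel):
    """Stack a base linear head (n_base x d) and a novel linear head
    (n_novel x d) into one unified classifier over all classes."""
    W = np.vstack([W_base, W_novel])
    b = np.concatenate([b_base, b_novel])
    return W, b

def predict(W, b, feats):
    """Class index with the highest logit for each feature row."""
    logits = feats @ W.T + b
    return logits.argmax(axis=1)
```

Novel classes simply occupy the rows after the base classes, so old predictions are preserved when base weights are kept frozen.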

Liquid Pouring Monitoring via Rich Sensory Inputs [article]

Tz-Ying Wu, Juan-Ting Lin, Tsun-Hsuang Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun
2018 arXiv   pre-print
Humans have the amazing ability to perform very subtle manipulation tasks using a closed-loop control system with imprecise mechanics (i.e., our body parts) but rich sensory information (e.g., vision, tactile, etc.). In such closed-loop systems, the ability to monitor the state of the task via rich sensory information is important but often less studied. In this work, we take liquid pouring as a concrete example and aim at learning to continuously monitor whether liquid pouring is successful (i.e., no spilling) or not via rich sensory inputs. We mimic humans' rich senses using synchronized observations from a chest-mounted camera and a wrist-mounted IMU sensor. Given many success and failure demonstrations of liquid pouring, we train a hierarchical LSTM with late fusion for monitoring. To improve the robustness of the system, we propose two auxiliary tasks during training: (1) inferring the initial state of containers and (2) forecasting the one-step-future 3D trajectory of the hand with an adversarial training procedure. These tasks encourage our method to learn representations sensitive to container states and to how objects are manipulated in 3D. With these novel components, our method achieves ~8% and ~11% better monitoring accuracy than the baseline without auxiliary tasks on unseen containers and unseen users, respectively.
arXiv:1808.01725v1 fatcat:wrslze6le5e35ipvzhyr7ghwpa
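Late fusion of the camera and IMU streams, as used by the hierarchical LSTM above, means combining per-modality evidence only at the decision stage. A deliberately minimal sketch with weighted logit averaging (the fixed weight is an assumption; in the paper the fusion is part of a learned network):

```python
import numpy as np

def late_fusion_monitor(vision_logits, imu_logits, w=0.5):
    """Fuse per-timestep success/failure logits from a vision stream
    and an IMU stream by weighted averaging, then threshold at 0.
    Returns a 0/1 label per timestep (1 = pouring judged successful)."""
    fused = w * np.asarray(vision_logits) + (1 - w) * np.asarray(imu_logits)
    return (fused > 0).astype(int)
```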

Explainable Object-induced Action Decision for Autonomous Vehicles [article]

Yiran Xu, Xiaoyin Yang, Lihang Gong, Hsuan-Chu Lin, Tz-Ying Wu, Yunsheng Li, Nuno Vasconcelos
2020 arXiv   pre-print
A new paradigm is proposed for autonomous driving. The new paradigm lies between the end-to-end and pipelined approaches, and is inspired by how humans solve the problem. While it relies on scene understanding, the latter only considers objects that could give rise to hazards. These are denoted as action-inducing, since changes in their state should trigger vehicle actions. They also define a set of explanations for these actions, which should be produced jointly with the latter. An extension of the BDD100K dataset, annotated for a set of 4 actions and 21 explanations, is proposed. A new multi-task formulation of the problem, which optimizes the accuracy of both action commands and explanations, is then introduced. A CNN architecture is finally proposed to solve this problem, by combining reasoning about action-inducing objects and global scene context. Experimental results show that the requirement of explanations improves the recognition of action-inducing objects, which in turn leads to better action predictions.
arXiv:2003.09405v1 fatcat:mwas7jknmncjnaciggmma7jbmu

Anticipating Daily Intention using On-Wrist Motion Triggered Sensing [article]

Tz-Ying Wu, Ting-An Chien, Cheng-Sheng Chan, Chan-Wei Hu, Min Sun
2017 arXiv   pre-print
Anticipating human intention by observing one's actions has many applications. For instance, picking up a cellphone, then a charger (actions) implies that one wants to charge the cellphone (intention). By anticipating the intention, an intelligent system can guide the user to the closest power outlet. We propose an on-wrist motion-triggered sensing system for anticipating daily intentions, where the on-wrist sensors help us persistently observe one's actions. The core of the system is a Recurrent Neural Network (RNN) paired with a Policy Network (PN): the RNN encodes visual and motion observations to anticipate intention, while the PN parsimoniously triggers the process of visual observation to reduce the computational requirements. We jointly train the whole network using policy gradient and cross-entropy losses. For evaluation, we collect the first daily "intention" dataset, consisting of 2379 videos with 34 intentions and 164 unique action sequences. Our method achieves 92.68%, 90.85%, and 97.56% accuracy on three users while processing only 29% of the visual observations on average.
arXiv:1710.07477v1 fatcat:apq4uszvzbfyzbxl2oyknjyoli
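The Policy Network's role, triggering the expensive visual encoder only when it matters so that only ~29% of frames are processed, can be caricatured with a thresholded trigger. A real PN is learned with policy gradient; the motion-energy threshold here is purely illustrative:

```python
def motion_triggered_ratio(motion_energy, policy_threshold=0.5):
    """Decide per timestep whether to run visual processing, using a
    thresholded motion-energy signal as a stand-in for the learned
    policy. Returns the trigger decisions and the fraction of frames
    on which the visual encoder would actually run."""
    triggers = [e > policy_threshold for e in motion_energy]
    ratio = sum(triggers) / len(motion_energy)
    return triggers, ratio
```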

Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [article]

Chih-Hui Ho, Bo Liu, Tz-Ying Wu, Nuno Vasconcelos
2020 arXiv   pre-print
Multiview recognition has been well studied in the literature and achieves decent performance in object recognition and retrieval tasks. However, most previous works rely on supervised learning and some impractical underlying assumptions, such as the availability of all views at both training and inference time. In this work, the problem of multiview self-supervised learning (MV-SSL) is investigated, where only image-to-object association is given. Given this setup, a novel surrogate task for self-supervised learning is proposed by pursuing an "object invariant" representation. This is solved by randomly selecting an image feature of an object as the object prototype, accompanied by multiview consistency regularization, which results in view invariant stochastic prototype embedding (VISPE). Experiments show that the recognition and retrieval results using VISPE outperform those of other self-supervised learning methods on seen and unseen data. VISPE can also be applied to the semi-supervised scenario and demonstrates robust performance with limited data available. Code is available at https://github.com/chihhuiho/VISPE
arXiv:2003.12735v1 fatcat:yrvmzctjxzemfbaj5l64zjvigy
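A rough sketch of the surrogate task: pick one view of each object at random as its prototype, then classify every other view against the set of prototypes with softmax cross-entropy on (unit-norm) feature similarity. This is a simplified reading for illustration, not the released VISPE code at the linked repository:

```python
import numpy as np

def vispe_loss(views_by_object, rng):
    """views_by_object: {obj_id: [unit-norm d-dim view embeddings]}.
    Randomly select one view per object as its prototype, then score
    every remaining view against all prototypes and take softmax
    cross-entropy with the view's own object as the target."""
    objs = sorted(views_by_object)
    protos, queries, labels = [], [], []
    for i, o in enumerate(objs):
        views = views_by_object[o]
        k = rng.randrange(len(views))        # stochastic prototype choice
        protos.append(views[k])
        for j, v in enumerate(views):
            if j != k:
                queries.append(v)
                labels.append(i)
    P = np.stack(protos)                      # (n_obj, d)
    Q = np.stack(queries)                     # (n_query, d)
    sims = Q @ P.T                            # cosine sim for unit vectors
    logits = sims - sims.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Well-separated objects should yield a loss below the two-class chance level of log 2.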

Quality of life among infertile women with endometriosis undergoing IVF treatment and their pregnancy outcomes

Meng-Hsing Wu, Pei-Fang Su, Wei-Ying Chu, Chih-Wei Lin, New Geok Huey, Chung-Ying Lin, Huang-Tz Ou
2020 Figshare  
Objective: We assessed the quality of life (QoL) and pregnancy outcomes of in vitro fertilization (IVF) treatment among infertile women with endometriosis, as compared to infertile women without endometriosis. Study design: Eighty-one (81) women with endometriosis (with 142 embryo transfer [ET] cycles) and 605 women without endometriosis (with 1063 ET cycles) were included. QoL was measured by FertiQoL on the date before ET. Pregnancy outcomes included biochemical pregnancy, ongoing pregnancy, and live birth. Generalized estimating equation analyses were performed to assess the association between QoL and IVF pregnancy. Results: Endometriosis-affected women had significantly lower QoL, as indicated by the mind/body, treatment environment, and total treatment scores, and the total scores of FertiQoL. Conclusions: Lower QoL among women with endometriosis versus without endometriosis during IVF treatment highlights the importance of developing strategies to improve their QoL, which may enhance subsequent pregnancy rates in this population.
doi:10.6084/m9.figshare.12212498 fatcat:f7gsng3ewnf43n5vszj4rjax2e

High dynamic range image reconstruction from hand-held cameras

Pei-Ying Lu, Tz-Huan Huang, Meng-Sung Wu, Yi-Ting Cheng, Yung-Yu Chuang
2009 2009 IEEE Conference on Computer Vision and Pattern Recognition  
This paper presents a technique for reconstructing a high-quality high dynamic range (HDR) image from a set of differently exposed and possibly blurred images taken with a hand-held camera. Recovering an HDR image from differently exposed photographs has become very popular. However, it often requires a tripod to keep the camera still when taking photographs at different exposures. To ease the process, a hand-held camera is often preferred. This, however, leads to two problems: misaligned photographs and blurred long-exposed photographs. To overcome these problems, this paper adapts an alignment method and proposes a method for HDR reconstruction from possibly blurred images. We use a Bayesian framework to formulate the problem and apply a maximum-likelihood approach to iteratively perform blur kernel estimation, HDR image reconstruction, and camera curve recovery. Upon convergence, we simultaneously obtain an HDR image with rich and clear structures, the camera response curve, and the blur kernels. To show the effectiveness of our method, we test it on both synthetic and real photographs. The proposed method compares favorably to two other related methods in the experiments.
doi:10.1109/cvpr.2009.5206768 dblp:conf/cvpr/LuHWCC09 fatcat:5uzvryqfffcmxfep6kp644tunq
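At its core, merging differently exposed frames estimates scene radiance as a confidence-weighted average of pixel/exposure ratios; the paper additionally alternates this step with blur-kernel and response-curve estimation. A deliberately simplified merge (linear camera, no blur; the hat-shaped weight is a common heuristic, not the paper's estimator):

```python
import numpy as np

def merge_hdr(images, exposures):
    """Merge differently exposed frames of a static scene into a
    radiance map. Each frame votes pixel/exposure for the radiance,
    weighted by a hat function that trusts mid-range pixels most
    (near-black and near-saturated pixels get weight near 0)."""
    images = [np.asarray(im, dtype=float) for im in images]
    acc = np.zeros_like(images[0])
    wsum = np.zeros_like(images[0])
    for im, t in zip(images, exposures):
        w = 1.0 - np.abs(im / 255.0 - 0.5) * 2.0   # hat weight in [0, 1]
        acc += w * (im / t)
        wsum += w
    return acc / np.maximum(wsum, 1e-8)
```

For a constant-radiance scene, every exposure votes the same value, so the weighted mean recovers it exactly.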

Explainable Object-Induced Action Decision for Autonomous Vehicles

Yiran Xu, Xiaoyin Yang, Lihang Gong, Hsuan-Chu Lin, Tz-Ying Wu, Yunsheng Li, Nuno Vasconcelos
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
A new paradigm is proposed for autonomous driving. The new paradigm lies between the end-to-end and pipelined approaches, and is inspired by how humans solve the problem. While it relies on scene understanding, the latter only considers objects that could give rise to hazards. These are denoted as action-inducing, since changes in their state should trigger vehicle actions. They also define a set of explanations for these actions, which should be produced jointly with the latter. An extension of the BDD100K dataset, annotated for a set of 4 actions and 21 explanations, is proposed. A new multi-task formulation of the problem, which optimizes the accuracy of both action commands and explanations, is then introduced. A CNN architecture is finally proposed to solve this problem, by combining reasoning about action-inducing objects and global scene context. Experimental results show that the requirement of explanations improves the recognition of action-inducing objects, which in turn leads to better action predictions.
doi:10.1109/cvpr42600.2020.00954 dblp:conf/cvpr/XuYGLWLV20 fatcat:ftapbczu4nd7zhqofgxefupj3a

Exploit Clues From Views: Self-Supervised and Regularized Learning for Multiview Object Recognition

Chih-Hui Ho, Bo Liu, Tz-Ying Wu, Nuno Vasconcelos
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Multiview recognition has been well studied in the literature and achieves decent performance in object recognition and retrieval tasks. However, most previous works rely on supervised learning and some impractical underlying assumptions, such as the availability of all views at both training and inference time. In this work, the problem of multiview self-supervised learning (MV-SSL) is investigated, where only image-to-object association is given. Given this setup, a novel surrogate task for self-supervised learning is proposed by pursuing an "object invariant" representation. This is solved by randomly selecting an image feature of an object as the object prototype, accompanied by multiview consistency regularization, which results in view invariant stochastic prototype embedding (VISPE). Experiments show that the categorization and retrieval results using VISPE outperform those of other self-supervised learning methods on seen and unseen data. VISPE can also be applied to the semi-supervised scenario and demonstrates robust performance with limited data available. Code is available at https://github.com/chihhuiho/VISPE
doi:10.1109/cvpr42600.2020.00911 dblp:conf/cvpr/HoLWV20 fatcat:y3xne5j6jrfehn7bmm56jqfp44

Validation of Chinese Version of Polycystic Ovary Syndrome Health-Related Quality of Life Questionnaire (Chi-PCOSQ)

Chung-Ying Lin, Huang-tz Ou, Meng-Hsing Wu, Pei-Chi Chen, Stephen L Atkin
2016 PLoS ONE  
Objectives: To evaluate the responsiveness, longitudinal validity, and measurement invariance of the Chinese version of the Polycystic Ovary Syndrome Health-related Quality of Life Questionnaire (Chi-PCOSQ). Research Design and Method: This prospective study was conducted in a medical center in southern Taiwan. 102 women aged 18-45 years and diagnosed with PCOS were enrolled. Objective indicators of clinical changes in PCOS included 2-hour glucose and insulin levels assessed before and after treatment. The responsiveness of Chi-PCOSQ and WHOQOL-BREF was analyzed using paired t-tests and the standardized response mean. Confirmatory factor analysis was performed to assess the measurement invariance of Chi-PCOSQ. Results: With improved 2-hour glucose and insulin levels, we also found significantly increased Chi-PCOSQ total and individual domain scores (total score: t(49) = 5.20, p < 0.001; domain scores: t(49) = 2.72 to 3.87, p < 0.01), except for hair growth. Half of the domain scores (3 of 6) and the total score of Chi-PCOSQ had medium responsiveness, but WHOQOL-BREF was not sufficiently responsive to clinical changes in PCOS. Improved PCOS-specific health-related quality of life (HRQoL), as indicated by Chi-PCOSQ scores, was significantly associated with improved 2-hour glucose and insulin levels. All indices of the data-model fit of the Chi-PCOSQ structure were satisfactory, except for slightly high standardized root mean square residual values (0.087 to 0.088). The measurement invariance of Chi-PCOSQ was supported across time. Conclusion: Chi-PCOSQ is sufficiently sensitive in detecting clinical changes and its measurement structure is suitable for Chinese women with PCOS. It is thus a promising tool for assessing the HRQoL of ethnic Chinese women with PCOS.
doi:10.1371/journal.pone.0154343 pmid:27124836 pmcid:PMC4849642 fatcat:qq52ekhwjfdapk3binx6bik4x4

Development of Chinese Version of Polycystic Ovary Syndrome Health-Related Quality of Life Questionnaire (Chi-PCOSQ)

Huang-tz Ou, Meng-Hsing Wu, Chung-Ying Lin, Pei-Chi Chen, David J. Handelsman
2015 PLoS ONE  
Objectives: To develop the Chinese version of the Polycystic Ovary Syndrome Health-related Quality of Life Questionnaire (Chi-PCOSQ). Research Design and Method: This cross-sectional study was conducted in a medical center in Taiwan. Eighty women who met the criteria were enrolled: female, age 18-45 years, competent in the Chinese language, diagnosed with polycystic ovary syndrome (PCOS), and regularly followed at outpatient clinics (defined as at least two outpatient visits before enrollment). The PCOSQ was translated and culturally adapted according to standard procedures. A semi-structured interview was applied to assess face validity. Exploratory factor analysis (EFA) was applied to determine the scale constructs. Measurements of internal consistency via Cronbach's α, test-retest reliability via the intraclass correlation coefficient (ICC), construct validity, and discriminative validity were performed. Results: Five additional items, representing the issues of acne, hair loss, and fear of developing diabetes, were incorporated into the original scale. A six-factor structure emerged from the EFA, explaining 71.9% of the variance observed. The reliability analyses demonstrated satisfactory results, with Cronbach's α ranging from 0.78-0.96 and ICC ranging from 0.73-0.86. Construct validity was confirmed by significant correlations between the domains of the Chi-PCOSQ and generic health-related quality of life (HRQoL) measures (WHOQOL-BREF, EQ-5D) and clinical parameters (body mass index, waist-hip ratio, blood pressure).
doi:10.1371/journal.pone.0137772 pmid:26452153 pmcid:PMC4599828 fatcat:2ouh3uhzprf4fpwkht73jdvfui
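The internal-consistency figures reported above (Cronbach's α of 0.78-0.96) follow the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the total score). A minimal sketch of that computation:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (n_respondents, n_items) score matrix:
    k/(k-1) * (1 - sum of per-item variances / variance of row sums),
    using sample variances (ddof=1)."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)
```

Perfectly correlated items give α = 1; items whose covariance sums to zero give α = 0.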

Quality of life and pregnancy outcomes among women undergoing in vitro fertilization treatment: A longitudinal cohort study

Meng-Hsing Wu, Pei-Fang Su, Wei-Ying Chu, New Geok Huey, Chih-Wei Lin, Huang-Tz Ou, Chung-Ying Lin
2019 Journal of the Formosan Medical Association  
This study assessed the quality of life (QoL) and pregnancy outcomes among infertile women undergoing in vitro fertilization (IVF) treatment, to investigate the association between QoL and IVF pregnancy outcomes. This study included 686 women with 1205 embryo transfers (ETs). QoL was measured using the fertility quality of life (FertiQoL) tool before ET. FertiQoL comprises two modules: a Core module (including mind/body, emotional, relational, and social domains) and a Treatment module (covering the treatment environment and tolerability domains). The FertiQoL total and subscale scores were computed and scored in the range of 0-100 (higher scores indicate better QoL). Multivariate generalized estimating equation analyses were carried out to assess the association between QoL and IVF pregnancy outcomes, with adjustment for time-varying factors across multiple ETs for a given person. The lowest score in the Core module was for the emotional domain (62.0), and that in the Treatment module was for the tolerability domain (59.4). QoL scores were significantly and positively associated with pregnancy outcomes (i.e., ongoing pregnancy, live birth); with a one-unit increase in the emotional domain score, the probabilities of ongoing pregnancy and live birth significantly increased by 2.4% and 2.6%, respectively (p < 0.05). This study evaluated the prospective association between QoL and IVF pregnancy outcomes among infertile women. The results highlight the importance of developing clinical strategies to improve QoL among infertile women undergoing IVF treatment, which may further improve the pregnancy rates of this population.
doi:10.1016/j.jfma.2019.06.015 pmid:31300324 fatcat:ncsceouo2vgm7ousxdrl66anca
Showing results 1 — 15 out of 846 results