57 Hits in 1.2 sec

Weakly Supervised Object Boundaries [article]

Anna Khoreva, Rodrigo Benenson, Mohamed Omran, Matthias Hein, Bernt Schiele
2015 arXiv   pre-print
State-of-the-art learning based boundary detection methods require extensive training data. Since labelling object boundaries is one of the most expensive types of annotations, there is a need to relax the requirement to carefully annotate images to make both the training more affordable and to extend the amount of training data. In this paper we propose a technique to generate weakly supervised annotations and show that bounding box annotations alone suffice to reach high-quality object
more » ... ies without using any object-specific boundary annotations. With the proposed weak supervision techniques we achieve the top performance on the object boundary detection task, outperforming by a large margin the current fully supervised state-of-the-art methods.
arXiv:1511.07803v1 fatcat:rpqncsk5xrcxfno3fhrmzbejkm

Learning to Refine Human Pose Estimation [article]

Mihai Fieraru, Anna Khoreva, Leonid Pishchulin, Bernt Schiele
2018 arXiv   pre-print
Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lot of difficult cases where even the state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refinement step that addresses these challenging cases and can be easily applied on top of any existing
more » ... method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. In order for the network to learn to refine incorrect body joint predictions, we employ a novel data augmentation scheme for training, where we model "hard" human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement over the state of the art.
arXiv:1804.07909v1 fatcat:jz7qr44ypzestda7bpjjo6jalu

Learning Video Object Segmentation from Static Images [article]

Anna Khoreva, Federico Perazzi, Rodrigo Benenson, Bernt Schiele, Alexander Sorkine-Hornung
2016 arXiv   pre-print
Inspired by recent advances of deep learning in instance segmentation and object tracking, we introduce video object segmentation problem as a concept of guided instance segmentation. Our model proceeds on a per-frame basis, guided by the output of the previous frame towards the object of interest in the next frame. We demonstrate that highly accurate object segmentation in videos can be enabled by using a convnet trained with static images only. The key ingredient of our approach is a
more » ... on of offline and online learning strategies, where the former serves to produce a refined mask from the previous frame estimate and the latter allows to capture the appearance of the specific object instance. Our method can handle different types of input annotations: bounding boxes and segments, as well as incorporate multiple annotated frames, making the system suitable for diverse applications. We obtain competitive results on three different datasets, independently from the type of input annotation.
arXiv:1612.02646v1 fatcat:adpt3artlndsfld6ttxuffan5e

Progressive Augmentation of GANs [article]

Dan Zhang, Anna Khoreva
2019 arXiv   pre-print
Training of Generative Adversarial Networks (GANs) is notoriously fragile, requiring to maintain a careful balance between the generator and the discriminator in order to perform well. To mitigate this issue we introduce a new regularization technique - progressive augmentation of GANs (PA-GAN). The key idea is to gradually increase the task difficulty of the discriminator by progressively augmenting its input or feature space, thus enabling continuous learning of the generator. We show that
more » ... proposed progressive augmentation preserves the original GAN objective, does not compromise the discriminator's optimality and encourages a healthy competition between the generator and discriminator, leading to the better-performing generator. We experimentally demonstrate the effectiveness of PA-GAN across different architectures and on multiple benchmarks for the image synthesis task, on average achieving ~3 point improvement of the FID score.
arXiv:1901.10422v3 fatcat:eg35ntfhovdtbbfnvjp2srw5iu

Video Object Segmentation with Language Referring Expressions [article]

Anna Khoreva, Anna Rohrbach, Bernt Schiele
2019 arXiv   pre-print
Most state-of-the-art semi-supervised video object segmentation methods rely on a pixel-accurate mask of a target object provided for the first frame of a video. However, obtaining a detailed segmentation mask is expensive and time-consuming. In this work we explore an alternative way of identifying a target object, namely by employing language referring expressions. Besides being a more practical and natural way of pointing out a target object, using language specifications can help to avoid
more » ... ift as well as make the system more robust to complex dynamics and appearance variations. Leveraging recent advances of language grounding models designed for images, we propose an approach to extend them to video data, ensuring temporally coherent predictions. To evaluate our method we augment the popular video object segmentation benchmarks, DAVIS'16 and DAVIS'17 with language descriptions of target objects. We show that our language-supervised approach performs on par with the methods which have access to a pixel-level mask of the target object on DAVIS'16 and is competitive to methods using scribbles on the challenging DAVIS'17 dataset.
arXiv:1803.08006v3 fatcat:qzv4vpl4ojap3lriyexugtycby

Learning to Generate Novel Scene Compositions from Single Images and Videos [article]

Vadim Sushko, Juergen Gall, Anna Khoreva
2021 arXiv   pre-print
Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN that can learn to generate samples from a training set as little as one image or one video. We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout realism. This allows synthesis of visually plausible, novel compositions of a scene, with varying
more » ... ent and layout, while preserving the context of the original sample. Compared to previous single-image GAN models, One-Shot GAN achieves higher diversity and quality of synthesis. It is also not restricted to the single image setting, successfully learning in the introduced setting of a single video.
arXiv:2105.05847v1 fatcat:4aprmi23mzgpzj2yfmj2jtry6y

Improved Image Boundaries for Better Video Segmentation [article]

Anna Khoreva, Rodrigo Benenson, Fabio Galasso, Matthias Hein, Bernt Schiele
2016 arXiv   pre-print
Graph-based video segmentation methods rely on superpixels as starting point. While most previous work has focused on the construction of the graph edges and weights as well as solving the graph partitioning problem, this paper focuses on better superpixels for video segmentation. We demonstrate by a comparative analysis that superpixels extracted from boundaries perform best, and show that boundary estimation can be significantly improved via image and time domain cues. With superpixels
more » ... ed from our better boundaries we observe consistent improvement for two video segmentation methods in two different datasets.
arXiv:1605.03718v2 fatcat:7wrnfe7aerc5hitja64aeyr6v4

A U-Net Based Discriminator for Generative Adversarial Networks [article]

Edgar Schönfeld, Bernt Schiele, Anna Khoreva
2021 arXiv   pre-print
Among the major remaining challenges for generative adversarial networks (GANs) is the capacity to synthesize globally and locally coherent images with object shapes and textures indistinguishable from real images. To target this issue we propose an alternative U-Net based discriminator architecture, borrowing the insights from the segmentation literature. The proposed U-Net based architecture allows to provide detailed per-pixel feedback to the generator while maintaining the global coherence
more » ... f synthesized images, by providing the global image feedback as well. Empowered by the per-pixel response of the discriminator, we further propose a per-pixel consistency regularization technique based on the CutMix data augmentation, encouraging the U-Net discriminator to focus more on semantic and structural changes between real and fake images. This improves the U-Net discriminator training, further enhancing the quality of generated samples. The novel discriminator improves over the state of the art in terms of the standard distribution and image quality metrics, enabling the generator to synthesize images with varying structure, appearance and levels of detail, maintaining global and local realism. Compared to the BigGAN baseline, we achieve an average improvement of 2.7 FID points across FFHQ, CelebA, and the newly introduced COCO-Animals dataset. The code is available at https://github.com/boschresearch/unetgan.
arXiv:2002.12655v2 fatcat:yf7y7qpvxzemxlxgvhn2aqd4ya

Weakly Supervised Object Boundaries

Anna Khoreva, Rodrigo Benenson, Mohamed Omran, Matthias Hein, Bernt Schiele
2016 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
State-of-the-art learning based boundary detection methods require extensive training data. Since labelling object boundaries is one of the most expensive types of annotations, there is a need to relax the requirement to carefully annotate images to make both the training more affordable and to extend the amount of training data. In this paper we propose a technique to generate weakly supervised annotations and show that bounding box annotations alone suffice to reach high-quality object
more » ... ies without using any object-specific boundary annotations. With the proposed weak supervision techniques we achieve the top performance on the object boundary detection task, outperforming by a large margin the current fully supervised state-of-the-art methods.
doi:10.1109/cvpr.2016.27 dblp:conf/cvpr/KhorevaBO0S16 fatcat:vbyolr7dqfet7geuut7zmvceye

Grid Saliency for Context Explanations of Semantic Segmentation [article]

Lukas Hoyer, Mauricio Munoz, Prateek Katiyar, Anna Khoreva, Volker Fischer
2019 arXiv   pre-print
Recently, there has been a growing interest in developing saliency methods that provide visual explanations of network predictions. Still, the usability of existing methods is limited to image classification models. To overcome this limitation, we extend the existing approaches to generate grid saliencies, which provide spatially coherent visual explanations for (pixel-level) dense prediction networks. As the proposed grid saliency allows to spatially disentangle the object and its context, we
more » ... pecifically explore its potential to produce context explanations for semantic segmentation networks, discovering which context most influences the class predictions inside a target object area. We investigate the effectiveness of grid saliency on a synthetic dataset with an artificially induced bias between objects and their context as well as on the real-world Cityscapes dataset using state-of-the-art segmentation networks. Our results show that grid saliency can be successfully used to provide easily interpretable context explanations and, moreover, can be employed for detecting and localizing contextual biases present in the data.
arXiv:1907.13054v2 fatcat:ik7vn7rqwrbi3irkfmm6gnnkhu

Improved Image Boundaries for Better Video Segmentation [chapter]

Anna Khoreva, Rodrigo Benenson, Fabio Galasso, Matthias Hein, Bernt Schiele
2016 Lecture Notes in Computer Science  
Graph-based video segmentation methods rely on superpixels as starting point. While most previous work has focused on the construction of the graph edges and weights as well as solving the graph partitioning problem, this paper focuses on better superpixels for video segmentation. We demonstrate by a comparative analysis that superpixels extracted from boundaries perform best, and show that boundary estimation can be significantly improved via image and time domain cues. With superpixels
more » ... ed from our better boundaries we observe consistent improvement for two video segmentation methods in two different datasets. Fig. 1. Graph-based video segmentation relies on having high-quality superpixels/voxels as starting point (graph nodes). We explore diverse techniques to improve boundary estimates, which result in better superpixels, which in turn has a significant impact on final video segmentation.
doi:10.1007/978-3-319-49409-8_64 fatcat:ow7hvcxcvva7lj5v24ioa25rli

Lucid Data Dreaming for Video Object Segmentation

Anna Khoreva, Rodrigo Benenson, Eddy Ilg, Thomas Brox, Bernt Schiele
2019 International Journal of Computer Vision  
Anna Khoreva (khoreva@mpi-inf.mpg.de), Rodrigo Benenson (benenson@google.com), Eddy Ilg (ilg@cs.uni-freiburg.com), Thomas Brox (brox@cs.uni-freiburg.com), Bernt Schiele (schiele@mpi-inf.mpg.de) ... A summary of the proposed approach was provided online (Khoreva et al. 2017). ...
doi:10.1007/s11263-019-01164-6 fatcat:oavuhac5xjgyzigyt7iqovlz5u

Simple Does It: Weakly Supervised Instance and Semantic Segmentation [article]

Anna Khoreva, Rodrigo Benenson, Jan Hosang, Matthias Hein, Bernt Schiele
2016 arXiv   pre-print
Semantic labelling and instance segmentation are two tasks that require particularly costly annotations. Starting from weak supervision in the form of bounding box detection annotations, we propose a new approach that does not require modification of the segmentation training procedure. We show that when carefully designing the input labels from given bounding boxes, even a single round of training is enough to improve over previously reported weakly supervised results. Overall, our weak
more » ... sion approach reaches ~95% of the quality of the fully supervised model, both for semantic labelling and instance segmentation.
arXiv:1603.07485v2 fatcat:i3bsyd6jzbhlrj7zfw2xanr3wy

Learning to Refine Human Pose Estimation

Mihai Fieraru, Anna Khoreva, Leonid Pishchulin, Bernt Schiele
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lot of difficult cases where even the state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refinement step that addresses these challenging cases and can be easily applied on top of any existing
more » ... method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. In order for the network to learn to refine incorrect body joint predictions, we employ a novel data augmentation scheme for training, where we model "hard" human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement over the state of the art.
doi:10.1109/cvprw.2018.00058 dblp:conf/cvpr/FieraruKPS18 fatcat:44wzkwdgrbgh5dw2arvakp75vy

Classifier based graph construction for video segmentation

Anna Khoreva, Fabio Galasso, Matthias Hein, Bernt Schiele
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
0.45 0.51 0.42 80.17(37.56) 8.00 Segm. propagation [20] 0.61 0.65 0.59 0.59 0.62 0.56 25.50(36.48) 258.05 Galasso et al.'14 [19] 0.62 0.66 0.54 0.55 0.59 0.55 61.25(40.87) 80.00 Khoreva  ... 
doi:10.1109/cvpr.2015.7298697 dblp:conf/cvpr/KhorevaG0S15 fatcat:i3mekzwiubasrmobrejh4l7oba