
Single-Shot Panoptic Segmentation [article]

Mark Weber, Jonathon Luiten, Bastian Leibe
2020 arXiv   pre-print
We present a novel end-to-end single-shot method that segments countable object instances (things) as well as background regions (stuff) into a non-overlapping panoptic segmentation at almost video frame rate. Current state-of-the-art methods are far from reaching video frame rate and mostly rely on merging instance segmentation with semantic background segmentation, making them impractical to use in many applications such as robotics. Our approach relaxes this requirement by using an object detector but is still able to resolve inter- and intra-class overlaps to achieve a non-overlapping segmentation. On top of a shared encoder-decoder backbone, we utilize multiple branches for semantic segmentation, object detection, and instance center prediction. Finally, our panoptic head combines all outputs into a panoptic segmentation and can even handle conflicting predictions between branches as well as certain false predictions. Our network achieves 32.6% PQ on MS-COCO at 23.5 FPS, opening up panoptic segmentation to a broader field of applications.
arXiv:1911.00764v2 fatcat:i2y2t2gb6rcfhhsuwhn45aw2qq
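
To make the fusion step concrete, here is a minimal sketch (my own simplification with made-up names, not the paper's learned panoptic head): it combines a semantic map with predicted instance centers into a non-overlapping panoptic map by assigning each "thing" pixel to its nearest center.

```python
# Hypothetical illustration of non-overlapping panoptic fusion; the paper's
# head is learned end-to-end and also handles conflicting branch outputs.
import numpy as np

def naive_panoptic_fusion(semantic, centers, thing_classes):
    """semantic: (H, W) integer class ids; centers: (N, 2) numpy array of
    (y, x) instance centers; thing_classes: iterable of countable class ids."""
    H, W = semantic.shape
    # Encode the class id in the thousands slot, the instance id in the rest.
    panoptic = semantic.astype(np.int64) * 1000
    if len(centers) == 0:
        return panoptic
    ys, xs = np.mgrid[0:H, 0:W]
    # Squared distance of every pixel to every center: shape (N, H, W).
    d = (ys[None] - centers[:, 0, None, None]) ** 2 \
        + (xs[None] - centers[:, 1, None, None]) ** 2
    nearest = d.argmin(axis=0)
    thing_mask = np.isin(semantic, list(thing_classes))
    panoptic[thing_mask] += nearest[thing_mask] + 1  # unique (class, instance)
    return panoptic
```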

UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking [article]

Jonathon Luiten, Idil Esen Zulfikar, Bastian Leibe
2020 arXiv   pre-print
We address Unsupervised Video Object Segmentation (UVOS), the task of automatically generating accurate pixel masks for salient objects in a video sequence and of tracking these objects consistently through time, without any input about which objects should be tracked. Towards solving this task, we present UnOVOST (Unsupervised Offline Video Object Segmentation and Tracking) as a simple and generic algorithm which is able to track and segment a large variety of objects. This algorithm builds up tracks in a number of stages, first grouping segments into short tracklets that are spatio-temporally consistent, before merging these tracklets into long-term consistent object tracks based on their visual similarity. In order to achieve this we introduce a novel tracklet-based Forest Path Cutting data association algorithm which builds up a decision forest of track hypotheses before cutting this forest into paths that form long-term consistent object tracks. When evaluating our approach on the DAVIS 2017 Unsupervised dataset we obtain state-of-the-art performance with a mean J&F score of 67.9% on the val, 58% on the test-dev and 56.4% on the test-challenge benchmarks, obtaining first place in the DAVIS 2019 Unsupervised Video Object Segmentation Challenge. UnOVOST performs competitively with many semi-supervised video object segmentation algorithms even though it is not given any input as to which objects should be tracked and segmented.
arXiv:2001.05425v1 fatcat:f44uqw4hujbrdhehqojdkjwxgu
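
The first stage is easy to picture in code. Below is a hedged sketch of greedy tracklet building (the threshold and chaining rule are my assumptions, and the paper's merging stage with Forest Path Cutting is not shown): consecutive masks are chained whenever they overlap strongly.

```python
# Toy first-stage grouping: chain per-frame segments into short tracklets
# by mask IoU between consecutive frames.
import numpy as np

def mask_iou(a, b):
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def build_tracklets(frames, iou_thresh=0.5):
    """frames: list of per-frame lists of boolean (H, W) masks."""
    tracklets = []   # each tracklet is a list of masks
    active = []      # (tracklet index, last mask) pairs still being extended
    for masks in frames:
        next_active, used = [], set()
        for idx, last in active:
            # Extend with the best-overlapping unused segment, if good enough.
            ious = [(mask_iou(last, m), j)
                    for j, m in enumerate(masks) if j not in used]
            if ious:
                best_iou, best = max(ious)
                if best_iou >= iou_thresh:
                    used.add(best)
                    tracklets[idx].append(masks[best])
                    next_active.append((idx, masks[best]))
        for j, m in enumerate(masks):  # unmatched segments start new tracklets
            if j not in used:
                tracklets.append([m])
                next_active.append((len(tracklets) - 1, m))
        active = next_active
    return tracklets
```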

Towards Large-Scale Video Object Mining [article]

Aljosa Osep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe
2018 arXiv   pre-print
We propose to leverage a generic object tracker in order to perform object mining in large-scale unlabeled videos, captured in a realistic automotive setting. We present a dataset of more than 360'000 automatically mined object tracks from 10+ hours of video data (560'000 frames) and propose a method for automated novel category discovery and detector learning. In addition, we show preliminary results on using the mined tracks for object detector adaptation.
arXiv:1809.07316v1 fatcat:mlm5oscinrfzlfqpihr6arpqhq

Differentiable Soft-Masked Attention [article]

Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe
2022 arXiv   pre-print
Transformers have become prevalent in computer vision due to their performance and flexibility in modelling complex operations. Of particular significance is the 'cross-attention' operation, which allows a vector representation (e.g. of an object in an image) to be learned by attending to an arbitrarily sized set of input features. Recently, "Masked Attention" was proposed in which a given object representation only attends to those image pixel features for which the segmentation mask of that object is active. This specialization of attention proved beneficial for various image and video segmentation tasks. In this paper, we propose another specialization of attention which enables attending over 'soft-masks' (those with continuous mask probabilities instead of binary values), and is also differentiable through these mask probabilities, thus allowing the mask used for attention to be learned within the network without requiring direct loss supervision. This can be useful for several applications. Specifically, we employ our "Differentiable Soft-Masked Attention" for the task of Weakly-Supervised Video Object Segmentation (VOS), where we develop a transformer-based network for VOS which only requires a single annotated image frame for training, but can also benefit from cycle consistency training on a video with just one annotated frame. Although there is no loss for masks in unlabeled frames, the network is still able to segment objects in those frames due to our novel attention formulation. Code: https://github.com/Ali2500/HODOR/blob/main/hodor/modelling/encoder/soft_masked_attention.py
arXiv:2206.00182v2 fatcat:l3eiak3zjjf5rpm6kht3zrvxci
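
The core idea fits in a few lines. The sketch below is my own simplification (the authors' actual implementation is at the linked path above): instead of hard-masking attention logits with -inf, add the log of the mask probabilities as a bias, which down-weights low-probability pixels while keeping the operation differentiable in the mask.

```python
# Simplified soft-masked cross-attention; eps guards log(0).
import torch
import torch.nn.functional as F

def soft_masked_attention(q, k, v, mask_probs, eps=1e-6):
    """q: (B, Nq, D) object queries; k, v: (B, Np, D) pixel features;
    mask_probs: (B, Nq, Np) continuous mask probabilities in [0, 1]."""
    d = q.shape[-1]
    logits = torch.einsum('bqd,bpd->bqp', q, k) / d ** 0.5
    # log(m) -> 0 where m = 1, very negative as m -> 0, and gradients
    # flow back into mask_probs, so the mask itself can be learned.
    logits = logits + torch.log(mask_probs.clamp_min(eps))
    attn = F.softmax(logits, dim=-1)
    return torch.einsum('bqp,bpd->bqd', attn, v)
```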

BoLTVOS: Box-Level Tracking for Video Object Segmentation [article]

Paul Voigtlaender and Jonathon Luiten and Bastian Leibe
2019 arXiv   pre-print
In order to produce segmentation masks for the VOS task, we use an off-the-shelf bounding-box-to-segmentation-mask network by adopting the code and pretrained weights from Luiten et al. [28].
arXiv:1904.04552v2 fatcat:tgf74nhxkvclpkdbv2lbgfkrpu

4D Generic Video Object Proposals [article]

Aljosa Osep, Paul Voigtlaender, Mark Weber, Jonathon Luiten, Bastian Leibe
2020 arXiv   pre-print
Many high-level video understanding methods require input in the form of object proposals. Currently, such proposals are predominantly generated with the help of networks that were trained for detecting and segmenting a set of known object classes, which limits their applicability to cases where all objects of interest are represented in the training set. This is a restriction for automotive scenarios, where unknown objects can frequently occur. We propose an approach that can reliably extract spatio-temporal object proposals for both known and unknown object categories from stereo video. Our 4D Generic Video Tubes (4D-GVT) method leverages motion cues, stereo data, and object instance segmentation to compute a compact set of video-object proposals that precisely localizes object candidates and their contours in 3D space and time. We show that given only a small amount of labeled data, our 4D-GVT proposal generator generalizes well to real-world scenarios, in which unknown categories appear. It outperforms other approaches that try to detect as many objects as possible by increasing the number of classes in the training set to several thousand.
arXiv:1901.09260v3 fatcat:itjnypqps5g5fo7iep54ifujni

Siam R-CNN: Visual Tracking by Re-Detection [article]

Paul Voigtlaender, Jonathon Luiten, Philip H.S. Torr, Bastian Leibe
2020 arXiv   pre-print
We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking. We combine this with a novel tracklet-based dynamic programming algorithm, which takes advantage of re-detections of both the first-frame template and previous-frame predictions, to model the full history of both the object to be tracked and potential distractor objects. This enables our approach to make better tracking decisions, as well as to re-detect tracked objects after long occlusion. Finally, we propose a novel hard example mining strategy to improve Siam R-CNN's robustness to similar looking objects. Siam R-CNN achieves the current best performance on ten tracking benchmarks, with especially strong results for long-term tracking. We make our code and models available at www.vision.rwth-aachen.de/page/siamrcnn.
arXiv:1911.12836v2 fatcat:alfpbtovnnaergxyha77o3qz2u
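
As a rough illustration of the tracklet-based dynamic program (the scoring terms here are placeholders, not Siam R-CNN's actual similarity and motion terms), a Viterbi-style pass can pick one detection per frame that maximizes detection score plus a location-consistency bonus:

```python
# Toy per-frame DP over detections; backtracks the highest-scoring path.
import numpy as np

def track_by_dp(frames):
    """frames: list of (boxes (N, 4), scores (N,)) tuples, one per frame."""
    def consistency(b1, b2):  # crude motion term: negative center distance
        c1, c2 = (b1[:2] + b1[2:]) / 2, (b2[:2] + b2[2:]) / 2
        return -np.linalg.norm(c1 - c2)

    boxes0, scores0 = frames[0]
    best = scores0.astype(float).copy()  # best path score ending at each det
    back = [np.full(len(boxes0), -1)]
    prev_boxes = boxes0
    for boxes, scores in frames[1:]:
        trans = np.array([[consistency(pb, b) for pb in prev_boxes]
                          for b in boxes])        # (N_cur, N_prev)
        tot = trans + best[None, :]
        back.append(tot.argmax(axis=1))
        best = scores + tot.max(axis=1)
        prev_boxes = boxes
    path = [int(best.argmax())]
    for ptr in reversed(back[1:]):
        path.append(int(ptr[path[-1]]))
    return path[::-1]  # chosen detection index per frame
```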

Forecasting from LiDAR via Future Object Detection [article]

Neehar Peri, Jonathon Luiten, Mengtian Li, Aljoša Ošep, Laura Leal-Taixé, Deva Ramanan
2022 arXiv   pre-print
Object detection and forecasting are fundamental components of embodied perception. These two problems, however, are largely studied in isolation by the community. In this paper, we propose an end-to-end approach for detection and motion forecasting based on raw sensor measurement as opposed to ground truth tracks. Instead of predicting the current frame locations and forecasting forward in time, we directly predict future object locations and backcast to determine where each trajectory began. Our approach not only improves overall accuracy compared to other modular or end-to-end baselines, it also prompts us to rethink the role of explicit tracking for embodied perception. Additionally, by linking future and current locations in a many-to-one manner, our approach is able to reason about multiple futures, a capability that was previously considered difficult for end-to-end approaches. We conduct extensive experiments on the popular nuScenes dataset and demonstrate the empirical effectiveness of our approach. In addition, we investigate the appropriateness of reusing standard forecasting metrics for an end-to-end setup, and find a number of limitations which allow us to build simple baselines to game these metrics. We address this issue with a novel set of joint forecasting and detection metrics that extend the commonly used AP metrics from the detection community to measuring forecasting accuracy. Our code is available at https://github.com/neeharperi/FutureDet
arXiv:2203.16297v2 fatcat:lxrzwuzlyvbpfpmr3mujnfdipm
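
The backcasting step can be illustrated with a small sketch (the data layout and nearest-origin assignment are my assumptions, not FutureDet's actual association): each future detection predicts where its trajectory began, and several futures linking back to the same current detection naturally form multiple forecast modes.

```python
# Link future detections to current ones via predicted backcast offsets.
import numpy as np

def link_futures_to_current(current_centers, future_centers, backcasts):
    """current_centers: (N, 2); future_centers: (M, 2);
    backcasts: (M, 2) predicted displacement back to the current frame."""
    origins = future_centers - backcasts            # where each future began
    d = np.linalg.norm(origins[:, None] - current_centers[None], axis=-1)
    owner = d.argmin(axis=1)                        # many-to-one assignment
    return {i: np.flatnonzero(owner == i)           # current det -> its modes
            for i in range(len(current_centers))}
```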

Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video [article]

Aljoša Ošep and Paul Voigtlaender and Jonathon Luiten and Stefan Breuers and Bastian Leibe
2017 arXiv   pre-print
We explore object discovery and detector adaptation based on unlabeled video sequences captured from a mobile platform. We propose a fully automatic approach for object mining from video which builds upon a generic object tracking approach. By applying this method to three large video datasets from autonomous driving and mobile robotics scenarios, we demonstrate its robustness and generality. Based on the object mining results, we propose a novel approach for unsupervised object discovery by appearance-based clustering. We show that this approach successfully discovers interesting objects relevant to driving scenarios. In addition, we perform self-supervised detector adaptation in order to improve detection performance on the KITTI dataset for existing categories. Our approach has direct relevance for enabling large-scale object learning for autonomous driving.
arXiv:1712.08832v1 fatcat:5uj26t7vijajrdcu3umhyexwha
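
The adaptation step is conceptually simple; the sketch below is a hypothetical recipe under my own assumptions (the `train_step` API and the track-length filter are invented for illustration, not the authors' training recipe): treat boxes from long, confidently mined tracks as pseudo ground truth and fine-tune the detector on them.

```python
# Hypothetical self-supervised adaptation loop using mined tracks as labels.
def adapt_detector(detector, mined_tracks, min_track_len=10, epochs=1):
    """mined_tracks: list of dicts with 'frames' and 'boxes' per track."""
    pseudo_labels = [
        (f, b)
        for t in mined_tracks if len(t['frames']) >= min_track_len
        for f, b in zip(t['frames'], t['boxes'])
    ]
    for _ in range(epochs):
        for frame, box in pseudo_labels:
            detector.train_step(frame, [box])  # hypothetical training API
    return detector
```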

MOTS: Multi-Object Tracking and Segmentation [article]

Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, Bastian Leibe
2019 arXiv   pre-print
This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). Towards this goal, we create dense pixel-level annotations for two existing tracking datasets using a semi-automatic annotation procedure. Our new annotations comprise 65,213 pixel masks for 977 distinct objects (cars and pedestrians) in 10,870 video frames. For evaluation, we extend existing multi-object tracking metrics to this new task. Moreover, we propose a new baseline method which jointly addresses detection, tracking, and segmentation with a single convolutional network. We demonstrate the value of our datasets by achieving improvements in performance when training on MOTS annotations. We believe that our datasets, metrics and baseline will become a valuable resource towards developing multi-object tracking approaches that go beyond 2D bounding boxes. We make our annotations, code, and models available at https://www.vision.rwth-aachen.de/page/mots.
arXiv:1902.03604v2 fatcat:vbbe6fems5gltb6nilt45g56oy
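
The metric extension hinges on matching by mask IoU rather than box IoU. A minimal sketch of that matching step follows (a simplification; the official toolkit computes MOTSA/sMOTSA and tracks identities across frames): because masks within a frame are non-overlapping, an IoU threshold above 0.5 makes the per-frame matching unambiguous.

```python
# Per-frame mask matching at IoU > 0.5, as used by mask-based metrics.
import numpy as np

def match_masks(gt_masks, pred_masks, thresh=0.5):
    """gt_masks, pred_masks: lists of boolean (H, W) arrays for one frame."""
    matches, taken = [], set()
    for g, gm in enumerate(gt_masks):
        best, best_iou = None, thresh
        for p, pm in enumerate(pred_masks):
            if p in taken:
                continue
            union = np.logical_or(gm, pm).sum()
            iou = np.logical_and(gm, pm).sum() / union if union else 0.0
            if iou > best_iou:
                best, best_iou = p, iou
        if best is not None:
            taken.add(best)
            matches.append((g, best, best_iou))
    return matches  # (gt index, pred index, IoU) triples
```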

PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation [article]

Jonathon Luiten, Paul Voigtlaender, Bastian Leibe
2018 arXiv   pre-print
We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations. Towards this goal, we present the PReMVOS algorithm (Proposal-generation, Refinement and Merging for Video Object Segmentation). Our method separates this problem into two steps, first generating a set of accurate object segmentation mask proposals for each video frame and then selecting and merging these proposals into accurate and temporally consistent pixel-wise object tracks over a video sequence in a way which is designed to specifically tackle the difficult challenges involved with segmenting multiple objects across a video sequence. Our approach surpasses all previous state-of-the-art results on the DAVIS 2017 video object segmentation benchmark with a J&F mean score of 71.6 on the test-dev dataset, and achieves first place in both the DAVIS 2018 Video Object Segmentation Challenge and the YouTube-VOS 1st Large-scale Video Object Segmentation Challenge.
arXiv:1807.09190v2 fatcat:yhw4l5nb5fg6tphdbvwwbxj2bi
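
To give a flavour of the merging step, here is a hedged sketch of how one proposal might be scored for an existing track (the weights and the two terms are placeholders of my own; PReMVOS combines several cues, including optical-flow-warped masks and ReID embeddings):

```python
# Toy per-proposal score: warped-mask overlap plus appearance similarity.
import numpy as np

def proposal_score(prev_mask_warped, proposal_mask, track_embed, prop_embed,
                   w_iou=1.0, w_reid=1.0):
    union = np.logical_or(prev_mask_warped, proposal_mask).sum()
    iou = (np.logical_and(prev_mask_warped, proposal_mask).sum() / union
           if union else 0.0)
    reid = float(np.dot(track_embed, prop_embed)
                 / (np.linalg.norm(track_embed) * np.linalg.norm(prop_embed)
                    + 1e-9))
    return w_iou * iou + w_reid * reid
```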

BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video [article]

Ali Athar, Jonathon Luiten, Paul Voigtlaender, Tarasha Khurana, Achal Dave, Bastian Leibe, Deva Ramanan
2022 arXiv   pre-print
We refer the reader to Luiten et al. [20] for detailed explanations.
arXiv:2209.12118v1 fatcat:ymfftq7ztvev5died224nbnfwq

Track to Reconstruct and Reconstruct to Track [article]

Jonathon Luiten and Tobias Fischer and Bastian Leibe
2019 arXiv   pre-print
Equal contribution. RWTH Aachen University. Code available: https://github.com/tobiasfshr/MOTSFusion
arXiv:1910.00130v2 fatcat:5qgijnzeevgj7eqhwvcybtun7i

Large-Scale Object Mining for Object Discovery from Unlabeled Video [article]

Aljosa Osep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe
2019 arXiv   pre-print
This paper addresses the problem of object discovery from unlabeled driving videos captured in a realistic automotive setting. Identifying recurring object categories in such raw video streams is a very challenging problem. Not only do object candidates first have to be localized in the input images, but many interesting object categories occur relatively infrequently. Object discovery will therefore have to deal with the difficulties of operating in the long tail of the object distribution. We demonstrate the feasibility of performing fully automatic object discovery in such a setting by mining object tracks using a generic object tracker. In order to facilitate further research in object discovery, we release a collection of more than 360,000 automatically mined object tracks from 10+ hours of video data (560,000 frames). We use this dataset to evaluate the suitability of different feature representations and clustering strategies for object discovery.
arXiv:1903.00362v2 fatcat:mn7gephdsfcadfar7g4yzyrd4y
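
The clustering strategies the paper evaluates can be approximated with a very small sketch (feature choice and cluster count are my assumptions): embed each mined track, normalize, and cluster; each cluster is then a candidate category.

```python
# Cluster per-track appearance features to surface recurring categories.
import numpy as np
from sklearn.cluster import KMeans

def discover_categories(track_features, n_clusters=50):
    """track_features: (num_tracks, D), one pooled feature per mined track."""
    norms = np.linalg.norm(track_features, axis=1, keepdims=True) + 1e-9
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        track_features / norms)
    return labels  # cluster id per track
```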

Opening up Open-World Tracking [article]

Yang Liu and Idil Esen Zulfikar and Jonathon Luiten and Achal Dave and Deva Ramanan and Bastian Leibe and Aljoša Ošep and Laura Leal-Taixé
2022 arXiv   pre-print
Tracking and detecting any object, including ones never-seen-before during model training, is a crucial but elusive capability of autonomous systems. An autonomous agent that is blind to never-seen-before objects poses a safety hazard when operating in the real world - and yet this is how almost all current systems work. One of the main obstacles towards advancing tracking any object is that this task is notoriously difficult to evaluate. A benchmark that would allow us to perform an apples-to-apples comparison of existing efforts is a crucial first step towards advancing this important research field. This paper addresses this evaluation deficit and lays out the landscape and evaluation methodology for detecting and tracking both known and unknown objects in the open-world setting. We propose a new benchmark, TAO-OW: Tracking Any Object in an Open World, analyze existing efforts in multi-object tracking, and construct a baseline for this task while highlighting future challenges. We hope to open a new front in multi-object tracking research that will hopefully bring us a step closer to intelligent systems that can operate safely in the real world. https://openworldtracking.github.io/
arXiv:2104.11221v2 fatcat:5apds4w2nzcv7bteobuz3jvyiy
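
Since unknown objects carry no class labels, evaluation in this setting leans on recall. The sketch below captures that spirit under my own assumptions (the thresholds and data layout are invented; TAO-OW's official metric differs in its details): a ground-truth track counts as recalled if some predicted track covers enough of its frames.

```python
# Toy open-world track recall: fraction of GT tracks covered by predictions.
def open_world_recall(gt_tracks, pred_tracks, iou_fn, cover_thresh=0.5):
    """gt_tracks, pred_tracks: dicts track_id -> {frame: box};
    iou_fn(box_a, box_b) -> float."""
    if not gt_tracks:
        return 0.0
    recalled = 0
    for gt in gt_tracks.values():
        best = 0.0
        for pred in pred_tracks.values():
            hits = sum(1 for f, box in gt.items()
                       if f in pred and iou_fn(box, pred[f]) >= 0.5)
            best = max(best, hits / len(gt))
        if best >= cover_thresh:
            recalled += 1
    return recalled / len(gt_tracks)
```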
Showing results 1–15 of 39