
SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings

Lorenzo Papa, Edoardo Alati, Paolo Russo, Irene Amerini
2022 IEEE Access  
This paper presents SPEED, a Separable Pyramidal pooling EncodEr-Decoder architecture designed to achieve real-time frequency performances on multiple hardware platforms.  ...  Approaches based on the state of the art vision transformer architectures are extremely deep and complex not suitable for real-time inference operations on edge and autonomous systems equipped with low  ...  ACKNOWLEDGMENT The authors would like to thank Fabiana Di Ciaccio, University La Parthenope, Naples, Italy, for the extensive editing performed on the text.  ...
doi:10.1109/access.2022.3170425 fatcat:23vapuasjjabdit5wgwx2mhn4u

DTS-Depth: Real-Time Single-Image Depth Estimation Using Depth-to-Space Image Construction

Hatem Ibrahem, Ahmed Salem, Hyun-Soo Kang
2022 Sensors  
We compare our method with the state-of-the-art methods on depth estimation, showing that our method outperforms those methods. However, the architecture is less complex and works in real time.  ...  The proposed method efficiently constructs a high-resolution depth map using a small encoding architecture and eliminates the need for a decoder, which is typically used in the encoder-decoder architectures  ...  It employs a lightweight encoder-decoder architecture, which is appropriate for embedded devices.  ...
doi:10.3390/s22051914 pmid:35271061 pmcid:PMC8914965 fatcat:775ewmewyba35dcaxybxp2e2hm

Depth estimation on embedded computers for robot swarms in forest [article]

Chaoyue Niu, Danesh Tarapore, Klaus-Peter Zauner
2021 arXiv   pre-print
This paper mainly describes depth estimation models trained on our own dataset recorded in forest, and their performance on embedded on-board computers.  ...  We develop two depth estimation models and evaluate their performance on Raspberry Pi 4 and Jetson Nano in terms of accuracy, runtime and model size of depth estimation models, as well as memory consumption  ...  An encoder-decoder architecture (EDA) [27] is comprised of encoder MobileNetV2 [28], and learnable decoder, which deployed, compiled, optimized by TVM [29] on NVIDIA Jetson TX2 runs at power consumption  ...
arXiv:2012.02907v2 fatcat:3hddvyi45jb3zbp6ornkja4rsm

Visual Domain Adaptation for Monocular Depth Estimation on Resource-Constrained Hardware [article]

Julia Hornauer, Lazaros Nalpantidis, Vasileios Belagiannis
2021 arXiv   pre-print
Then, we present an adversarial learning approach that is adapted for training on the device with limited resources.  ...  Real-world perception systems in many cases build on hardware with limited resources to adhere to cost and power limitations of their carrying system.  ...  For example, lightweight architectures have been proposed in [29], [26] and [23] with the aim to deploy models in real-time on embedded devices, where there is only a minor performance loss.  ...
arXiv:2108.02671v1 fatcat:yl63zv5jb5gojojvkoy4hi4hmy

Fast Scene Understanding for Autonomous Driving [article]

Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool
2017 arXiv   pre-print
Our approach builds upon a branched ENet architecture with a shared encoder but different decoder branches for each of the three tasks.  ...  Motivated by this observation and inspired by recent works that tackle multiple tasks with a single integrated architecture, in this paper we present a real-time efficient implementation based on ENet  ...  Acknowledgement: The work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven).  ...
arXiv:1708.02550v1 fatcat:zzf6rwdalvhvhey466jwrzanbm

Lightweight Monocular Depth Estimation through Guided Decoding [article]

Michael Rudolph, Youssef Dawoud, Ronja Güldenring, Lazaros Nalpantidis, Vasileios Belagiannis
2022 arXiv   pre-print
We present a lightweight encoder-decoder architecture for monocular depth estimation, specifically designed for embedded platforms.  ...  Motivated by the concept of guided image filtering, GUB relies on the image to guide the decoder on upsampling the feature representation and the depth map reconstruction, achieving high resolution results  ...  Consequently, these approaches cannot deliver real-time execution on devices with constrained resources.  ...
arXiv:2203.04206v1 fatcat:ebmu66r2evfwhfo6rttaxlgqjy

FastMDE: A Fast CNN Architecture for Monocular Depth Estimation at High Resolution

Thien-Thanh Dao, Quoc-Viet Pham, Won-Joo Hwang
2022 IEEE Access  
This study proposes a fast monocular depth estimation model named FastMDE by optimizing the deep convolutional neural network according to the encoder-decoder architecture.  ...  The model can facilitate the development and applications with superior performances and easy deployment on an embedded platform.  ...  It is essential to develop an efficient convolutional neural network (CNN) model that can run in real-time on embedded devices.  ... 
doi:10.1109/access.2022.3145969 fatcat:hsf5y25otbgtlfmcaq3haw2e4q

Efficient 2.5D Hand Pose Estimation via Auxiliary Multi-Task Training for Embedded Devices [article]

Prajwal Chidananda, Ayan Sinha, Adithya Rao, Douglas Lee, Andrew Rabinovich
2019 arXiv   pre-print
In this work, we discuss the data, architecture, and training procedure necessary to deploy extremely efficient 2.5D hand pose estimation on embedded devices with highly constrained memory and compute  ...  Our 2.5D hand pose estimation consists of 2D key-point estimation of joint positions on an egocentric image, captured by a depth sensor, and lifted to 2.5D using the corresponding depth values.  ...  Acknowledgements We would like to acknowledge Lexin Tang at Magic Leap, Inc. for her work on the embedded code implementation.  ... 
arXiv:1909.05897v1 fatcat:qdtln75z4fdpvmeaz7fdveyogm

Cloud-Assisted Smart Camera Networks for Energy-Efficient 3D Video Streaming

2014 Computer  
The cloud server must consider all these views concurrently as well as differences among display devices to achieve optimal high-quality scalable video streaming in real time.  ...  We looked at architectural components that support cloud-assisted video encoding on the client side, cloud-based video decoding on the server side, and scalable cloud-client networking.  ...  The cloud server must consider all these views concurrently as well as differences among display devices to achieve optimal high-quality scalable video streaming in real time.  ... 
doi:10.1109/mc.2014.114 fatcat:dspyqfhz5vf6llok7paratrfre

Towards real-time unsupervised monocular depth estimation on CPU [article]

Matteo Poggi, Filippo Aleotti, Fabio Tosi, Stefano Mattoccia
2018 arXiv   pre-print
To tackle this issue, in this paper we propose a novel architecture capable to quickly infer an accurate depth map on a CPU, even of an embedded system, using a pyramid of features extracted from a single  ...  Unsupervised depth estimation from a single image is a very attractive technique with several implications in robotic, autonomous navigation, augmented reality and so on.  ...  ACKNOWLEDGMENT We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research. We also thank Andrea Guccini for Figure 2.  ...
arXiv:1806.11430v3 fatcat:ycjwqa7p5vayvk3n2zrfags7by

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report [article]

Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang (+26 others)
2021 arXiv   pre-print
Depth estimation is an important computer vision problem with many practical applications to mobile devices.  ...  To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based depth estimation solutions that can demonstrate a nearly real-time  ...  Quantized Image Super-Resolution on Edge SoC NPUs [31] • Real-Time Video Super-Resolution on Mobile GPUs [28] • Single-Image Depth Estimation on Mobile Devices • Quantized Camera Scene Detection on  ... 
arXiv:2105.08630v1 fatcat:kfxoerr7ijh3to5wzzrkezore4

Learning Depth for Scene Reconstruction using an Encoder-decoder Model

Xiaohan Tu, Cheng Xu, Siping Liu, Guoqi Xie, Jing Huang, Renfa Li, Junsong Yuan
2020 IEEE Access  
INDEX TERMS Convolutional neural networks, depth estimation, decoder, encoder, simultaneous localization and mapping.  ...  To accurately achieve scene reconstruction based on monocular depth estimation, this paper makes three contributions. (1) We design a depth estimation model (DEM), consisting of a precise encoder to re-exploit  ...  Therefore, DEM is real-time with low consumption of GPU, power, and energy for fast depth estimation on embedded devices. VI.  ... 
doi:10.1109/access.2020.2993494 fatcat:wlri3m4fr5bpraul2lqr3snp4i

Real-Time Semantic Stereo Matching [article]

Pier Luigi Dovesi, Matteo Poggi, Lorenzo Andraghetti, Miquel Martí, Hedvig Kjellström, Alessandro Pieropan, Stefano Mattoccia
2020 arXiv   pre-print
Our framework relies on coarse-to-fine estimations in a multi-stage fashion, allowing: i) very fast inference even on embedded devices, with marginal drops in accuracy, compared to state-of-the-art networks  ...  In this paper, we propose a single compact and lightweight architecture for real-time semantic stereo matching.  ...  embedded devices like the NVIDIA Jetson TX2.  ... 
arXiv:1910.00541v2 fatcat:am6gdhajmzbjxjpzy35fpaxnfq

Investigations on the inference optimization techniques and their impact on multiple hardware platforms for Semantic Segmentation [article]

Sethu Hareesh Kolluru
2019 arXiv   pre-print
Fully Convolutional Network (FCN-8s, FCN-16s, and FCN-32s) with a VGG16 encoder architecture and skip connections is trained and validated on the Cityscapes dataset.  ...  Finally, the trained network is ported on to an embedded platform (Nvidia Jetson TX1) and the inference time, as well as the total energy consumed for inference across hardware platforms, are compared.  ...  In SegNet [2], a similar approach with an encoder-decoder architecture is used to address the loss of detailed structures of an object due to a coarse feature map; the decoder network, however, uses  ...
arXiv:1911.12993v1 fatcat:ydjwlwwkcfcffbzj5eosin5xia

SpEx: Multi-Scale Time Domain Speaker Extraction Network

Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li
2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Specifically, the speech encoder converts the mixture speech into multi-scale embedding coefficients, the speaker encoder learns to represent the target speaker with a speaker embedding.  ...  In this way, we avoid phase estimation. The SpEx network consists of four network components, namely speaker encoder, speech encoder, speaker extractor, and speech decoder.  ...  [43], [44], [46], [48] as a single task. 3) Multi-Scale Encoding and Decoding: The TCN architecture in Conv-TasNet works well for single time scale embedding coefficients [41], [42].  ...
doi:10.1109/taslp.2020.2987429 fatcat:xlsfk6ulufeb3cmxhbrhicnfza