Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds
2022
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
unpublished
Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding. Beyond the coarse semantic class prediction and bounding box regression of traditional 3D object detection, 3D dense captioning aims to produce a finer, instance-level natural-language description of the visual appearance and spatial relations of each scene object of interest. To detect and describe objects in a scene, following the spirit of neural
doi:10.24963/ijcai.2022/191
fatcat:jehaitgimvd2rhihtrpni5ocfi