ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences [article]

Zhizhong Han, Chao Chen, Yu-Shen Liu, Matthias Zwicker
2019 arXiv pre-print
To resolve this issue, we propose ShapeCaptioner, a generative caption network, to perform 3D shape captioning from semantic parts detected in multiple views.  ...  Our novelty lies in learning the knowledge of part detection in multiple views from 3D shape segmentations and transferring this knowledge to facilitate learning the mapping from 3D shapes to sentences  ...  By representing a 3D shape as a view sequence, ShapeCaptioner aims to learn a mapping from parts detected in the view sequence to a caption describing the 3D shape.  ... 
arXiv:1908.00120v1 fatcat:dexy2q5ucrbi7mtmxjmpq5yei4

Part2Word: Learning Joint Embedding of Point Clouds and Text by Matching Parts to Words [article]

Chuan Tang, Xi Yang, Bojian Wu, Zhizhong Han, Yi Chang
2021 arXiv pre-print
Current multi-view based methods learn a mapping from multiple rendered views to text.  ...  To resolve this issue, we propose a method to learn joint embedding of point clouds and text by matching parts from shapes to words from sentences in a common space.  ...  Inspired by the framework of SCAN [25], we introduce a cross-attention mechanism to learn the joint embedding of 3D shapes and text by matching parts from shapes to words from sentences.  ... 
arXiv:2107.01872v1 fatcat:c35e3atc7bahrhdl76tfjsvpoe