Development of MPEG standards for 3D and free viewpoint video

Aljoscha Smolic, Hideaki Kimata, Anthony Vetro, Bahram Javidi, Fumio Okano, Jung-Young Son
2005 Three-Dimensional TV, Video, and Display IV  
An overview of 3D and free viewpoint video is given in this paper, with special focus on related standardization activities in MPEG. Free viewpoint video allows the user to navigate freely within real-world visual scenes, as known from virtual worlds in computer graphics. Suitable 3D scene representation formats are classified and the processing chain is explained. Examples are shown for image-based and model-based free viewpoint video systems, highlighting standards-conformant realization using MPEG-4. Then the principles of 3D video, which provides the user with a 3D depth impression of the observed scene, are introduced. Example systems are described, again focusing on their realization based on MPEG-4. Finally, multi-view video coding is described as a key component for 3D and free viewpoint video systems. MPEG is currently working on a new standard for multi-view video coding. The conclusion is that the necessary technology, including standard media formats for 3D and free viewpoint video, is available or will be available in the near future, and that there is a clear demand from both industry and users for such applications. 3DTV at home and free viewpoint video on DVD will be available soon, and will create huge new markets.
This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.
Keywords: 3D video, 3DTV, free viewpoint video, MPEG, 3DAV, multi-view video coding, 3D video objects
FREE VIEWPOINT VIDEO
Free viewpoint video (FVV) offers the same functionality that is known from 3D computer graphics. The user can choose their own viewpoint and viewing direction within a visual scene, which means interactive free navigation. In contrast to pure computer graphics applications, FVV targets real-world scenes as captured by real cameras. This is interesting for user applications (e.g. a DVD of an opera or concert where the user can freely choose the viewpoint) as well as for (post-)production. Systems for the latter are already being used (e.g. for sports, movies, EyeVision, Matrix-effects).
2.1 Acquisition for Free Viewpoint Video
Different technologies can be used for acquisition, processing, representation, and rendering, but all make use of multiple views of the same visual scene [1], as illustrated in Fig. 2 and Fig. 3. The multiple camera signals are processed and transformed into a specific scene representation format that allows rendering of virtual intermediate views, i.e. views in between the real camera positions. The user can thus navigate the scene freely, choosing an individual viewpoint and viewing direction.
The camera setting (e.g. array type as in Fig. 2 or dome type as in Fig. 3) and density (i.e. number of cameras) impose practical limitations on navigation and on the quality of rendered views at a given virtual position. For instance, the setting in Fig. 2 would not allow rendering a virtual view from inside the aquarium looking towards the cameras. There is therefore a classical trade-off to consider between costs (for equipment, cameras, processors, etc.) and benefits (navigation range, quality of virtual views).
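
As a rough illustration of how camera placement bounds the navigation range, the following minimal Python sketch assumes a one-dimensional camera array with known, strictly increasing positions. It is not the method of any system described in the paper: the function name `bracketing_cameras` and the linear weighting are hypothetical, and real view synthesis additionally needs depth, disparity, or geometry information. The sketch only selects the two real cameras that enclose a requested virtual position and reports how close the position is to each.

```python
from bisect import bisect_right


def bracketing_cameras(camera_x, virtual_x):
    """Pick the two real cameras enclosing a virtual viewpoint.

    camera_x:  strictly increasing camera positions along a linear array
               (hypothetical 1D layout, arbitrary units).
    virtual_x: requested virtual viewpoint position on the same axis.

    Returns ((left_index, left_weight), (right_index, right_weight)),
    or None if the viewpoint lies outside the span covered by the array.
    """
    if len(camera_x) < 2 or not (camera_x[0] <= virtual_x <= camera_x[-1]):
        return None  # outside the navigation range of this camera setting

    # Index of the first camera strictly to the right of virtual_x,
    # clamped so that the last camera position itself is still valid.
    right = min(bisect_right(camera_x, virtual_x), len(camera_x) - 1)
    left = right - 1

    # Linear weights: the nearer camera gets the larger weight.
    span = camera_x[right] - camera_x[left]
    w_right = (virtual_x - camera_x[left]) / span
    return (left, 1.0 - w_right), (right, w_right)


if __name__ == "__main__":
    cameras = [0.0, 1.0, 2.0, 3.0]
    print(bracketing_cameras(cameras, 1.25))  # between cameras 1 and 2
    print(bracketing_cameras(cameras, 3.5))   # beyond the array
```

A denser array shortens the distance between a virtual position and its enclosing cameras, which is one way to picture the cost versus quality trade-off described above.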
doi:10.1117/12.631192 fatcat:6woocpm2ezhe5epqnzucwjdnvy