A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit <a rel="external noopener" href="https://pure.mpg.de/rest/items/item_3025802_1/component/file_3025803/content">the original URL</a>. The file type is <code>application/pdf</code>.
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ilwxppn4d5hizekyd3ndvy2mii" style="color: black;">2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</a>
Figure 1 . We propose multi-frame self-supervised training of a deep network based on in-the-wild video data for jointly learning a face model and 3D face reconstruction. Our approach successfully disentangles facial shape, appearance, expression, and scene illumination. Abstract Monocular image-based 3D reconstruction of faces is a long-standing problem in computer vision. Since image data is a 2D projection of a 3D face, the resulting depth ambiguity makes the problem ill-posed. Most existing<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/cvpr.2019.01107">doi:10.1109/cvpr.2019.01107</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/cvpr/TewariB0BESPZT19.html">dblp:conf/cvpr/TewariB0BESPZT19</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/6gf5b75bkzbldhzbyqnun4okzm">fatcat:6gf5b75bkzbldhzbyqnun4okzm</a> </span>
more »... methods rely on data-driven priors that are built from limited 3D face scans. In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces. Our face model is learned using only corpora of in-the-wild video clips collected from the Internet. This virtually endless source of training data enables learning of a highly general 3D face model. In order to achieve this, we propose a novel multiframe consistency loss that ensures consistent shape and appearance across multiple frames of a subject's face, thus minimizing depth ambiguity. At test time we can use an arbitrary number of frames, so that we can perform both monocular as well as multi-frame reconstruction.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200510141056/https://pure.mpg.de/rest/items/item_3025802_1/component/file_3025803/content" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/af/06/af06db796c9da52bd53b190ce5540ac7df8f2114.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/cvpr.2019.01107"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>