4,620 Hits in 6.3 sec

Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning [article]

Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, Ran He
2020 arXiv   pre-print
In this paper, we propose a novel arbitrary talking face generation framework by discovering the audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE).  ...  Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image  ...  Herein, we propose audio-visual coherence learning, which learns the shared entropy between audio and visual modalities to generate talking face video with precise lip shape.  ...
arXiv:1812.06589v2 fatcat:appwjt472fachmqia2tusbksfa
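
The mutual information estimator this entry proposes belongs to the family of neural MI estimators. As a rough illustration of the underlying idea, here is a minimal sketch of the standard MINE-style Donsker-Varadhan lower bound between audio and visual features, assuming a simple MLP critic; the network shapes and all names are illustrative assumptions, not the paper's actual architecture:

```python
import math

import torch
import torch.nn as nn

class StatisticsNet(nn.Module):
    """Critic that scores audio-visual feature pairs (hypothetical shapes)."""
    def __init__(self, audio_dim=128, visual_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, audio, visual):
        return self.net(torch.cat([audio, visual], dim=-1))

def mi_lower_bound(critic, audio, visual):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]."""
    joint = critic(audio, visual)                      # scores on paired samples
    shuffled = visual[torch.randperm(visual.size(0))]  # shuffling breaks the pairing
    marginal = critic(audio, shuffled)
    log_mean_exp = marginal.logsumexp(dim=0) - math.log(marginal.size(0))
    return (joint.mean() - log_mean_exp).squeeze()
```

Maximizing this bound during training encourages the generated lip motion to share information with the driving audio; the paper's asymmetric variant differs in its details.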

Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, Ran He
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
In this paper, we propose a novel arbitrary talking face generation framework by discovering the audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE).  ...  Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image  ...  Herein, we propose audio-visual coherence learning, which learns the shared entropy between audio and visual modalities to generate talking face video with precise lip shape.  ...
doi:10.24963/ijcai.2020/323 dblp:conf/ijcai/ZhuHLZH20 fatcat:h3ks7i7oovbebk75b45af3smqy

Knowledge-Based Regularization in Generative Modeling

Naoya Takeishi, Yoshinobu Kawahara
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
Prior domain knowledge can greatly help to learn generative models.  ...  In this paper, we propose a method to incorporate prior knowledge of feature relations into the learning of general-purpose generative models.  ...
doi:10.24963/ijcai.2020/327 dblp:conf/ijcai/TakeishiK20 fatcat:i6og4d2xxrb2vboqp342qvp22i

One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning

Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
2022 Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)
Audio-driven one-shot talking face generation methods are usually trained on video resources of various persons.  ...  Hence, we propose a novel one-shot talking face generation framework by exploring consistent correlations between audio and visual motions from a specific speaker and then transferring audio-driven motion  ...  Proposed Method We propose a new talking face generation framework to make audio-driven portrait videos for arbitrary speakers by learning audio-visual correlations on a specific speaker.  ... 
doi:10.1609/aaai.v36i3.20154 fatcat:e6matchjvzgnpgju7d4ygcdyzm

Deep Audio-Visual Learning: A Survey [article]

Hao Zhu, Mandi Luo, Rui Wang, Aihua Zheng, Ran He
2020 arXiv   pre-print
We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual  ...  Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully.  ...  Synthesis of talking faces of arbitrary identities has recently drawn significant attention. Chen et al.  ... 
arXiv:2001.04758v1 fatcat:p6ph5cujl5do3pzlpvcce35nvi

Deep Audio-visual Learning: A Survey

Hao Zhu, Man-Di Luo, Rui Wang, Ai-Hua Zheng, Ran He
2021 International Journal of Automation and Computing  
We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual  ...  Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully.  ...  The synthesis of talking faces of arbitrary identities has recently drawn significant attention. Chen et al.  ...
doi:10.1007/s11633-021-1293-0 fatcat:an5lfyf4m5fh7mlngmdcbx7joy

A Survey on Audio Synthesis and Audio-Visual Multimodal Processing [article]

Zhaofeng Shi
2021 arXiv   pre-print
This review focuses on text-to-speech (TTS), music generation, and some tasks that combine visual and acoustic information.  ...  With the development of deep learning and artificial intelligence, audio synthesis plays a pivotal role in machine learning and shows strong applicability in industry.  ...  [112] proposed a novel talking face generation framework, which discovers audio-visual coherence via an asymmetric mutual information estimator.  ...
arXiv:2108.00443v1 fatcat:5xkj7lf7pfgpppvfqwynoqkqjm

Talking Faces: Audio-to-Video Face Generation [chapter]

Yuxin Wang, Linsen Song, Wayne Wu, Chen Qian, Ran He, Chen Change Loy
2022 Advances in Computer Vision and Pattern Recognition  
The emergence of deep learning and cross-modality research has led to many interesting works that address talking face generation.  ...  Despite great research efforts in talking face generation, the problem remains challenging due to the need for fine-grained control of face components and the generalization to arbitrary sentences.  ...  [8] employed the idea of mutual information to capture the audio-visual coherence and design a GAN-based framework to generate talking face videos that are robust to pose variations.  ... 
doi:10.1007/978-3-030-87664-7_8 fatcat:5qh2bxrthrbthgjwjzlmm3je4i

Deep Learning for Visual Speech Analysis: A Survey [article]

Changchong Sheng, Gangyao Kuang, Liang Bai, Chenping Hou, Yulan Guo, Xin Xu, Matti Pietikäinen, Li Liu
2022 arXiv   pre-print
Over the past five years, numerous deep learning based methods have been proposed to address various problems in this area, especially automatic visual speech recognition and generation.  ...  Visual speech, referring to the visual domain of speech, has attracted increasing attention due to its wide applications, such as public security, medical treatment, military defense, and film entertainment  ...  Talking Face Generation is also called talking face synthesis, talking head generation, or talking portraits generation.  ... 
arXiv:2205.10839v1 fatcat:l5m4ohtcvnevrliaiwawg3phjq

Towards Realistic Visual Dubbing with Heterogeneous Sources [article]

Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma
2022 arXiv   pre-print
The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video.  ...  talking-head video that corresponds to the driving source. [32] proposed a meta-learning framework with the AdaIN [9] technique to solve the few-shot image translation issue. [7] utilizes an image attention  ...
arXiv:2201.06260v1 fatcat:nzenrmsfinbqrnynka5aw7cmce
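
The snippet above cites the AdaIN technique for few-shot adaptation. For reference, here is a minimal sketch of adaptive instance normalization (Huang and Belongie, 2017), which re-normalizes content features to match the channel-wise statistics of a style feature map; tensor shapes are illustrative:

```python
import torch

def adain(content, style, eps=1e-5):
    """content, style: (B, C, H, W) feature maps. Returns re-styled content."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    # normalize away the content statistics, then impose the style statistics
    return s_std * (content - c_mean) / c_std + s_mean
```

In a few-shot dubbing pipeline, the style statistics would typically be predicted from the few reference frames of the target speaker.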

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [article]

Yuanxun Lu, Jinxiang Chai, Xun Cao
2021 arXiv   pre-print
To the best of our knowledge, we are the first to present a live system that generates personalized photorealistic talking-head animation driven only by audio signals at over 30 fps.  ...  In the second stage, we learn facial dynamics and motions from the projected audio features.  ...  They first learned a generalized 3D face model from audio sequences and then fine-tuned the model on the target clip via learning a person-specific blendshape basis, in which case the talking style of the  ...
arXiv:2109.10595v2 fatcat:s35nqajynjeefcx67k42rpr7r4
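
The second stage described above maps projected audio features to facial dynamics. A minimal sketch of what such an audio-to-dynamics regressor can look like, assuming a simple recurrent network and 3D landmark displacements as the motion representation; all module names and dimensions are assumptions, not the paper's actual design:

```python
import torch
import torch.nn as nn

class AudioToDynamics(nn.Module):
    """Maps a sequence of audio features to per-frame landmark displacements."""
    def __init__(self, audio_dim=128, hidden=256, n_landmarks=68):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_landmarks * 3)  # 3D displacement per landmark

    def forward(self, audio_feats):
        # audio_feats: (B, T, audio_dim) -> displacements: (B, T, n_landmarks, 3)
        hidden, _ = self.rnn(audio_feats)
        out = self.head(hidden)
        return out.reshape(out.size(0), out.size(1), -1, 3)
```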

Let's Play Music: Audio-driven Performance Video Generation [article]

Hao Zhu, Yi Li, Feixia Zhu, Aihua Zheng, Ran He
2020 arXiv   pre-print
Then, we propose to transform the generated keypoints to a heatmap via a differentiable space transformer, since the heatmap offers more spatial information but is harder to generate directly from audio.  ...  They are obtained via a graph-based structure module and a CNN-GRU based high-level temporal module, respectively, for final video generation.  ...  Talking Face Generation. Given an audio clip, talking face generation aims to synthesize a realistic talking face video with lip synchronization and smooth facial motion over the entire video.  ...
arXiv:2011.02631v1 fatcat:bcstyoexffaezacx6i7eybighi
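
The keypoint-to-heatmap step quoted above can be made differentiable by rendering each keypoint as a Gaussian bump, so gradients flow from heatmap losses back to the predicted coordinates. A minimal sketch under that assumption (the paper's exact space-transformer operator may differ; `sigma` and the output size are illustrative):

```python
import torch

def keypoints_to_heatmaps(kpts, height=64, width=64, sigma=2.0):
    """kpts: (B, K, 2) keypoint (x, y) coordinates in pixel units.
    Returns (B, K, H, W) Gaussian heatmaps; gradients flow back to kpts."""
    ys = torch.arange(height, dtype=kpts.dtype, device=kpts.device)
    xs = torch.arange(width, dtype=kpts.dtype, device=kpts.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")  # each (H, W)
    # broadcast grids against keypoints to get per-keypoint distance maps
    dx = grid_x - kpts[..., 0, None, None]
    dy = grid_y - kpts[..., 1, None, None]
    return torch.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))
```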

FaceFormer: Speech-Driven 3D Facial Animation with Transformers [article]

Yingruo Fan, Zhaojiang Lin, Jun Saito, Wenping Wang, Taku Komura
2022 arXiv   pre-print
Speech-driven 3D facial animation is challenging due to the complex geometry of human faces and the limited availability of 3D audio-visual data.  ...  The former effectively aligns the audio-motion modalities, whereas the latter offers the ability to generalize to longer audio sequences.  ...  Our method can animate a realistic 3D talking face from an arbitrary audio signal. However, there is a risk that such techniques could potentially be misused to cause embarrassment.  ...
arXiv:2112.05329v4 fatcat:v5lnzdznhvfkthimsgxxlaft2m
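
The transformer pattern the abstract describes, audio features consumed through cross-attention while motion is decoded autoregressively under a causal mask, can be sketched as below. This omits FaceFormer's actual alignment bias and periodic positional encoding, and the dimensions (e.g. 204 = 68 landmarks x 3) are placeholder assumptions rather than the paper's vertex-level outputs:

```python
import torch
import torch.nn as nn

class SpeechToMotionDecoder(nn.Module):
    """Causal self-attention over motion, cross-attention over audio features."""
    def __init__(self, audio_dim=768, motion_dim=204, d_model=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.motion_in = nn.Linear(motion_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.motion_out = nn.Linear(d_model, motion_dim)

    def forward(self, audio_feats, prev_motion):
        # audio_feats: (B, T_audio, audio_dim); prev_motion: (B, T, motion_dim)
        memory = self.audio_proj(audio_feats)
        tgt = self.motion_in(prev_motion)
        T = tgt.size(1)
        causal = torch.triu(  # forbid attention to future motion frames
            torch.full((T, T), float("-inf"), device=tgt.device), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.motion_out(hidden)  # predicted motion for the next frames
```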

Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [article]

Suzhen Wang, Lincheng Li, Yu Ding, Changjie Fan, Xin Yu
2021 arXiv   pre-print
We propose an audio-driven talking-head method to generate photo-realistic talking-head videos from a single reference image.  ...  Then, we develop a motion field generator to produce the dense motion fields from input audio, head poses, and a reference image.  ...  Given an audio clip and one image of an arbitrary speaker, authentic audio-visual content creation has received great attention recently and also has widespread applications, such as human-machine interaction  ... 
arXiv:2107.09293v1 fatcat:wtuzcajtqbfkvjqzxiojja33sy
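
The dense motion fields mentioned above are typically consumed by warping the reference image with a sampling grid. A minimal sketch of that warping step using grid_sample, assuming the flow is expressed as offsets in normalized [-1, 1] coordinates; the flow generator itself is omitted and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def warp_reference(image, flow):
    """image: (B, C, H, W); flow: (B, H, W, 2) offsets in normalized [-1, 1] units."""
    B, _, H, W = image.shape
    ys = torch.linspace(-1, 1, H, device=image.device)
    xs = torch.linspace(-1, 1, W, device=image.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    identity = torch.stack([grid_x, grid_y], dim=-1)      # (H, W, 2), x first
    grid = identity.unsqueeze(0).expand(B, -1, -1, -1) + flow
    return F.grid_sample(image, grid, align_corners=True)
```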

Parallel and High-Fidelity Text-to-Lip Generation

Jinglin Liu, Zhiying Zhu, Yi Ren, Wencan Huang, Baoxing Huai, Nicholas Yuan, Zhou Zhao
2022 Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)
As a key component of talking face generation, lip movements generation determines the naturalness and coherence of the generated talking face video.  ...  Furthermore, we incorporate the structural similarity index loss and adversarial learning to improve perceptual quality of generated lip frames and alleviate the blurry prediction problem.  ...  As a key component of talking face generation, lip movements generation (a.k.a. lip generation) determines the naturalness and coherence of the generated talking face video.  ... 
doi:10.1609/aaai.v36i2.20066 fatcat:dwt2crldbjgvznycqnitnfsqwq
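
The training objective the abstract describes, a structural similarity term plus adversarial learning, can be sketched as a weighted sum. The `ssim` call below comes from the third-party pytorch-msssim package and `lambda_ssim` is an illustrative weight; both are assumptions rather than the paper's exact setup:

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party package, an assumed choice

def lip_generator_loss(fake_frames, real_frames, disc_logits_on_fake,
                       lambda_ssim=10.0):
    # structural similarity term: 1 - SSIM, so sharper matches cost less
    ssim_loss = 1.0 - ssim(fake_frames, real_frames, data_range=1.0)
    # non-saturating GAN term: the generator wants D to predict "real"
    adv_loss = F.binary_cross_entropy_with_logits(
        disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))
    return lambda_ssim * ssim_loss + adv_loss
```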
Showing results 1–15 out of 4,620 results