LiveSpeechPortraits: an open-source project for speech-driven facial expressions and lip sync


2023-04-20 00:23:38
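This project listing was reposted by 开源分享家 (Open Source Sharer). The information and tutorials on this page describe how to learn about and use the project.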
Programming language: Python
Operating system: Web
Usage:

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation

This repository contains the implementation of the following paper:


Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation


Yuanxun Lu, Jinxiang Chai, Xun Cao (SIGGRAPH Asia 2021)


Abstract: To the best of our knowledge, we present the first live system that generates personalized photorealistic talking-head animation driven only by audio signals at over 30 fps. Our system consists of three stages. The first stage is a deep neural network that extracts deep audio features, together with a manifold projection that maps the features into the target person's speech space. In the second stage, we learn facial dynamics and motions from the projected audio features. The predicted motions include head poses and upper body motions: the former are generated by an autoregressive probabilistic model of the target person's head pose distribution, and the latter are deduced from the head poses. In the final stage, we generate conditional feature maps from the previous predictions and feed them, together with a candidate image set, to an image-to-image translation network that synthesizes photorealistic renderings. Our method generalizes well to in-the-wild audio and successfully synthesizes high-fidelity personalized facial details, e.g., wrinkles and teeth. Our method also allows explicit control of head poses. Extensive qualitative and quantitative evaluations, along with user studies, demonstrate the superiority of our method over state-of-the-art techniques.




Teaser


Figure 1. Given an arbitrary input audio stream, our system generates personalized and photorealistic talking-head animation in real-time. Right: May and Obama are driven by the same utterance but present different speaking characteristics.




Requirements

This project was trained and tested on Windows 10 with PyTorch 1.7 (Python 3.6). Linux and older PyTorch versions should also work but have not been tested. We recommend creating a new environment:

conda create -n LSP python=3.6

conda activate LSP

Clone the repository:

git clone https://github.com/YuanxunLu/LiveSpeechPortraits.git

cd LiveSpeechPortraits

FFmpeg is required to merge the audio with the silent generated videos. See the FFmpeg website for installation instructions. On Debian/Ubuntu-based Linux, you can also run:

sudo apt-get install ffmpeg
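Merging a silent rendering with its driving audio can also be scripted from Python. The helper below is a hypothetical sketch, not part of the repository: it assumes the `ffmpeg` binary is on PATH and uses standard FFmpeg options (`-map` to pick streams, `-c:v copy` to keep the video stream as-is, `-c:a aac` to encode the audio, `-shortest` to stop at the shorter input). The file names are placeholders.

```python
import shutil
import subprocess
from typing import List


def build_mux_command(video_path: str, audio_path: str, out_path: str) -> List[str]:
    """Return an ffmpeg argument list that muxes a silent video with an audio track."""
    return [
        "ffmpeg", "-y",                     # overwrite out_path if it already exists
        "-i", video_path,                   # input 0: silent generated video
        "-i", audio_path,                   # input 1: driving audio
        "-map", "0:v:0", "-map", "1:a:0",   # video from input 0, audio from input 1
        "-c:v", "copy",                     # copy the video stream without re-encoding
        "-c:a", "aac",                      # encode audio as AAC for the MP4 container
        "-shortest",                        # stop at the end of the shorter stream
        out_path,
    ]


def mux(video_path: str, audio_path: str, out_path: str) -> None:
    """Run the mux command, failing early with a clear message if ffmpeg is missing."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    subprocess.run(build_mux_command(video_path, audio_path, out_path), check=True)
```

For example, `mux("silent.mp4", "00083.wav", "result.mp4")` would write `result.mp4`. Building the argument list separately from running it makes the command easy to inspect or log before execution.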

Install the dependencies:

pip install -r requirements.txt
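After installing, a quick sanity check can confirm which Python and PyTorch versions the active environment resolves to and whether a CUDA device is visible (the demo below is run with `--device cuda`). This is a hypothetical helper, not part of the repository; it uses only the standard library, plus `torch` if it happens to be installed.

```python
import importlib
import importlib.util
import sys
from typing import Dict, Optional


def check_environment() -> Dict[str, Optional[object]]:
    """Report the Python version and, if installed, the PyTorch version
    and whether CUDA is available."""
    report: Dict[str, Optional[object]] = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "torch": None,           # stays None when torch is not installed
        "cuda_available": None,  # stays None when torch is not installed
    }
    if importlib.util.find_spec("torch") is not None:
        torch = importlib.import_module("torch")
        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch` reports `None`, the `pip install` step most likely ran in a different environment than the activated `LSP` one.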

Demo

Download the pre-trained models and data from Google Drive into the data folder. Data for five subjects are released (May, Obama1, Obama2, Nadella, and McStay).
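Before running the demo, it can help to check that the downloaded subjects are where the scripts expect them. The sketch below assumes that each subject is unpacked into its own `data/<subject>` folder; that layout is an assumption, so adjust it to match the downloaded archive. `missing_subjects` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

# Subject IDs released by the authors; the per-subject folder layout is assumed.
RELEASED_SUBJECTS = ("May", "Obama1", "Obama2", "Nadella", "McStay")


def missing_subjects(data_dir: str = "./data",
                     subjects: Iterable[str] = RELEASED_SUBJECTS) -> List[str]:
    """Return the subject IDs whose assumed folder data_dir/<id> does not exist."""
    root = Path(data_dir)
    return [s for s in subjects if not (root / s).is_dir()]


if __name__ == "__main__":
    missing = missing_subjects()
    print("all subjects present" if not missing else "missing: " + ", ".join(missing))
```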


Run the demo:


python demo.py --id May --driving_audio ./data/Input/00083.wav --device cuda
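To drive one subject with several clips, the documented command can be looped from Python. This is a hypothetical convenience wrapper, not part of the repository. It reuses only the flags shown above (`--id`, `--driving_audio`, `--device`), calls the current interpreter via `sys.executable`, and must be run from the repository root so that `demo.py` resolves.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def demo_command(subject: str, audio_path: str, device: str = "cuda") -> List[str]:
    """Build the demo.py invocation shown in the README for one audio file."""
    return [sys.executable, "demo.py", "--id", subject,
            "--driving_audio", audio_path, "--device", device]


def run_all(subject: str, audio_dir: str, device: str = "cuda") -> None:
    """Run the demo once per .wav file in audio_dir (sorted by name).

    Stops at the first clip whose run exits with a non-zero status."""
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        subprocess.run(demo_command(subject, str(wav), device), check=True)
```

For example, `run_all("May", "./data/Input")` would process every `.wav` in that folder in turn, one `demo.py` process per clip.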

Results can be found under the results folder.


(New!) Docker and Web Demo


We are grateful to Andreas from Replicate for building the web demo! You can now run the demo in your browser.


For the original links to these videos, please see issue #7.


Citation

If you find this project useful for your research, please consider citing:


@article{lu2021live,
  author = {Lu, Yuanxun and Chai, Jinxiang and Cao, Xun},
  title = {{Live Speech Portraits}: Real-Time Photorealistic Talking-Head Animation},
  journal = {ACM Transactions on Graphics},
  numpages = {17},
  volume = {40},
  number = {6},
  month = dec,
  year = {2021},
  doi = {10.1145/3478513.3480484}
}

Acknowledgment

This repo was built on the pix2pix-pytorch framework.

Thanks to the authors of MakeItTalk, ATVG, RhythmicHead, and Speech-Driven Animation for making their excellent work and code publicly available.

Thanks to Andreas for his work on the web demo.

