Sahil Goyal, Shagun Uppal, Sarthak Bhagat, Yi Yu, Yifang Yin, Rajiv Ratn Shah
Several works have developed end-to-end pipelines for generating lip-synced talking faces with various real-world applications, such as teaching and language translation in videos. However, these prior works fail to create realistic-looking videos since they focus little on people's expressions and emotions. Moreover, these methods' effectiveness largely depends on the faces in the training dataset, which means they may not perform well on unseen faces. To mitigate this, we build a talking face generation framework conditioned on a categorical emotion to generate videos with appropriate expressions, making them more realistic and convincing. With a broad range of six emotions, i.e., *happiness*, *sadness*, *fear*, *anger*, *disgust*, and *neutral*, we show that our model can adapt to arbitrary identities, emotions, and languages. Our proposed framework is equipped with a user-friendly web interface with a real-time experience for talking face generation with emotions. We also conduct a user study for subjective evaluation of our interface's usability, design, and functionality. Project page: https://midas.iiitd.edu.in/emo/
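As a minimal sketch of what conditioning on a categorical emotion can look like in practice, the six emotion labels can be turned into a one-hot vector that a generator receives alongside its audio and face inputs. The abstract does not say how the emotion is encoded, so the one-hot scheme, the label ordering, and the names `EMOTIONS` and `one_hot_emotion` are assumptions made for illustration and not the authors' implementation.

```python
from typing import List

# The six categorical emotions named in the abstract.
# The ordering here is an arbitrary assumption, not taken from the paper.
EMOTIONS = ["happiness", "sadness", "fear", "anger", "disgust", "neutral"]


def one_hot_emotion(label: str) -> List[float]:
    """Encode an emotion label as a one-hot vector (hypothetical helper)."""
    if label not in EMOTIONS:
        raise ValueError(f"unknown emotion: {label!r}")
    # 1.0 at the position of the requested emotion, 0.0 everywhere else.
    return [1.0 if emotion == label else 0.0 for emotion in EMOTIONS]


# Example: the conditioning vector for "fear".
fear_vector = one_hot_emotion("fear")
```

A vector like this is typically concatenated with, or projected into, the other conditioning features. How the framework actually fuses the emotion with the audio and identity inputs is not described here.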
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | CREMA-D | EmoAcc | 83.2 | EmoGen |
| Facial Recognition and Modelling | CREMA-D | FID | 5.29 | EmoGen |
| Facial Recognition and Modelling | CREMA-D | LSE-C | 6.663 | EmoGen |
| Image Generation | CREMA-D | EmoAcc | 83.2 | EmoGen |
| Image Generation | CREMA-D | FID | 5.29 | EmoGen |
| Image Generation | CREMA-D | LSE-C | 6.663 | EmoGen |
| Face Generation | CREMA-D | EmoAcc | 83.2 | EmoGen |
| Face Generation | CREMA-D | FID | 5.29 | EmoGen |
| Face Generation | CREMA-D | LSE-C | 6.663 | EmoGen |
| Face Reconstruction | CREMA-D | EmoAcc | 83.2 | EmoGen |
| Face Reconstruction | CREMA-D | FID | 5.29 | EmoGen |
| Face Reconstruction | CREMA-D | LSE-C | 6.663 | EmoGen |
| 3D | CREMA-D | EmoAcc | 83.2 | EmoGen |
| 3D | CREMA-D | FID | 5.29 | EmoGen |
| 3D | CREMA-D | LSE-C | 6.663 | EmoGen |
| 3D Face Modelling | CREMA-D | EmoAcc | 83.2 | EmoGen |
| 3D Face Modelling | CREMA-D | FID | 5.29 | EmoGen |
| 3D Face Modelling | CREMA-D | LSE-C | 6.663 | EmoGen |
| 3D Face Reconstruction | CREMA-D | EmoAcc | 83.2 | EmoGen |
| 3D Face Reconstruction | CREMA-D | FID | 5.29 | EmoGen |
| 3D Face Reconstruction | CREMA-D | LSE-C | 6.663 | EmoGen |
| Talking Face Generation | CREMA-D | EmoAcc | 83.2 | EmoGen |
| Talking Face Generation | CREMA-D | FID | 5.29 | EmoGen |
| Talking Face Generation | CREMA-D | LSE-C | 6.663 | EmoGen |
| 10-shot image generation | CREMA-D | EmoAcc | 83.2 | EmoGen |
| 10-shot image generation | CREMA-D | FID | 5.29 | EmoGen |
| 10-shot image generation | CREMA-D | LSE-C | 6.663 | EmoGen |
| 1 Image, 2*2 Stitchi | CREMA-D | EmoAcc | 83.2 | EmoGen |
| 1 Image, 2*2 Stitchi | CREMA-D | FID | 5.29 | EmoGen |
| 1 Image, 2*2 Stitchi | CREMA-D | LSE-C | 6.663 | EmoGen |