Audio Generation on AudioCaps
Metric: FAD (Fréchet Audio Distance; lower is better)

Results (sorted by FAD, best first)

Rank | Model | FAD | Extra data | Paper | Date | Code available
-----|-------|-----|------------|-------|------|---------------
1 | Audiobox Sound | 0.77 | No | Audiobox: Unified Audio Generation with Natural Language Prompts | 2023-12-25 | No
2 | GenAu-Large | 1.21 | No | Taming Data and Transformers for Audio Generation | 2024-06-27 | Yes
3 | Re-AudioLDM-L | 1.37 | No | Retrieval-Augmented Text-to-Audio Generation | 2023-09-14 | No
4 | AudioLDM 2-AC-Large | 1.42 | No | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | 2023-08-10 | Yes
5 | TANGO | 1.59 | No | Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | 2023-04-24 | Yes
6 | Auffusion | 1.63 | No | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | 2024-01-02 | Yes
7 | Auffusion-Full | 1.76 | No | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | 2024-01-02 | Yes
8 | CoDi | 1.80 | No | Any-to-Any Generation via Composable Diffusion | 2023-05-19 | Yes
8 | Make-An-Audio 2 | 1.80 | No | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | 2023-05-29 | Yes
10 | AudioLDM-L-Full | 1.96 | No | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | 2023-01-29 | Yes
11 | AudioLDM2-large | 2.02 | No | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | 2023-08-10 | Yes
12 | ETTA-FT-AC-100k | 2.03 | No | ETTA: Elucidating the Design Space of Text-to-Audio Models | 2024-12-26 | Yes
13 | Consistency TTA (Single-step generation) | 2.18 | No | ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | 2023-09-19 | Yes
14 | ETTA | 2.51 | No | ETTA: Elucidating the Design Space of Text-to-Audio Models | 2024-12-26 | Yes
15 | Tango-AF&AC-FT-AC | 2.54 | No | Improving Text-To-Audio Models with Synthetic Captions | 2024-06-18 | Yes
16 | Make-An-Audio | 2.66 | No | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | 2023-01-30 | Yes
17 | AudioGen | 3.13 | No | AudioGen: Textually Guided Audio Generation | 2022-09-30 | Yes
18 | Diffsound | 7.75 | No | Diffsound: Discrete Diffusion Model for Text-to-sound Generation | 2022-07-20 | Yes
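FAD (Fréchet Audio Distance) fits one Gaussian to embeddings of reference audio and another to embeddings of generated audio. It then reports the Fréchet distance between the two Gaussians: ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}).

The sketch below computes that formula with NumPy. It assumes the embeddings have already been extracted, for example with VGGish, the model used in the original FAD formulation. The function names `frechet_distance` and `fad` are illustrative only, not an API from any of the listed papers. This page does not state which embedding model or evaluation protocol each paper used, so cross-paper comparisons deserve some caution.

```python
import numpy as np

# Hypothetical helpers for illustration; embeddings are assumed precomputed
# (e.g. by VGGish), and embedding extraction itself is not shown.


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    d = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma1 = np.atleast_2d(sigma1).astype(float)
    sigma2 = np.atleast_2d(sigma2).astype(float)

    # Squared distance between the means.
    mean_term = float(np.sum((mu1 - mu2) ** 2))

    # For PSD covariances, sigma1 @ sigma2 has the same eigenvalues as
    # sigma1^{1/2} sigma2 sigma1^{1/2}, so they are real and non-negative.
    # The trace of the matrix square root is the sum of their square roots.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = float(np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None))))

    return mean_term + float(np.trace(sigma1) + np.trace(sigma2)) - 2.0 * trace_sqrt


def fad(emb_ref, emb_gen):
    """FAD between two embedding sets, each shaped (num_frames, dim)."""
    emb_ref = np.asarray(emb_ref, dtype=float)
    emb_gen = np.asarray(emb_gen, dtype=float)
    return frechet_distance(
        emb_ref.mean(axis=0), np.cov(emb_ref, rowvar=False),
        emb_gen.mean(axis=0), np.cov(emb_gen, rowvar=False),
    )
```

Identical embedding distributions give an FAD of about 0. Shifting every dimension of an 8-dimensional embedding set by 0.5 while keeping its covariance unchanged gives about 8 × 0.25 = 2.0. Lower values mean the generated audio's embedding statistics sit closer to the reference set's, which is why a lower FAD is better.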