Audio Generation on AudioCaps
Metric: FD (Fréchet Distance; lower is better)

Results

| # | Model | FD | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | Audiobox Sound | 8.3 | No | Audiobox: Unified Audio Generation with Natural Language Prompts | 2023-12-25 | - |
| 2 | ETTA-FT-AC-100k | 10.1 | No | ETTA: Elucidating the Design Space of Text-to-Audio Models | 2024-12-26 | Yes |
| 3 | Make-An-Audio 2 | 11.75 | No | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | 2023-05-29 | Yes |
| 4 | ETTA | 13.12 | No | ETTA: Elucidating the Design Space of Text-to-Audio Models | 2024-12-26 | Yes |
| 5 | GenAu-Large | 16.51 | No | Taming Data and Transformers for Audio Generation | 2024-06-27 | Yes |
| 6 | Tango-AF&AC-FT-AC | 17.19 | No | Improving Text-To-Audio Models with Synthetic Captions | 2024-06-18 | Yes |
| 7 | Make-An-Audio | 18.32 | No | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | 2023-01-30 | Yes |
| 8 | Consistency TTA (Single-step generation) | 20.44 | No | ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | 2023-09-19 | Yes |
| 9 | Auffusion | 21.99 | No | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | 2024-01-02 | Yes |
| 10 | CoDi | 22.9 | No | Any-to-Any Generation via Composable Diffusion | 2023-05-19 | Yes |
| 11 | Auffusion-Full | 23.08 | No | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | 2024-01-02 | Yes |
| 12 | AudioLDM-L-Full | 23.31 | No | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | 2023-01-29 | Yes |
| 13 | TANGO | 24.52 | No | Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model | 2023-04-24 | Yes |
| 14 | AudioLDM2-large | 26.18 | No | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | 2023-08-10 | Yes |
| 15 | Diffsound | 47.68 | No | Diffsound: Discrete Diffusion Model for Text-to-sound Generation | 2022-07-20 | Yes |
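
For readers unfamiliar with the metric, FD is usually computed by fitting a Gaussian to embeddings of reference audio and another to embeddings of generated audio, then taking the Fréchet distance between the two: the squared distance between the means plus Tr(C_r + C_g - 2 (C_r C_g)^(1/2)). The sketch below is a minimal illustration under that assumption. This page does not state which audio encoder or evaluation toolkit produced the reported numbers, so the function name, the random stand-in embeddings, and the dimensions are all hypothetical and the output is not comparable to the leaderboard values.

```python
import numpy as np


def frechet_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Each input is an (n_clips, dim) array of embeddings from some audio
    encoder (hypothetical here; the leaderboard page does not name one).
    """
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g. For PSD covariances these are real and
    # non-negative; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g).real
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Random stand-ins for real embeddings (assumption, for illustration only).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(2000, 8))
close = rng.normal(0.1, 1.0, size=(2000, 8))  # distribution near the reference
far = rng.normal(1.0, 2.0, size=(2000, 8))    # distribution far from the reference

fd_same = frechet_distance(real, real)
fd_close = frechet_distance(real, close)
fd_far = frechet_distance(real, far)
print(fd_same, fd_close, fd_far)
```

A set compared with itself scores approximately zero, and the score grows as the generated distribution drifts from the reference, which is why smaller FD values indicate better generation quality.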