VGGSound consists of more than 210k videos covering 310 audio classes.
Source: VGGSound: A Large-scale Audio-Visual Dataset