Twitch-FIFA
Texts, Videos
Twitch-FIFA is a video-context, many-speaker dialogue dataset based on live-broadcast soccer game videos and chats from Twitch.tv. It can be used to train visually grounded dialogue models that generate language about relevant temporal and spatial events in the live video while staying relevant to the chat history.
Source: https://github.com/ramakanth-pasunuru/video-dialogue