Storytelling Video Dataset (Russian, Emotion, Gesture, Speech)
The Storytelling Video Dataset is a human-reviewed multimodal dataset of more than 700 full-body video recordings of native Russian speakers. Each video is at least 10 minutes long and captures synchronized speech, facial expressions, gestures, and emotional variation. The dataset is intended for research and development in:
🗣️ Speech-to-text & voice modeling (Russian)
😃 Emotion & gesture recognition
🤖 Multimodal learning & LLM alignment
🧍 Avatar generation & digital human training
Key Features:
Full-body framing (waist-up or knee-up)
Clear facial expressions and gestures
High-quality speech audio in Russian
Natural storytelling with emotional diversity
Preview (10 Participants): We’ve prepared a short compilation of 10 different speakers from the dataset. Each clip comes from a different unscripted video of 10+ minutes and shows full-body framing, gestures, and emotional speech.
📺 Watch the sample (Video Preview): https://drive.google.com/drive/folders/1QtC2il-Qb62nZNJlOtG8WvOf-SSkSGM-?usp=drive_link
Dataset Pages:
🐙 GitHub: github.com/MaratDV
📊 DataHub: datahub.io/@MaratDV/storytelling-video-dataset
Contact: chinzad@gmail.com (email) or @Marat_DV (Telegram)