Datasets

486 machine learning datasets

VocSim (Vocal Similarity Benchmark)

VocSim (Vocal Similarity Benchmark) is a benchmark designed to evaluate how well neural audio embeddings capture acoustic and perceptual similarity in a zero-shot setting, without task-specific fine-tuning. It addresses the challenge of creating audio representations that generalize across diverse sound types, aiming to mirror the flexibility and nuanced sensitivity of biological auditory systems. The benchmark is built on the VocSim dataset, which comprises 125,382 audio clips aggregated from 19 distinct sources. These cover Human Speech (phones, words, utterances, and non-verbal sounds from multiple languages, including specific blind test subsets from indigenous languages), Animal Vocalizations (songbird syllables and calls from zebra finch, Bengalese finch, and canary, as well as giant otter calls), and Environmental Sounds (everyday environmental noises from ESC-50). The dataset is curated into these 19 subsets to stress

0 papers | 0 benchmarks | Audio

THVD (Talking Head Video Dataset)

About

0 papers | 0 benchmarks | 3D, Actions, Audio, Environment, Speech, Videos

FortisAVQA

We introduce FortisAVQA, a dataset designed to assess the robustness of audio-visual question answering (AVQA) models. Its construction involves two key processes: rephrasing and splitting. Rephrasing modifies questions from the test set of MUSIC-AVQA to increase linguistic diversity, which reduces the reliance of models on spurious correlations between key question terms and answers. Splitting automatically categorizes questions into frequent (head) and rare (tail) subsets, enabling a more comprehensive evaluation of model performance in both in-distribution and out-of-distribution scenarios.

0 papers | 0 benchmarks | Audio, Images, Texts, Videos

Video Dataset (Storytelling Video Dataset (Russian, Emotion, Gesture, Speech))

The Storytelling Video Dataset is a high-quality, human-reviewed multimodal dataset featuring over 700 full-body video recordings of native Russian speakers. Each video is 10+ minutes long and includes synchronized speech, facial expressions, gestures, and emotional variation. The dataset is ideal for research and development in:

0 papers | 0 benchmarks | Audio, Speech, Texts, Videos

DEAR

Dataset Summary: The Deep Evaluation of Audio Representations (DEAR) dataset is a benchmark designed to assess general-purpose audio foundation models on properties critical for hearable devices. It comprises 1,158 mono audio tracks (30 s each), spatially mixing proprietary anechoic speech monologues with high-quality everyday acoustic scene recordings from the HOA‑SSR library. DEAR enables controlled evaluation of:

0 papers | 0 benchmarks | Audio
