19,997 machine learning datasets
IllusionVQA is a Visual Question Answering (VQA) dataset with two sub-tasks. The first task tests comprehension on 435 instances across 12 optical illusion categories. Each instance consists of an image with an optical illusion, a question, and 3 to 6 options, one of which is the correct answer. We refer to this task as IllusionVQA-Comprehension. The second task tests how well VLMs can differentiate geometrically impossible objects from ordinary objects when two objects are presented side by side. The task consists of 1000 instances following a similar format to the first task. We refer to this task as IllusionVQA-Soft-Localization.
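A single IllusionVQA-Comprehension instance can be modeled as a small record. The field names below are illustrative assumptions for the sketch, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class IllusionVQAInstance:
    # Illustrative schema; the dataset's real field names may differ.
    image_path: str        # optical-illusion image
    question: str          # question about the illusion
    options: list          # 3 to 6 answer options (strings)
    answer_index: int      # index of the single correct option

    def is_valid(self) -> bool:
        # Each instance has 3-6 options and exactly one correct answer.
        return 3 <= len(self.options) <= 6 and 0 <= self.answer_index < len(self.options)

example = IllusionVQAInstance(
    image_path="illusions/cafe_wall_01.png",
    question="Are the horizontal lines parallel?",
    options=["Yes", "No", "Cannot tell"],
    answer_index=0,
)
print(example.is_valid())  # True
```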
xMIND is an open, large-scale multilingual news dataset for multi- and cross-lingual news recommendation. xMIND is derived from the English MIND dataset using open-source neural machine translation (i.e., NLLB 3.3B).
DevBench is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across various stages of the software development lifecycle. It covers critical steps such as software design, environment setup, implementation, acceptance testing, and unit testing. By integrating these interconnected tasks under a single framework, DevBench offers a holistic perspective on the potential of LLMs for automated software development.
SeaEval is a benchmark designed for evaluating multilingual foundation models (FMs). These large language models (LLMs) have demonstrated impressive generalizability and adaptability across various downstream tasks. The SeaEval benchmark goes beyond standard accuracy metrics and investigates how well these models understand and reason with natural language, as well as their comprehension of cultural practices, nuances, and values.
A human-face deepfake dataset sampled from larger datasets.
DAD-3DHeads dataset consists of 44,898 images collected from various sources (37,840 in the training set, 4,312 in the validation set, and 2,746 in the test set).
The dataset, comprising 1204 meticulously curated images, serves as a comprehensive resource for advancing real-time mosquito detection models. The dataset is strategically divided into training, validation, and test sets, accounting for 87%, 8%, and 5% of the images, respectively. A rigorous preprocessing phase involves auto-orientation and resizing to standardize dimensions at 640x640 pixels. To ensure dataset integrity, the filter null criterion mandates that all images must contain annotations. Augmentations, including flips, rotations, crops, and grayscale applications, enhance the dataset's diversity, fostering robust model training. With a focus on quality and variety, this dataset provides a solid foundation for evaluating and enhancing real-time mosquito detection models.
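The augmentations listed above (flips, rotations, crops, grayscale) can be sketched as plain array transforms. This is a stdlib-only illustration of the operations, not the dataset's actual preprocessing code; a real pipeline would use a library such as Pillow or torchvision:

```python
# Sketch of the listed augmentations on an image stored as a
# height x width grid of (r, g, b) tuples.

def horizontal_flip(img):
    # Mirror each row left-to-right.
    return [list(reversed(row)) for row in img]

def rotate_90_cw(img):
    # Reverse row order, then transpose: a 90-degree clockwise rotation.
    return [list(row) for row in zip(*img[::-1])]

def to_grayscale(img):
    # Average the channels; replicate into (g, g, g) to keep the shape.
    return [[(sum(px) // 3,) * 3 for px in row] for row in img]

def center_crop(img, h, w):
    # Take a centered h x w window.
    top = (len(img) - h) // 2
    left = (len(img[0]) - w) // 2
    return [row[left:left + w] for row in img[top:top + h]]

# Tiny 2x2 "image" to demonstrate.
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
print(horizontal_flip(img)[0][0])  # (0, 255, 0)
```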
The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one answer and at most four unique paraphrased answers per question.
OpenTrench3D is the first publicly available point cloud dataset of underground utilities captured from open trenches. It features 310 fully annotated point clouds comprising a total of 528 million points categorised into 5 unique classes. The point clouds are photogrammetrically derived and capture detailed scenes of open trenches, revealing the underground utilities within.
MMCode is a multi-modal code generation dataset designed to evaluate the problem-solving skills of code language models in visually rich contexts (i.e. images). It contains 3,548 questions paired with 6,620 images, derived from real-world programming challenges across 10 code competition websites, with Python solutions and tests provided. The dataset emphasizes the extreme demand for reasoning abilities, the interwoven nature of textual and visual contents, and the occurrence of questions containing multiple images.
This dataset is proposed in the ICLR 2024 paper: Measuring Vision-Language STEM Skills of Neural Models. Real-world problems often require solutions that combine knowledge from STEM fields (science, technology, engineering, and math). Unlike existing datasets, ours requires understanding multimodal vision-language information about STEM. It is one of the largest and most comprehensive datasets for this challenge, including 448 skills and 1,073,146 questions spanning all STEM subjects. Compared to existing datasets that often focus on examining expert-level ability, our dataset includes fundamental skills and questions designed based on the K-12 curriculum. We also add state-of-the-art foundation models such as CLIP and GPT-3.5-Turbo to our benchmark.
We present XHate-999, a multi-domain and multilingual evaluation data set for abusive language detection. By aligning test instances across six typologically diverse languages, XHate-999 for the first time allows for disentanglement of the domain transfer and language transfer effects in abusive language detection. We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHate-999 as a comprehensive evaluation resource for abusive language detection. Finally, we show that domain and language adaptation, via intermediate masked language modeling on abusive corpora in the target language, can lead to substantially improved abusive language detection in the target language in zero-shot transfer setups.
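The intermediate masked language modeling step mentioned above amounts to masking a fraction of tokens and training the model to recover them. Below is a minimal, library-free sketch of the data-preparation side only; the 15% rate and the `[MASK]` token follow BERT-style conventions and are assumptions here, not details taken from the XHate-999 paper:

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """Replace roughly mask_rate of the tokens with [MASK]; return the
    corrupted sequence and labels (original token at masked positions,
    None elsewhere, where no loss is computed)."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility in this sketch
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            corrupted.append(MASK_TOKEN)
            labels.append(tok)       # model must predict the original token
        else:
            corrupted.append(tok)
            labels.append(None)      # unmasked position: no prediction target
    return corrupted, labels

tokens = "this comment was flagged as abusive by moderators".split()
corrupted, labels = mask_tokens(tokens)
```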
The SoccerNet Game State Reconstruction task is a novel high-level computer vision task specific to sports analytics. It aims at recognizing the state of a sports game, i.e., identifying and localizing all sports individuals (players, referees, etc.) on the field from raw input video. SoccerNet-GSR is composed of 200 video sequences of 30 seconds, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number.
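The per-athlete annotations described (position on the pitch, role, team, jersey number) suggest a record along these lines. The field names and value conventions below are illustrative assumptions for the sketch, not SoccerNet-GSR's actual annotation format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AthleteState:
    # Illustrative fields; the real SoccerNet-GSR schema may differ.
    frame: int                    # video frame index
    x_m: float                    # pitch position in meters (long axis)
    y_m: float                    # pitch position in meters (short axis)
    role: str                     # e.g. "player", "goalkeeper", "referee"
    team: Optional[str]           # e.g. "left"/"right" for players, None for referees
    jersey_number: Optional[int]  # None when not visible or not applicable

state = AthleteState(frame=120, x_m=34.5, y_m=12.0,
                     role="player", team="left", jersey_number=10)
```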