Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

UK Biobank


3 papers · 0 benchmarks

SuperAnimal-Quadruped

This dataset supports Ye et al., 2024, Nature Communications.

3 papers · 0 benchmarks · Images

UP-COUNT

The newly introduced UP-COUNT dataset comprises drone footage captured with DJI Mini 2 family UAVs. It covers diverse environments, including streets, plazas, public transport stops, parks, and other green recreation areas. We recorded 202 unique videos and extracted frames at one-second intervals, yielding 10,000 images at a resolution of 3840 × 2160 pixels. The recordings were taken at different altitudes and flight speeds, and with various densities of people. Acquisition conditions vary in time of day and lighting, creating challenging shadows. Altitude information is additionally provided for each image. The labels of people's heads were prepared by hand, resulting in 352,487 instances. During the labelling process, each image was annotated and checked by two different people, and the continuity of labels within each sequence was reviewed. The lowest and highest altitudes recorded among the sequences are 26.0 and 101.0 meters, with an average of 60 meters.

3 papers · 1 benchmark · Images

MM-OR

Operating rooms (ORs) are complex, high-stakes environments requiring precise understanding of interactions among medical staff, tools, and equipment for enhancing surgical assistance, situational awareness, and patient safety. Current datasets fall short in scale and realism, and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling. To this end, we introduce MM-OR, a realistic and large-scale multimodal spatiotemporal OR dataset, and the first dataset to enable multimodal scene graph generation. MM-OR captures comprehensive OR scenes containing RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data, and is annotated with panoptic segmentations, semantic scene graphs, and downstream task labels. Further, we propose MM2SG, the first multimodal large vision-language model for scene graph generation, and through extensive experiments demonstrate its ability to effectively leverage multimodal inputs.

3 papers · 7 benchmarks · 3D, Audio, Graphs, Images, Medical, Point cloud, RGB-D, Speech, Texts, Time series, Videos

EAGLE

The automated recognition of different vehicle classes and their orientation on aerial images is an important task in the field of traffic research and also finds applications in disaster management, among other things. For the further development of corresponding algorithms that deliver reliable results not only under laboratory conditions but also in real scenarios, training data sets that are as extensive and versatile as possible play a decisive role. For this purpose, we present our dataset EAGLE (oriEnted vehicle detection using Aerial imaGery in real-worLd scEnarios).

3 papers · 0 benchmarks · Images

MPEblink

The pioneering eyeblink detection dataset is characterized by three key features: (1) samples with multiple human instances, (2) unconstrained in-the-wild scenarios, and (3) untrimmed videos. These attributes make the dataset more challenging and better aligned with real-world scenarios.

3 papers · 1 benchmark · Videos

Knowledge

Collected by cleaning data from knowledge-intensive websites like Wikipedia and science and technology reports, and processing it using reverse engineering techniques.

3 papers · 0 benchmarks · Texts

High-Quality Invoice Images for OCR

Dataset link: https://www.kaggle.com/datasets/osamahosamabdellatif/high-quality-invoice-images-for-ocr

3 papers · 0 benchmarks

Motion-X++

In this paper, we introduce Motion-X++, a large-scale multimodal 3D expressive whole-body human motion dataset. Existing motion datasets predominantly capture body-only poses, lacking facial expressions, hand gestures, and fine-grained pose descriptions, and are typically limited to lab settings with manually labeled text descriptions, thereby restricting their scalability. To address this issue, we develop a scalable annotation pipeline that can automatically capture 3D whole-body human motion and comprehensive text labels from RGB videos, and build the Motion-X dataset comprising 81.1K text-motion pairs. Furthermore, we extend Motion-X into Motion-X++ by improving the annotation pipeline, introducing more data modalities, and scaling up the data quantities. Motion-X++ provides 19.5M 3D whole-body pose annotations covering 120.5K motion sequences from massive scenes, 80.8K RGB videos, 45.3K audios, 19.5M frame-level whole-body pose descriptions, and 120.5K sequence-level semantic labels.

3 papers · 0 benchmarks

BRIGHT

BRIGHT is the first open-access, globally distributed, event-diverse multimodal dataset specifically curated to support AI-based disaster response. It covers five types of natural disasters and two types of man-made disasters across 14 disaster events in 23 regions worldwide, with a particular focus on developing countries.

3 papers · 1 benchmark · Images

CURIE (CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning)

The data is organized into eight domain-specific subfolders: "biogr", "dft", "pdb", "geo", "mpve", "qecc_65", "hfd", and "hfe". Each subfolder contains two further subfolders: "ground_truth" and "inputs". Within these, each data instance is stored in a JSON file named record_id.json, where record_id is a unique identifier. The "biogr" domain also includes image inputs as record_id.png files alongside the corresponding JSON.

3 papers · 0 benchmarks
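Given the folder layout described above, loading one instance might look like the sketch below. The function name, return structure, and the exact location of the "biogr" PNG are assumptions for illustration, not part of the official release:

```python
import json
from pathlib import Path

def load_instance(root: str, domain: str, record_id: str) -> dict:
    """Load one CURIE data instance from the described folder layout.

    Assumes <root>/<domain>/ground_truth/<record_id>.json and
    <root>/<domain>/inputs/<record_id>.json; placing the optional
    "biogr" PNG under inputs/ is a guess based on the description.
    """
    base = Path(root) / domain
    gt = json.loads((base / "ground_truth" / f"{record_id}.json").read_text())
    inputs = json.loads((base / "inputs" / f"{record_id}.json").read_text())
    png = base / "inputs" / f"{record_id}.png"  # image input, "biogr" only
    return {
        "ground_truth": gt,
        "inputs": inputs,
        "image_path": png if png.exists() else None,
    }
```

A caller would iterate the `*.json` files under a domain's `ground_truth` folder to enumerate the available `record_id`s.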

UIIS10K (General Underwater Image Instance Segmentation dataset 10K)

We propose a large-scale underwater instance segmentation dataset, UIIS10K, which includes 10,048 images with pixel-level annotations for 10 categories. As far as we know, this is the largest underwater instance segmentation dataset available and can be used as a benchmark for evaluating underwater segmentation methods.

3 papers · 0 benchmarks · Images, Texts

HPLT v2

A multilingual text collection extracted from the Internet Archive and Common Crawl web archives, intended for training large language models.

3 papers · 0 benchmarks · Texts

Indoor-6

The Indoor-6 dataset was created from multiple sessions captured in six indoor scenes over multiple days. The pseudo ground truth (pGT) 3D point clouds and camera poses for each scene are computed using COLMAP. All training data uses only the COLMAP reconstruction from training images. Compared to 7-Scenes, the scenes in Indoor-6 are larger, have multiple rooms, and contain illumination variations, as the images span multiple days and different times of day.

3 papers · 0 benchmarks · Images

Seaquest - OpenAI Gym

Dataset: The experiments are conducted using the Seaquest environment from the OpenAI Gym framework, which simulates the Atari 2600 game Seaquest. The dataset consists of RGB frames (210 × 160 × 3) generated dynamically during training. These frames are preprocessed by converting to grayscale, resizing to 84 × 84 pixels, and stacking four consecutive frames to form a 4 × 84 × 84 tensor, capturing temporal dynamics of the game state. No external or pre-collected dataset is used; the data is produced through real-time interaction with the Gym environment.

3 papers · 1 benchmark
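The preprocessing pipeline described above (grayscale, resize to 84 × 84, stack four frames) can be sketched in NumPy alone. The nearest-neighbour resize and the class name here are illustrative assumptions, a minimal stand-in for whatever resampling the experiments actually used:

```python
import numpy as np
from collections import deque

def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """(210, 160, 3) RGB uint8 -> (210, 160) float luminance."""
    return frame @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img: np.ndarray, h: int = 84, w: int = 84) -> np.ndarray:
    """Crude nearest-neighbour resize (a real pipeline might use cv2 or PIL)."""
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[rows][:, cols]

class FrameStack:
    """Keeps the last k preprocessed frames as a (k, 84, 84) state tensor."""

    def __init__(self, k: int = 4):
        self.k = k
        self.frames = deque(maxlen=k)

    def push(self, rgb_frame: np.ndarray) -> np.ndarray:
        f = resize_nearest(to_grayscale(rgb_frame))
        if not self.frames:  # first frame: repeat it to fill the stack
            self.frames.extend([f] * self.k)
        else:
            self.frames.append(f)
        return np.stack(self.frames)  # shape (k, 84, 84)
```

In use, each raw frame returned by the environment's step function would be pushed through the stack, and the resulting 4 × 84 × 84 tensor fed to the agent.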

EGY-BCD

Bi-temporal images in the EGY-BCD dataset are taken from four different regions in Egypt: New Mansoura, El Galala City, New Cairo, and New Thebes. The figure below shows the building changes in New Mansoura City and New Thebes. The image capture times range from 2017 to 2022. The images feature seasonal and lighting changes, which can help develop effective methods that mitigate the impact of unrelated changes on real changes.

3 papers · 1 benchmark

CLCD (Cropland-CD)

The CLCD dataset consists of 600 image pairs of cropland change samples, with 360 pairs for training, 120 pairs for validation, and 120 pairs for testing. The bi-temporal images in CLCD were collected by Gaofen-2 in Guangdong Province, China, in 2017 and 2019, respectively, with spatial resolution ranging from 0.5 to 2 m. Each group of samples is composed of two 512 × 512 images and a corresponding binary label of cropland change.

3 papers · 1 benchmark

Benchmark for AMR Metrics based on Overt Objectives

Bamboo (Benchmark for AMR Metrics based on Overt Objectives) is the first benchmark to support empirical assessment of graph-based MR similarity metrics. Bamboo maximizes the interpretability of results by defining multiple overt objectives that range from sentence similarity objectives to stress tests that probe a metric's robustness against meaning-altering and meaning-preserving graph transformations.

3 papers2 benchmarks

DCASE 2017

The DCASE 2017 rare sound events dataset contains isolated sound events for three classes: 148 crying babies (mean duration 2.25 s), 139 glasses breaking (mean duration 1.16 s), and 187 gun shots (mean duration 1.32 s). As with the DCASE 2016 data, silences are not excluded from active event markings in the annotations. While this dataset contains many samples per class, there are only three classes.

2 papers · 0 benchmarks · Audio

Stanford40 (Stanford 40 Actions)

The Stanford 40 Action Dataset contains images of humans performing 40 actions. In each image, we provide a bounding box of the person performing the action indicated by the filename of the image. There are 9,532 images in total, with 180–300 images per action class.

2 papers · 1 benchmark
Page 295 of 1000