Datasets

19,997 machine learning datasets

Oxford Radar RobotCar Dataset

The Oxford Radar RobotCar Dataset is a radar extension to The Oxford RobotCar Dataset. It has been extended with data from a Navtech CTS350-X Millimetre-Wave FMCW radar and Dual Velodyne HDL-32E LIDARs with optimised ground truth radar odometry for 280 km of driving around Oxford, UK (in addition to all sensors in the original Oxford RobotCar Dataset).

28 papers | 2 benchmarks | Videos

PMIndia

PMIndia consists of parallel sentences pairing 13 major languages of India with English. The corpus includes up to 56,000 sentences for each language pair.

28 papers | 0 benchmarks | Texts

RoboNet

RoboNet is an open database for sharing robotic experience. It provides an initial pool of 15 million video frames from 7 different robot platforms, and its authors study how it can be used to learn generalizable models for vision-based robotic manipulation.

28 papers | 0 benchmarks

SegTHOR (Segmentation of THoracic Organs at Risk)

SegTHOR (Segmentation of THoracic Organs at Risk) is a dataset dedicated to the segmentation of organs at risk (OARs) in the thorax, i.e. the organs surrounding the tumour that must be spared from irradiation during radiotherapy. In this dataset, the OARs are the heart, the trachea, the aorta and the esophagus, which have varying spatial and appearance characteristics. The dataset includes 60 3D CT scans, divided into a training set of 40 patients and a test set of 20 patients, in which the OARs have been contoured manually by an experienced radiotherapist.

28 papers | 0 benchmarks | Images, Medical

SensatUrban

The SensatUrban dataset is an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points, five times the number of labeled points in the largest existing point cloud dataset. The dataset consists of large areas from two UK cities, covering about 6 km^2 of the city landscape. Each 3D point is labeled as one of 13 semantic classes, such as ground, vegetation, and car.

28 papers | 6 benchmarks | Point cloud

TextZoom

TextZoom is a super-resolution dataset that consists of paired low-resolution and high-resolution scene text images. The images are captured in the wild by cameras with different focal lengths.

28 papers | 18 benchmarks | Images

WikiCoref

WikiCoref is an English corpus annotated for anaphoric relations, where all documents are from the English version of Wikipedia.

28 papers | 1 benchmark

IndicCorp

IndicCorp is a large collection of monolingual corpora with around 9 billion tokens covering 12 of the major Indian languages. It was developed by discovering and scraping thousands of web sources (primarily news, magazines, and books) over a period of several months.

28 papers | 0 benchmarks | Texts

MHIST (Minimalist Histopathology image analysis dataset)

The minimalist histopathology image analysis dataset (MHIST) is a binary classification dataset of 3,152 fixed-size images of colorectal polyps, each with a gold-standard label determined by the majority vote of seven board-certified gastrointestinal pathologists. MHIST also includes each image’s annotator agreement level. As a minimalist dataset, MHIST occupies less than 400 MB of disk space, and a ResNet-18 baseline can be trained to convergence on MHIST in just 6 minutes using approximately 3.5 GB of memory on an NVIDIA RTX 3090. As example use cases, the authors use MHIST to study natural questions that arise in histopathology image classification, such as how dataset size, network depth, transfer learning, and high-disagreement examples affect model performance.
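
As a rough illustration of how a majority-vote label and an agreement level could be derived from seven binary annotations, here is a minimal Python sketch. The function name `majority_label`, the placeholder class names "A" and "B", and the definition of agreement as the number of annotators who chose the winning label are assumptions made for illustration; they are not taken from the MHIST authors' code or file format.

```python
from collections import Counter


def majority_label(votes):
    """Return (winning_label, number_of_votes_for_it) for one image.

    `votes` is a list with one label per annotator. With an odd number
    of annotators and two classes, a tie cannot occur.
    """
    if not votes:
        raise ValueError("votes must be non-empty")
    counts = Counter(votes)
    label, agreement = counts.most_common(1)[0]
    return label, agreement


# Hypothetical votes from seven annotators; "A" and "B" are placeholder classes.
votes = ["A", "B", "A", "A", "B", "A", "A"]
label, agreement = majority_label(votes)
print(label, agreement)  # prints: A 5
```

Under these assumptions, an image with a 4-to-3 split would count as a high-disagreement example, while a 7-to-0 split would reflect full agreement.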

28 papers | 1 benchmark | Biology, Images

MIT-Adobe FiveK

The MIT-Adobe FiveK dataset consists of 5,000 photographs taken with SLR cameras by a set of different photographers. They are all in RAW format; that is, all the information recorded by the camera sensor is preserved. We made sure that these photographs cover a broad range of scenes, subjects, and lighting conditions. We then hired five photography students in an art school to adjust the tone of the photos. Each of them retouched all 5,000 photos using software dedicated to photo adjustment (Adobe Lightroom) on which they were extensively trained. We asked the retouchers to achieve visually pleasing renditions, akin to a postcard. The retouchers were compensated for their work.

28 papers | 4 benchmarks | Images

MixATIS

MixATIS is a dataset constructed from the single-intent dataset ATIS.

28 papers | 2 benchmarks

AGENT

Inspired by cognitive development studies on intuitive psychology, we present a benchmark consisting of a large dataset of procedurally generated 3D animations, AGENT (Action, Goal, Efficiency, coNstraint, uTility), structured around four scenarios (goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs) that probe key concepts of core intuitive psychology.

28 papers | 0 benchmarks

WNUT-2020 Task 2 (WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets)

28 papers | 2 benchmarks

ADNI (Alzheimer's Disease NeuroImaging Initiative)

Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite study that aims to improve clinical trials for the prevention and treatment of Alzheimer’s disease (AD).[1] This cooperative study combines expertise and funding from the private and public sector to study subjects with AD, as well as those who may develop AD and controls with no signs of cognitive impairment.[2] Researchers at 63 sites in the US and Canada track the progression of AD in the human brain with neuroimaging, biochemical, and genetic biological markers.[2][3] This knowledge helps to find better clinical trials for the prevention and treatment of AD. ADNI has made a global impact,[4] firstly by developing a set of standardized protocols to allow the comparison of results from multiple centers,[4] and secondly by its data-sharing policy, which makes all of the data available without embargo to qualified researchers worldwide.[5] To date, over 1000 scientific publications have used ADNI data.[6] A number of oth

28 papers | 8 benchmarks | MRI

KITTI MOTS (KITTI Multi-Object Tracking and Segmentation (MOTS) Evaluation)

The Multi-Object Tracking and Segmentation (MOTS) benchmark [2] consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation 2012 and extends the annotations to the Multi-Object Tracking and Segmentation (MOTS) task. To this end, we added dense pixel-wise segmentation labels for every object. We evaluate submitted results using the metrics HOTA, CLEAR MOT, and MT/PT/ML. We rank methods by HOTA [1]. Our development kit and GitHub evaluation code provide details about the data format as well as utility functions for reading and writing the label files (adapted for the segmentation case). Evaluation is performed using the code from the TrackEval repository.

28 papers | 3 benchmarks | Images, Tracking, Videos

COCO 10% labeled data

Semi-Supervised Object Detection on COCO 10% labeled data

28 papers | 5 benchmarks | Images

WPC (Waterloo Point Cloud)

The WPC (Waterloo Point Cloud) database is a dataset for subjective and objective quality assessment of point clouds.

28 papers | 4 benchmarks | 3D, Point cloud

MetaShift

MetaShift is a collection of 12,868 sets of natural images across 410 classes. It can be used to benchmark and evaluate how robust machine learning models are to data shifts.

28 papers | 0 benchmarks | Images

BBBP (Blood-Brain Barrier Penetration)

The BBBP dataset comes from a study focused on modeling and predicting the permeability of the blood-brain barrier. The BBBP dataset contains binary labels indicating whether a compound can penetrate the blood-brain barrier (BBB) or not. Researchers use this dataset to develop and evaluate machine learning methods for predicting BBB permeability. It’s a critical task because understanding which compounds can cross the BBB is essential for drug discovery and designing therapeutics for neurological conditions.

28 papers | 1 benchmark

PSG Dataset

The PSG dataset has 48,749 images with 133 object classes (80 object classes and 53 stuff classes) and 56 predicate classes. It annotates inter-segment relations based on COCO panoptic segmentation.

28 papers | 6 benchmarks | Images
