Datasets

19,997 machine learning datasets

MUSIC (Multi-Spectral Imaging via Computed Tomography)

The Multi-Spectral Imaging via Computed Tomography (MUSIC) dataset is a two-part (2D and 3D spectral) open-access dataset for advanced image analysis of spectral radiographic (x-ray) scans, their tomographic reconstruction, and the detection of specific materials within such scans. The scans cover a photon energy range of around 20 keV up to 160 keV.

2 papers · 0 benchmarks · Hyperspectral images

FSVQA (Full-Sentence Visual Question Answering)

The Full-Sentence Visual Question Answering (FSVQA) dataset consists of nearly 1 million pairs of questions and full-sentence answers for images, built by applying a number of rule-based natural language processing techniques to the original VQA dataset and to captions in the MS COCO dataset.

2 papers · 0 benchmarks · Images, Texts

VQA 360°

VQA 360° is a dataset for visual question answering on 360° images containing around 17,000 real-world image-question-answer triplets for a variety of question types.

2 papers · 0 benchmarks · Images, Texts

PHSPD (Polarization Human Shape and Pose Dataset)

PHSPD is a home-grown polarization image dataset of various human shapes and poses.

2 papers · 0 benchmarks · Images

Multi Task Crowd

Multi Task Crowd is a new 100-image dataset fully annotated for crowd counting, violent behaviour detection, and density level classification.

2 papers · 0 benchmarks · Images

PointPattern

PointPattern is a graph classification dataset constructed by simple point patterns from statistical mechanics. The authors simulated three point patterns in 2D: hard disks in equilibrium (HD), Poisson point process, and random sequential adsorption (RSA) of disks. The HD and Poisson distributions can be seen as simple models that describe the microstructures of liquids and gases while the RSA is a nonequilibrium stochastic process that introduces new particles one by one subject to nonoverlapping conditions.
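
As a rough illustration of two of the processes described above, the following minimal sketch samples a Poisson-like pattern and an RSA pattern in a unit square. It is not the authors' simulation code: the function names, the box size, the disk radius, the absence of periodic boundaries, and the use of a fixed number of insertion attempts are all assumptions made for this example. The Poisson pattern is drawn as a fixed number of uniform points, which corresponds to a Poisson point process conditioned on its point count. Hard disks in equilibrium need a Monte Carlo sampler and are not covered here.

```python
import math
import random


def poisson_points(n_points, box=1.0, seed=0):
    """Draw n_points independent uniform points in a square box.

    This is a Poisson point process conditioned on having n_points points.
    """
    rng = random.Random(seed)
    return [(rng.uniform(0.0, box), rng.uniform(0.0, box)) for _ in range(n_points)]


def rsa_disks(n_attempts, radius, box=1.0, seed=0):
    """Random sequential adsorption of equal disks in a square box.

    Disks are proposed one by one at uniform random positions. A proposal is
    accepted only if it overlaps none of the disks accepted so far; accepted
    disks never move afterwards, which is what makes the process nonequilibrium.
    """
    rng = random.Random(seed)
    centers = []
    min_dist = 2.0 * radius  # two disks of equal radius touch at this distance
    for _ in range(n_attempts):
        # Keep the whole disk inside the box (hypothetical hard-wall boundary).
        x = rng.uniform(radius, box - radius)
        y = rng.uniform(radius, box - radius)
        if all(math.hypot(x - cx, y - cy) >= min_dist for cx, cy in centers):
            centers.append((x, y))
    return centers


if __name__ == "__main__":
    disks = rsa_disks(n_attempts=2000, radius=0.05, seed=1)
    points = poisson_points(n_points=len(disks), seed=1)
    print(len(disks), "RSA disks accepted;", len(points), "Poisson points drawn")
```

The only difference between the two samplers is the rejection test, so the Poisson points may cluster arbitrarily closely while the RSA centers keep a minimum separation of one disk diameter.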

2 papers · 0 benchmarks · Graphs

ASD (Annotated Semantic Dataset)

The Annotated Semantic Dataset is composed of 11 videos, divided into 3 activity categories (Biking, Driving, and Walking) and classed according to their amount of semantic information. The classes are: 0p, for videos with approximately no semantic information; 25p, for videos containing relevant semantic information in ∼25% of their frames; and, following the same idea, 50p and 75p. The videos were recorded using a GoPro Hero 3 camera mounted on a helmet for the Biking and Walking videos and attached to a head strap for the Driving videos.

2 papers · 0 benchmarks · Videos

News Interactions on Globo.com (News Portal User Interactions by Globo.com - A large dataset for news recommendations offline evaluation and analytics)

This large dataset of user interaction logs (page views) from a news portal was kindly provided by Globo.com, the most popular news portal in Brazil, for reproducibility of the experiments with CHAMELEON, a meta-architecture for contextual hybrid session-based news recommender systems. The source code was made available on GitHub.

2 papers · 0 benchmarks · Tabular

C&Z

One of the first datasets (if not the first) to highlight the importance of bias and diversity in the community, which started a revolution afterwards. It was introduced in 2014 as an integral part of a Master of Science thesis [1,2] at Carnegie Mellon and City University of Hong Kong. It was later expanded by adding synthetic images generated by a GAN architecture at ETH Zürich (in HDCGAN by Curtó et al. 2017). It is thus presented both as a pioneer in raising the importance of balanced datasets for learning and vision and as the first GAN-augmented dataset of faces.

2 papers · 0 benchmarks · Images

TRN (Toulouse Road Network)

The Toulouse Road Network dataset describes patches of road maps from the city of Toulouse, represented both as spatial graphs G = (A, X) and as grayscale segmentation images.

2 papers · 0 benchmarks · Graphs, Images

FRGC-Morphs

FRGC-Morphs is a dataset of morphed faces selected from the publicly available FRGC dataset [1].

2 papers · 0 benchmarks · Images

ISOT Fake News Dataset

The ISOT Fake News dataset is a compilation of several thousand fake news and truthful articles, obtained from different legitimate news sites and from sites flagged as unreliable by Politifact.com.

2 papers · 0 benchmarks · Texts

PS-Plant dataset

Automated leaf segmentation is a challenging area in computer vision. Recent advances in machine learning have made it possible to achieve better results than traditional image processing techniques; however, training such systems often requires large annotated data sets. To contribute annotated data and help overcome this bottleneck in plant phenotyping research, here we provide a novel photometric stereo (PS) data set with annotated leaf masks. This data set forms part of the work done in the BBSRC Tools and Resources Development project BB/N02334X/1.

2 papers · 0 benchmarks · 3D, Images, Time series

Clinical Admission Notes from MIMIC-III

This dataset is created from MIMIC-III (Medical Information Mart for Intensive Care III) and contains simulated patient admission notes. The clinical notes contain information about a patient at the time of admission to the ICU and are labelled for four outcome prediction tasks: diagnoses at discharge, procedures performed, in-hospital mortality, and length of stay.

2 papers · 6 benchmarks · Texts

3D Platelet EM (Platelet Electron Microscopy)

The platelet-em dataset contains two 3D scanning electron microscope (EM) images of human platelets, as well as instance and semantic segmentations of those two image volumes. This data has been reviewed by NIBIB, contains no PII or PHI, and is cleared for public release. All files use a multipage uint16 TIF format. A 3D image with size [Z, X, Y] is saved as Z pages of size [X, Y]. Image voxels are approximately 40x10x10 nm.
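
The page layout described above can be illustrated with a small sketch that treats a volume as a list of Z pages, each a list of X rows of Y values, and converts its shape to a physical extent. This is only an illustration under stated assumptions: the helper names are hypothetical, the pages are plain Python lists rather than decoded TIF data, and the 40 nm spacing is assumed to lie along the Z (page) axis, which the description does not state explicitly.

```python
def volume_shape(pages):
    """Return (Z, X, Y) for a volume stored as Z pages of size [X, Y].

    Raises ValueError if the pages do not all share the same [X, Y] size.
    """
    if not pages:
        raise ValueError("a volume needs at least one page")
    n_rows = len(pages[0])
    n_cols = len(pages[0][0]) if n_rows else 0
    for page in pages:
        if len(page) != n_rows or any(len(row) != n_cols for row in page):
            raise ValueError("all pages must have the same [X, Y] size")
    return (len(pages), n_rows, n_cols)


def extent_nm(shape_zxy, voxel_nm=(40.0, 10.0, 10.0)):
    """Approximate physical extent in nanometres along (Z, X, Y).

    Assumption: voxel_nm is ordered like the shape, so 40 nm is the page spacing.
    """
    return tuple(n * size for n, size in zip(shape_zxy, voxel_nm))


if __name__ == "__main__":
    # Toy volume: 2 pages, each 3 x 4, filled with zeros (stand-in for uint16 data).
    toy = [[[0] * 4 for _ in range(3)] for _ in range(2)]
    shape = volume_shape(toy)
    print("shape (Z, X, Y):", shape)
    print("extent in nm:", extent_nm(shape))
```

Indexing then follows the same order as the shape: `toy[z][x][y]` selects page `z`, row `x`, column `y`.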

2 papers · 4 benchmarks · 3D, Biology

MHSMA (The Modified Human Sperm Morphology Analysis)

The MHSMA dataset is a collection of human sperm images from 235 patients with male factor infertility. Each image is labeled by experts for normal or abnormal sperm acrosome, head, vacuole, and tail.

2 papers · 0 benchmarks · Images, Medical

The Annotated Gumar Corpus

2 papers · 0 benchmarks

IG-3.5B-17k

IG-3.5B-17k is an internal Facebook AI Research dataset for training image classification models. It consists of hashtags for up to 3.5 billion public Instagram images.

2 papers · 0 benchmarks · Images, Texts

UBOFAB19 (SVBRDF Database Bonn)

A database of several hundred high-quality fabric material measurements, provided as carefully calibrated, rectified HDR images together with SVBRDF fits.

2 papers · 0 benchmarks · Images

DRI Corpus (Dr. Inventor Multi-layer Scientific Corpus)

The Dr. Inventor Multi-Layer Scientific Corpus (DRI Corpus) includes 40 Computer Graphics papers, selected by domain experts. Each paper of the Corpus has been annotated by three annotators by providing the following layers of annotations, each one characterizing a core aspect of scientific publications:

2 papers · 3 benchmarks · Texts
