FormulaNet is a new large-scale Mathematical Formula Detection dataset. It consists of 46,672 pages of STEM documents from arXiv and has 13 types of labels. The dataset is split into a training set of 44,338 pages and a validation set of 2,334 pages. For copyright reasons, we can only provide the list of papers, which must be downloaded and processed.
OmniCity is a dataset for omnipotent city understanding from multi-level and multi-view images. It contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City. This dataset introduces a new task of fine-grained building instance segmentation on street-level panorama images. It also provides new problem settings for existing tasks, such as cross-view image matching, synthesis, segmentation, and detection, and facilitates the development of new methods for large-scale city understanding, reconstruction, and simulation.
PersonPath22 is a large-scale multi-person tracking dataset containing 236 videos captured mostly from static-mounted cameras, collected from sources where we were given the rights to redistribute the content and participants have given explicit consent. Each video has ground-truth annotations including both bounding boxes and tracklet-ids for all the persons in each frame.
Occluded COCO is an automatically generated subset of the COCO val dataset that collects partially occluded objects across a large variety of categories in real images, in a scalable manner: each target object is partially occluded while its segmentation mask remains connected.
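The connectedness criterion above is easy to state programmatically: a mask qualifies if its foreground forms a single connected component. A minimal sketch of that check (the exact Occluded COCO pipeline is not specified here, and `is_connected` is an illustrative helper, not part of the released tooling):

```python
import numpy as np
from scipy import ndimage

def is_connected(mask: np.ndarray) -> bool:
    """Return True if the binary segmentation mask forms a single
    connected component -- the selection criterion described above."""
    _, num_components = ndimage.label(mask)
    return num_components == 1

# A mask split in two by an occluder fails the criterion:
occluded = np.zeros((8, 8), dtype=np.uint8)
occluded[1:3, 1:7] = 1   # top part of the object
occluded[5:7, 1:7] = 1   # bottom part, separated by the occluder
print(is_connected(occluded))  # False: two disconnected regions
```

With 4-connectivity (the `ndimage.label` default), an object whose visible parts are split apart by an occluder is rejected, while one that is merely clipped at an edge still passes.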
Our dataset comprises 23,468 unlabelled and 356 labelled samples, where each sample is a $512 \times 512 \times 1$ dimensional IR image collected with the thermographic measurement specifications. Some samples contain scars, shadows, salt-and-pepper noise and contrast-burst regions, demonstrating that realistic laminar-turbulent flow observation scenarios are subject to high noise. Besides, a laminar flow area may appear brighter or darker than the regions of turbulent flow. Due to effects such as shadowing of the sun, it is even possible that the laminar flow area appears darker than the turbulent flow area in one part of the image and brighter in another.
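Salt-and-pepper corruption of the kind described above is straightforward to simulate for augmentation or robustness tests. A minimal sketch on a synthetic single-channel frame (the noise fraction and the gradient test image are illustrative assumptions, not dataset parameters):

```python
import numpy as np

def add_salt_and_pepper(img: np.ndarray, amount: float = 0.02, rng=None) -> np.ndarray:
    """Flip roughly `amount` of the pixels to the image's min ("pepper")
    or max ("salt") intensity, mimicking the measurement artefacts
    described above."""
    rng = rng or np.random.default_rng(0)
    noisy = img.copy()
    coords = rng.random(img.shape)
    noisy[coords < amount / 2] = img.min()       # "pepper"
    noisy[coords > 1 - amount / 2] = img.max()   # "salt"
    return noisy

# Example on a synthetic 512x512 gradient frame:
frame = np.tile(np.linspace(0, 255, 512, dtype=np.uint8), (512, 1))
noisy = add_salt_and_pepper(frame, amount=0.02)
```

Since pepper pixels take the frame's minimum and salt pixels its maximum, the corruption is intensity-range-preserving, which matters for IR data with non-standard dynamic ranges.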
Unsustainable fishing practices worldwide pose a major threat to marine resources and ecosystems. Identifying vessels that do not show up in conventional monitoring systems---known as ``dark vessels''---is key to managing and securing the health of marine environments. With the rise of satellite-based synthetic aperture radar (SAR) imaging and modern machine learning (ML), it is now possible to automate detection of dark vessels day or night, under all-weather conditions. SAR images, however, require a domain-specific treatment and are not widely accessible to the ML community. Maritime objects (vessels and offshore infrastructure) are relatively small and sparse, challenging traditional computer vision approaches. We present the largest labeled dataset for training ML models to detect and characterize vessels and ocean structures in SAR imagery. xView3-SAR consists of nearly 1,000 analysis-ready SAR images from the Sentinel-1 mission that are, on average, 29,400-by-24,400 pixels each.
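Scenes of ~29,400-by-24,400 pixels are far too large to feed to a detector directly, so a common preprocessing step is to cover each scene with fixed-size overlapping chips. A minimal sketch of that tiling (the chip size and overlap are illustrative choices, not xView3-SAR settings):

```python
def tile_coords(height: int, width: int, chip: int = 1024, overlap: int = 64):
    """Return (row, col) top-left corners of overlapping chips that
    fully cover a height-by-width scene."""
    stride = chip - overlap
    rows = list(range(0, max(height - chip, 0) + 1, stride))
    cols = list(range(0, max(width - chip, 0) + 1, stride))
    # Make sure the bottom and right edges are covered exactly.
    if rows[-1] + chip < height:
        rows.append(height - chip)
    if cols[-1] + chip < width:
        cols.append(width - chip)
    return [(r, c) for r in rows for c in cols]

# Corners covering one full-size xView3-SAR scene:
corners = tile_coords(29400, 24400)
```

The overlap ensures that small, sparse maritime objects falling on a chip boundary still appear whole in at least one chip; detections are then deduplicated when chip-level outputs are merged back into scene coordinates.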
Paper2Fig100k is a dataset with over 100k images of figures and texts from research papers. The figures show architecture diagrams and methodologies of articles available at arXiv.org from fields like artificial intelligence and computer vision. Figures usually include text and discrete objects, e.g., boxes in a diagram, with lines and arrows that connect them.
This collection contains images from 422 non-small cell lung cancer (NSCLC) patients. For these patients, pretreatment CT scans, manual delineations by a radiation oncologist of the 3D gross tumor volume, and clinical outcome data are available.
We provide a database containing shot scale annotations (i.e., the apparent distance of the camera from the subject of a filmed scene) for more than 792,000 image frames. Frames belong to 124 full movies from the entire filmographies of 6 important directors: Martin Scorsese, Jean-Luc Godard, Béla Tarr, Federico Fellini, Michelangelo Antonioni, and Ingmar Bergman. Each frame, extracted from videos at 1 frame per second, is annotated with one of the following scale categories: Extreme Close Up (ECU), Close Up (CU), Medium Close Up (MCU), Medium Shot (MS), Medium Long Shot (MLS), Long Shot (LS), Extreme Long Shot (ELS), Foreground Shot (FS), and Insert Shot (IS). Two independent coders annotated all frames from the 124 movies, whilst a third one checked their coding and made decisions in cases of disagreement. The CineScale database enables AI-driven interpretation of shot scale data and opens up a large set of research activities related to the automatic visual analysis of cinematic material.
The Flickr Diverse Humans (FDH) dataset consists of 1.53M images of human figures from the YFCC100M dataset. Each image is annotated with keypoints, pixel-to-vertex correspondences (from CSE) and a segmentation mask.
DyML-Vehicle merges two vehicle re-ID datasets, PKU VehicleID [1] and VERI-Wild [1]. Since these two datasets provide annotations only at the identity (fine) level, we manually annotate each image with a “model” label (e.g., Toyota Camry, Honda Accord, Audi A4) and a “body type” label (e.g., car, SUV, microbus, pickup). Moreover, we label all taxi images as a novel testing class at the coarse level.
DyML-Animal is based on animal images selected from ImageNet-5K [1]. It has 5 semantic scales (i.e., class, order, family, genus, species) according to biological taxonomy. Specifically, there are 611 “species” at the fine level, 47 categories corresponding to “order”, “family” or “genus” at the middle level, and 5 “classes” at the coarse level. We note that for some animals visual perception contradicts biological taxonomy; e.g., the whale is a “mammal” but actually looks more similar to fish. Annotating the whale images as mammal would cause confusion for visual recognition, so we carefully checked for such potential contradictions and intentionally left those animals out.
DyML-Product is derived from iMaterialist-2019, a hierarchical online product dataset. The original iMaterialist-2019 offers up to 4 levels of hierarchical annotations. We remove the coarsest level and maintain 3 levels for DyML-Product.
While convolutions are known to be invariant to (discrete) translations, scaling continues to be a challenge, and most image recognition networks are not invariant to it. To explore these effects, we have created the Scaled and Translated Image Recognition (STIR) dataset. This dataset contains objects of size $s \in [17, 64]$, each randomly placed in a $64 \times 64$ pixel image.
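The sample construction described above can be sketched in a few lines: draw an object patch of size $s$ and paste it at a random position on a $64 \times 64$ canvas. This is an illustrative sketch under those stated dimensions, not the actual STIR generation pipeline:

```python
import numpy as np

def place_object(obj: np.ndarray, canvas: int = 64, rng=None) -> np.ndarray:
    """Randomly translate a square object patch of size s (17 <= s <= 64)
    onto an empty canvas, in the spirit of STIR samples."""
    rng = rng or np.random.default_rng(0)
    s = obj.shape[0]
    assert 17 <= s <= canvas, "object size must lie in [17, canvas]"
    top = rng.integers(0, canvas - s + 1)
    left = rng.integers(0, canvas - s + 1)
    out = np.zeros((canvas, canvas), dtype=obj.dtype)
    out[top:top + s, left:left + s] = obj
    return out

# Smallest allowed object, randomly placed:
sample = place_object(np.ones((17, 17), dtype=np.float32))
```

Sweeping $s$ over $[17, 64]$ while holding the object's identity fixed yields pairs of images that differ only in scale and position, which is exactly what is needed to probe (the lack of) scale invariance.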
The Retina Benchmark is a set of real-world tasks that accurately reflect the complexities of real medical data and are designed to assess the reliability of predictive models in safety-critical scenarios. Specifically, two publicly available datasets of high-resolution human retina images exhibiting varying degrees of diabetic retinopathy, a medical condition that can lead to blindness, are used to design a suite of automated diagnosis tasks that require reliable predictive uncertainty quantification.
MIAD contains more than 100K high-resolution color images of various outdoor industrial scenarios, designed for unsupervised anomaly detection. This dataset is generated with 3D graphics software and covers both surface and logical anomalies with pixel-precise ground truth.
Naturalistic Variation Object Dataset (NVD) is a large simulated dataset of 272k images of everyday objects with naturalistic variations such as object pose, scale, viewpoint, lighting and occlusions.
General-purpose Visual Understanding Evaluation (G-VUE) is a comprehensive benchmark covering the full spectrum of visual cognitive abilities with four functional domains -- Perceive, Ground, Reason, and Act. The four domains are embodied in 11 carefully curated tasks, from 3D reconstruction to visual reasoning and manipulation.
ExHVV is a novel dataset that offers natural language explanations of connotative roles for three types of entities -- heroes, villains, and victims, encompassing 4,680 entities present in 3K memes.
Perseus is a dataset for Cross-Lingual Summarization (CLS) which collects about 94K Chinese scientific documents paired with English summaries. The average length of documents in Perseus is more than two thousand tokens.