Datasets

383 machine learning datasets

AFLW2000-3D

AFLW2000-3D is a dataset of 2000 images annotated with image-level 68-point 3D facial landmarks. It is used to evaluate 3D facial landmark detection models. The head poses are very diverse, and the faces are often hard for a CNN-based face detector to detect.

117 papers · 45 benchmarks · 3D, Images

smallNORB

The smallNORB dataset is a dataset for 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees in 5-degree steps), and 18 azimuths (0 to 340 degrees in 20-degree steps). The training set is composed of 5 instances of each category (instances 4, 6, 7, 8, and 9), and the test set contains the remaining 5 instances (instances 0, 1, 2, 3, and 5).

112 papers · 1 benchmark · 3D, Images
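
The smallNORB imaging grid and instance split described above can be sketched in a few lines of Python. This is only an illustration of the counts implied by the description: the constant and function names are hypothetical, and the sketch assumes every combination of lighting, elevation, and azimuth was captured exactly once per toy.

```python
from itertools import product

# Imaging conditions as stated in the smallNORB description.
CAMERAS = 2
LIGHTINGS = range(6)
ELEVATIONS = range(30, 71, 5)   # 30, 35, ..., 70 degrees
AZIMUTHS = range(0, 341, 20)    # 0, 20, ..., 340 degrees

# Instance split per category, as listed in the description.
TRAIN_INSTANCES = (4, 6, 7, 8, 9)
TEST_INSTANCES = (0, 1, 2, 3, 5)
NUM_CATEGORIES = 5


def views_per_instance():
    """Number of (lighting, elevation, azimuth) combinations for one toy."""
    return len(list(product(LIGHTINGS, ELEVATIONS, AZIMUTHS)))


def stereo_pairs(instances):
    """Stereo pairs in a split, assuming every combination was captured once."""
    return NUM_CATEGORIES * len(instances) * views_per_instance()


train_pairs = stereo_pairs(TRAIN_INSTANCES)
test_pairs = stereo_pairs(TEST_INSTANCES)
print(len(ELEVATIONS), len(AZIMUTHS), views_per_instance())
print(train_pairs, test_pairs, train_pairs * CAMERAS)
```

Under that assumption, each split holds the same number of stereo pairs, and multiplying by the two cameras gives the number of individual images per split.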

Chairs

The Chairs dataset contains rendered images of around 1000 different three-dimensional chair models.

109 papers · 4 benchmarks · 3D, Images

AMOS

Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under

105 papers · 1 benchmark · 3D, Medical

BP4D

The BP4D-Spontaneous dataset is a 3D video database of spontaneous facial expressions in a diverse group of young adults. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The database includes forty-one participants (23 women, 18 men). They were 18 to 29 years of age; 11 were Asian, 6 were African-American, 4 were Hispanic, and 20 were Euro-American. An emotion elicitation protocol was designed to elicit participants' emotions effectively. Eight tasks, comprising an interview process and a series of activities, were used to elicit eight emotions. The database is structured by participant. Each participant is associated with 8 tasks. For each task, there are both 3D and 2D videos. The metadata also include manually annotated

104 papers · 21 benchmarks · 3D, Images, Videos

ABC Dataset

The ABC Dataset is a collection of one million Computer-Aided Design (CAD) models for research on geometric deep learning methods and applications. Each model is a collection of explicitly parametrized curves and surfaces, providing ground truth for differential quantities, patch segmentation, geometric feature detection, and shape reconstruction. Sampling the parametric descriptions of surfaces and curves allows data to be generated in different formats and resolutions, enabling fair comparisons across a wide range of geometric learning algorithms.

104 papers · 0 benchmarks · 3D
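
To illustrate why explicit parametrizations give ground truth for differential quantities at any resolution, here is a minimal sketch that samples a single parametric curve (a circle) at two resolutions and attaches the analytic tangent and curvature to each sample. It does not use the ABC file format or any ABC tooling; `sample_circle` and its parameters are hypothetical names chosen for this example.

```python
import math


def sample_circle(radius, n):
    """Sample n points on a circle with analytic tangent and curvature."""
    samples = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        point = (radius * math.cos(t), radius * math.sin(t))
        tangent = (-math.sin(t), math.cos(t))  # unit tangent at parameter t
        curvature = 1.0 / radius               # constant for a circle
        samples.append((point, tangent, curvature))
    return samples


# The same parametric description yields data at any chosen resolution.
coarse = sample_circle(radius=2.0, n=8)
fine = sample_circle(radius=2.0, n=256)
print(len(coarse), len(fine), coarse[0][2])
```

Because the tangent and curvature come from the closed-form description rather than from the sampled points, they stay exact regardless of how coarsely the curve is sampled.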

ReferIt3D

ReferIt3D provides two large-scale and complementary visio-linguistic datasets: i) Sr3D, which contains 83.5K template-based utterances leveraging spatial relations among fine-grained object classes to localize a referred object in a scene, and ii) Nr3D, which contains 41.5K natural, free-form utterances collected by deploying a 2-player object reference game in 3D scenes. This dataset can be used for 3D visual grounding and 3D dense captioning tasks.

104 papers · 0 benchmarks · 3D, Point cloud, Texts

HM3D (Habitat-Matterport 3D)

Habitat-Matterport 3D (HM3D) is a large-scale dataset of 1,000 building-scale 3D reconstructions from a diverse set of real-world locations. Each scene in the dataset consists of a textured 3D mesh reconstruction of interiors such as multi-floor residences, stores, and other private indoor spaces.

96 papers · 0 benchmarks · 3D

T-LESS

T-LESS is a dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from

94 papers · 6 benchmarks · 3D, Images, RGB-D
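
As a reminder of what a 6D pose (three rotational plus three translational degrees of freedom) encodes, the sketch below applies a rotation and a translation to a single 3D model point. It is not tied to the T-LESS annotation format; the function names, the model-to-camera convention, and the numeric values are assumptions made for illustration.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(rotation, translation, point):
    """Map a model-frame point into the camera frame: R @ p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical pose: quarter turn about z, then a shift along x and z.
R = rotation_z(math.pi / 2)
t = (10.0, 0.0, 500.0)
p_cam = apply_pose(R, t, (1.0, 0.0, 0.0))
print(p_cam)
```

For symmetric or mutually similar objects such as those described above, several distinct rotations can map the model onto the same observed shape, which is part of what makes pose estimation on such objects hard.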

Aachen Day-Night

Aachen Day-Night is a dataset designed for benchmarking 6DOF outdoor visual localization in changing conditions. It focuses on localizing high-quality night-time images against a day-time 3D model. There are 14,607 images with changing conditions of weather, season and day-night cycles.

93 papers · 0 benchmarks · 3D, Images

FaceWarehouse

FaceWarehouse is a 3D facial expression database that provides the facial geometry of 150 subjects, covering a wide range of ages and ethnic backgrounds.

91 papers · 1 benchmark · 3D, Images

PartNet-Mobility

Dataset produced for the SAPIEN simulation environment. From the website: "PartNet-Mobility dataset is a collection of 2K articulated objects with motion annotations and rendering material. The dataset powers research for generalizable computer vision and manipulation. The dataset is a continuation of ShapeNet and PartNet."

88 papers · 0 benchmarks · 3D

Structured3D

Structured3D is a large-scale photo-realistic dataset containing 3.5K house designs created by professional designers, with a variety of ground truth 3D structure annotations and generated photo-realistic 2D images. The dataset consists of rendered images and corresponding ground truth annotations (e.g., semantic, albedo, depth, surface normal, layout) under different lighting and furniture configurations.

85 papers · 4 benchmarks · 3D, Images

ABO (Amazon Berkeley Objects)

ABO is a large-scale dataset designed for material prediction and multi-view retrieval experiments. The dataset contains Blender renderings of 30 viewpoints for each of the 7,953 3D objects, as well as camera intrinsics and extrinsics for each rendering.

82 papers · 0 benchmarks · 3D

COMA

CoMA contains 17,794 meshes of the human face in various expressions.

81 papers · 2 benchmarks · 3D, 3D meshes, Interactive

Scan2CAD

Scan2CAD is an alignment dataset based on 1,506 ScanNet scans with 97,607 annotated keypoint pairs between 14,225 (3,049 unique) CAD models from ShapeNet and their counterpart objects in the scans. The top 3 annotated model classes are chairs, tables, and cabinets, which reflects the nature of indoor scenes in ScanNet. The number of objects aligned per scene ranges from 1 to 40, with an average of 9.3.

72 papers · 2 benchmarks · 3D, 3D meshes, CAD

BABEL

BABEL is a large dataset with language labels describing the actions being performed in mocap sequences. BABEL consists of action labels for about 43 hours of mocap sequences from AMASS. Action labels are at two levels of abstraction: sequence labels describe the overall action in the sequence, and frame labels describe all actions in every frame of the sequence. Each frame label is precisely aligned with the duration of the corresponding action in the mocap sequence, and multiple actions can overlap. There are over 28k sequence labels and 63k frame labels in BABEL, which belong to over 250 unique action categories. Labels from BABEL can be leveraged for tasks such as action recognition, temporal action localization, and motion synthesis.

72 papers · 3 benchmarks · 3D
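
The idea of duration-aligned frame labels with overlapping actions can be sketched as follows. This is a toy representation, not the BABEL file format: the `(label, start, end)` tuples, the function name, and the half-open interval convention are all assumptions made for the example.

```python
def labels_per_frame(segments, num_frames, fps):
    """Return, for each frame, the sorted labels of all actions active in it.

    Each segment is (label, start_s, end_s); an action is treated as active
    on the half-open interval [start_s, end_s).
    """
    frames = []
    for f in range(num_frames):
        time_s = f / fps
        active = sorted(
            label for label, start, end in segments if start <= time_s < end
        )
        frames.append(active)
    return frames


# Hypothetical two-second clip: waving happens while the subject is walking.
segments = [("walk", 0.0, 2.0), ("wave", 1.0, 1.5)]
frames = labels_per_frame(segments, num_frames=4, fps=2)
print(frames)
```

A single sequence-level label (for example "walk") would sit alongside these per-frame labels, giving the two levels of abstraction described above.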


Datasets

AFLW2000-3D

smallNORB

Chairs

AMOS

BP4D

ABC Dataset

ReferIt3D

HM3D (Habitat-Matterport 3D)

T-LESS

Aachen Day-Night

FaceWarehouse

PartNet-Mobility

Structured3D

ABO (Amazon Berkeley Objects)

COMA

Scan2CAD

BABEL

SemanticPOSS

MuPoTS-3D (Multi-person Pose estimation Test Set in 3D)

OmniObject3D
