TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2

19,997 dataset results

ViQuAE

ViQuAE is a dataset for KVQAE (Knowledge-based Visual Question Answering about named Entities), a task which consists in answering questions about named entities grounded in a visual context using a Knowledge Base. It is the first KVQAE dataset to cover a wide range of entity types (e.g. persons, landmarks, and products). We argue that KVQAE is a clear, well-defined task that can be evaluated easily, making it suitable to track the progress of multimodal entity representation’s quality. Multimodal entity representation is a central issue that will allow to make human-machine interactions more natural. For example, while watching a movie, one might wonder ‘‘Where did I already see this actress?’’ or ‘‘Did she ever win an Oscar?’’

13 papers0 benchmarksImages, Texts

safety-gym

openai.com/blog/safety-gym/

13 papers0 benchmarks

Robust04

The goal of the Robust track is to improve the consistency of retrieval technology by focusing on poorly performing topics. In addition, the track brings back a classic, ad hoc retrieval task in TREC that provides a natural home for new participants. An ad hoc task in TREC investigates the performance of systems that search a static set of documents using previously-unseen topics. For each topic, participants create a query and submit a ranking of the top 1000 documents for that topic.

13 papers0 benchmarksTexts

CBIS-DDSM (Curated Breast Imaging Subset of Digital Database for Screening Mammography)

This CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM) . The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The scale of the database along with ground truth validation makes the DDSM a useful tool in the development and testing of decision support systems. The CBIS-DDSM collection includes a subset of the DDSM data selected and curated by a trained mammographer. The images have been decompressed and converted to DICOM format. Updated ROI segmentation and bounding boxes, and pathologic diagnosis for training data are also included. A manuscript describing how to use this dataset in detail is available at https://www.nature.com/articles/sdata2017177.

13 papers1 benchmarksMedical

Flare7K

Flare7K, the first nighttime flare removal dataset, which is generated based on the observation and statistic of real-world nighttime lens flares. It offers 5,000 scattering flare images and 2,000 reflective flare images, consisting of 25 types of scattering flares and 10 types of reflective flares. The 7,000 flare patterns can be randomly added to the flare-free images, forming the flare-corrupted and flare-free image pairs.

13 papers6 benchmarksImages

MIDV-500

500 video clips for 50 different identity document types with ground truth.

13 papers0 benchmarks

Super-CLEVR

Super-CLEVR is a dataset for Visual Question Answering (VQA) where different factors in VQA domain shifts can be isolated in order that their effects can be studied independently. It contains 21 vehicle models belonging to 5 categories, with controllable attributes. Four factors are considered: visual complexity, question redundancy, concept distribution and concept compositionality.

13 papers0 benchmarks3D

HM3DSem

The Habitat-Matterport 3D Semantics Dataset (HM3DSem) is the largest-ever dataset of 3D real-world and indoor spaces with densely annotated semantics that is available to the academic community. HM3DSem v0.2 consists of 142,646 object instance annotations across 216 3D-spaces from HM3D and 3,100 rooms within those spaces. The HM3D scenes are annotated with the 142,646 raw object names, which are mapped to 40 Matterport categories. On average, each scene in HM3DSem v0.2 consists of 661 objects from 106 categories. This dataset is the result of 14,200+ hours of human effort for annotation and verification by 20+ annotators.

13 papers0 benchmarks3d meshes

ROSCOE

ROSCOE is a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics.

13 papers0 benchmarksTexts

EGFxSet (Electric Guitar Effects Dataset, ISMIR 2022 LBD)

EGFxSet (Electric Guitar Effects dataset) features recordings for all clean tones in a 22-fret Stratocaster, recorded with 5 different pickup configurations, also processed through 12 popular guitar effects. Our dataset was recorded in real hardware, making it relevant for music information retrieval tasks on real music. We also include annotations for parameter settings of the effects we used.

13 papers0 benchmarks

UnAV-100

Existing audio-visual event localization (AVE) handles manually trimmed videos with only a single instance in each of them. However, this setting is unrealistic as natural videos often contain numerous audio-visual events with different categories. To better adapt to real-life applications, we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video. To tackle this problem, we introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains 10K untrimmed videos with over 30K audio-visual events covering 100 event categories. Each video has 2.8 audio-visual events on average, and the events are usually related to each other and might co-occur as in real-life scenes. We believe our UnAV-100, with its realistic complexity, can promote the exploration on comprehensive audio-visual video understanding.

13 papers2 benchmarksAudio, Videos

Cityscapes 3D

Detecting vehicles and representing their position and orientation in the three dimensional space is a key technology for autonomous driving. Recently, methods for 3D vehicle detection solely based on monocular RGB images gained popularity. In order to facilitate this task as well as to compare and drive state-of-the-art methods, several new datasets and benchmarks have been published. Ground truth annotations of vehicles are usually obtained using lidar point clouds, which often induces errors due to imperfect calibration or synchronization between both sensors. To this end, we propose Cityscapes 3D, extending the original Cityscapes dataset with 3D bounding box annotations for all types of vehicles. In contrast to existing datasets, our 3D annotations were labeled using stereo RGB images only and capture all nine degrees of freedom. This leads to a pixel-accurate reprojection in the RGB image and a higher range of annotations compared to lidar-based approaches. In order to ease multi

13 papers10 benchmarksImages

3D Ken Burns

This dataset accompanies our paper on synthesizing the 3D Ken Burns effect from a single image. It consists of 134041 captures from 32 virtual environments where each capture consists of 4 views. Each view contains color-, depth-, and normal-maps at a resolution of 512x512 pixels.

13 papers0 benchmarksImages, RGB-D, Stereo

SIMMC2.0

Next generation task-oriented dialog systems need to understand conversational contexts with their perceived surroundings, to effectively help users in the real-world multimodal environment. Existing task-oriented dialog datasets aimed towards virtual assistance fall short and do not situate the dialog in the user's multimodal context. To overcome, we present a new dataset for Situated and Interactive Multimodal Conversations, SIMMC 2.0, which includes 11K task-oriented user<->assistant dialogs (117K utterances) in the shopping domain, grounded in immersive and photo-realistic scenes. The dialogs are collected using a two-phase pipeline: (1) A novel multimodal dialog simulator generates simulated dialog flows, with an emphasis on diversity and richness of interactions, (2) Manual paraphrasing of the generated utterances to collect diverse referring expressions. We provide an in-depth analysis of the collected dataset, and describe in detail the four main benchmark tasks we propose. Our

13 papers3 benchmarksImages, Texts

GUE (Genome Understanding Evaluation)

A collection of $28$ datasets across $7$ tasks constructed for genome language model evaluation. Contains seven tasks: promoter prediction. core promoter prediction, splice site prediction, covid variant classification, epigenetic marks prediction, and transcription factor binding sites prediction on human and mouse.

13 papers6 benchmarksMedical, Texts

HLW (Horizon Lines in the Wild)

We introduce Horizon Lines in the Wild (HLW), a large dataset of real-world images with labeled horizon lines, captured in a diverse set of environments. The dataset is available for download at our project website [1]. We begin by characterizing limitations in existing datasets for evaluating horizon line detection methods and then describe our approach for leveraging structure from motion to automatically label images with horizon lines.

13 papers0 benchmarks

WaterScenes

A Multi-Task 4D Radar-Camera Fusion Dataset for Autonomous Driving on Water Surfaces description of the dataset

13 papers6 benchmarksImages, Point cloud

DNA-Rendering

DNA-Rendering is a large-scale, high-fidelity repository of human performance data for neural actor rendering. It contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume. Upon the massive collections, the authors provide human subjects with grand categories of pose actions, body shapes, clothing, accessories, hairdos, and object intersection, which ranges the geometry and appearance variances from everyday life to professional occasions. Second, they provide rich assets for each subject -- 2D/3D human body keypoints, foreground masks, SMPLX models, cloth/accessory materials, multi-view images, and videos. These assets boost the current method's accuracy on downstream rendering tasks. Third, they construct a professional multi-View system to capture data, which contains 60 synchronous cameras with max 4096×3000 resolution, 15 fps speed, and stern camera calibration steps, ensuring high-quality resources for task training and evaluation.

13 papers0 benchmarksImages, Videos

IDiff-Face

The IDiff-Face dataset was proposed in the paper "IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models". This dataset is synthetically generated using the IDiff-Face model.

13 papers0 benchmarks

Real HSI (End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention)

End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention

13 papers2 benchmarks
PreviousPage 135 of 1000Next