Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

UT-Kinect (UTKinect-Action3D Dataset)

The UT-Kinect dataset is a dataset for action recognition from depth sequences. The videos were captured using a single stationary Kinect. There are 10 action types: walk, sit down, stand up, pick up, carry, throw, push, pull, wave hands, and clap hands. There are 10 subjects; each subject performs each action twice. Three synchronized channels were recorded: RGB, depth, and skeleton joint locations. The frame rate is 30 fps.

70 papers · 9 benchmarks · Images

CoNLL 2002

The CoNLL-2002 shared task concerns language-independent named entity recognition. The named entity types are persons, locations, organizations, and miscellaneous entities that do not belong to the previous three groups. Participants were offered training and test data for at least two languages, and information sources other than the training data could also be used.
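The data follows the usual CoNLL column format: one token and its BIO entity tag per line, with blank lines separating sentences. Below is a minimal parsing sketch; the file name and Latin-1 encoding are assumptions based on the common distribution of the Spanish split.

```python
# Minimal sketch: parse CoNLL 2002-style NER data (token and BIO tag per
# line, blank lines between sentences). File name is an assumption.
def read_conll(path):
    sentences, current = [], []
    with open(path, encoding="latin-1") as f:  # distribution is commonly Latin-1
        for line in f:
            line = line.strip()
            if not line:                       # blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
            else:
                cols = line.split()
                # first column is the token, last is the BIO tag,
                # e.g. ("Bruselas", "B-LOC"); Dutch files add a POS column
                current.append((cols[0], cols[-1]))
    if current:
        sentences.append(current)
    return sentences

sents = read_conll("esp.train")  # Spanish training split
print(sents[0][:3])
```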

70 papers · 0 benchmarks · Texts

HybridQA

HybridQA is a large-scale question-answering dataset that requires reasoning over heterogeneous information. Each question is aligned with a Wikipedia table and multiple free-form corpora linked to the entities in the table. The questions are designed to aggregate both tabular and textual information, i.e., lacking either form would render the question unanswerable.
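To make the heterogeneous structure concrete, here is an illustrative record pairing a question with a table and entity-linked passages; the field names and values are hypothetical, not the official schema.

```python
# Hypothetical HybridQA-style record (illustrative field names, not the
# official schema): answering requires both the table and a linked passage.
example = {
    "question": "Which country hosted the 1992 Summer Olympics?",
    "table": {
        "header": ["Year", "Host city", "Country"],
        "rows": [["1992", "Barcelona", "Spain"]],
    },
    "linked_passages": {
        "Barcelona": "Barcelona is a city on the northeastern coast of Spain ...",
    },
    "answer": "Spain",
}
```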

70 papers · 1 benchmark

RSICD (Remote Sensing Image Captioning Dataset)

The Remote Sensing Image Captioning Dataset (RSICD) is a dataset for the remote sensing image captioning task. It contains 10,921 remote sensing images collected from Google Earth, Baidu Map, MapABC, and Tianditu. The images are fixed to 224×224 pixels, with various original resolutions, and each image is paired with five sentence descriptions.

70 papers · 11 benchmarks · Images

YouTube-UGC (YouTube UGC dataset)

This YouTube dataset is a sample of thousands of User Generated Content (UGC) videos uploaded to YouTube and distributed under a Creative Commons license. It was created to assist the advancement of video compression and quality assessment research for UGC videos.

70 papers · 3 benchmarks · Videos

DiffusionDB

DiffusionDB is a large-scale text-to-image prompt dataset. It contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users.
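A minimal sketch for sampling the dataset through the Hugging Face `datasets` library, assuming the community-hosted `poloclub/diffusiondb` repository and its `2m_random_1k` subset config; the field names (`prompt`, `cfg`, `step`, `image`) are taken from that repo's documentation and should be treated as assumptions.

```python
# Sketch: load a small random DiffusionDB subset and inspect one record.
# Repo name, config name, and field names are assumptions (see lead-in).
from datasets import load_dataset

ds = load_dataset("poloclub/diffusiondb", "2m_random_1k", split="train")
row = ds[0]
print(row["prompt"])             # user-written text prompt
print(row["cfg"], row["step"])   # sampling hyperparameters recorded per image
row["image"].save("sample.png")  # PIL image generated by Stable Diffusion
```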

70 papers · 0 benchmarks · Images

OmniObject3D

OmniObject3D is a large-vocabulary 3D object dataset of massive, high-quality, real-scanned 3D objects, offering several appealing properties.

70 papers · 0 benchmarks · 3D, 3D meshes, Images, Point cloud, Videos

MSP-IMPROV (MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception)

We present the MSP-IMPROV corpus, a multimodal emotional database, where the goal is to have control over lexical content and emotion while also promoting naturalness in the recordings. Studies on emotion perception often require stimuli with fixed lexical content that convey different emotions. These stimuli can also serve as an instrument to understand how emotion modulates speech at the phoneme level, in a manner that controls for coarticulation. Such audiovisual data are not easily available from natural recordings. A common solution is to record actors reading sentences that portray different emotions, which may not produce natural behaviors. We propose an alternative approach in which we define hypothetical scenarios for each sentence that are carefully designed to elicit a particular emotion. Two actors improvise these emotion-specific situations, leading them to utter contextualized, non-read renditions of sentences that have fixed lexical content and convey different emotions.

70 papers · 2 benchmarks · Audio, Images, Videos

ScanQA (ScanQA: 3D Question Answering for Spatial Scene Understanding)

We collected 41,363 questions and 58,191 answers, including 32,337 unique questions and 16,999 unique answers. This dataset is an order of magnitude larger than existing embodied question-answering datasets in terms of both question size and variation. For example, the EQA dataset contains 4,246 questions (147 unique) in its training set, and the EQA-MP3D dataset contains 767 questions (174 unique) in its training set. Considering that our dataset contains not only question-answer pairs but also 3D object localization annotations, we believe this is the largest dataset that characterizes objects in 3D scenes through question answering. Questions of various types were collected through automatic question generation followed by human editing.

70 papers · 0 benchmarks

WikiLarge

WikiLarge comprises 359 test sentences, 2,000 development sentences, and 300k training sentences. Each source sentence in the test set has 8 simplified references.

69 papers · 0 benchmarks · Texts

CACD (Cross-Age Celebrity Dataset)

The Cross-Age Celebrity Dataset (CACD) contains 163,446 images of 2,000 celebrities collected from the Internet. The images were gathered from search engines using celebrity names and years (2004-2013) as keywords. Therefore, it is possible to estimate the age of a celebrity in an image by simply subtracting the birth year from the year in which the photo was taken.
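A worked example of the age-labeling arithmetic described above:

```python
# The apparent age label is the photo year minus the celebrity's birth year.
def apparent_age(photo_year: int, birth_year: int) -> int:
    return photo_year - birth_year

# A celebrity born in 1981 photographed in 2013 is labeled as 32.
print(apparent_age(2013, 1981))  # -> 32
```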

69 papers · 6 benchmarks · Images

RegDB (Dongguk Body-based Person Recognition Database (DBPerson-Recog-DB1))

RegDB is used for visible-infrared re-identification (VI Re-ID), which handles cross-modality matching between daytime visible and night-time infrared images. The dataset contains images of 412 people, with 10 color and 10 thermal images per person.

69 papers · 3 benchmarks

CMRC (Chinese Machine Reading Comprehension)

CMRC is a dataset annotated by human experts with nearly 20,000 questions, along with a challenging set composed of questions that require reasoning over multiple clues.

69 papers · 0 benchmarks · Texts

BraTS 2015

The BraTS 2015 dataset is a dataset for brain tumor image segmentation. It consists of 220 high-grade glioma (HGG) and 54 low-grade glioma (LGG) MRIs. The four MRI modalities are T1, T1c, T2, and T2-FLAIR. Segmented “ground truth” is provided for four intra-tumoral classes: edema, enhancing tumor, non-enhancing tumor, and necrosis.
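A minimal sketch for assembling one case into a multi-channel array with `nibabel`, assuming the volumes have been converted to NIfTI (the original release ships MHA files) and using hypothetical file paths.

```python
# Sketch: stack the four MRI modalities of one BraTS case into one array.
# Paths and NIfTI format are assumptions; convert from MHA first if needed.
import nibabel as nib
import numpy as np

modalities = ["t1", "t1c", "t2", "flair"]
volumes = [nib.load(f"case_001/{m}.nii.gz").get_fdata() for m in modalities]
scan = np.stack(volumes, axis=0)                    # shape: (4, H, W, D)

# Segmentation labels are commonly 1 = necrosis, 2 = edema,
# 3 = non-enhancing tumor, 4 = enhancing tumor (0 = background).
seg = nib.load("case_001/seg.nii.gz").get_fdata()
```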

69 papers · 0 benchmarks · Images, MRI, Medical

Multimodal Opinion-level Sentiment Intensity (MOSI)

Multimodal Opinion-level Sentiment Intensity (MOSI) contains: (1) multimodal observations, including transcribed speech and visual gestures as well as automatic audio and visual features; (2) opinion-level subjectivity segmentation; (3) sentiment intensity annotations with high coder agreement; and (4) alignment between words and visual and acoustic features.

69 papers · 0 benchmarks · Speech, Videos

QMSum

QMSum is a new human-annotated benchmark for the query-based, multi-domain meeting summarisation task, consisting of 1,808 query-summary pairs over 232 meetings in multiple domains.

69 papers · 1 benchmark · Texts

VSR (Visual Spatial Reasoning)

The Visual Spatial Reasoning (VSR) corpus is a collection of caption-image pairs with true/false labels. Each caption describes the spatial relation of two individual objects in the image, and a vision-language model (VLM) must judge whether the caption correctly describes the image (True) or not (False).
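A sketch of the task format and its natural metric, binary accuracy; the record fields and the `model(image, caption)` callable are illustrative stand-ins, not the official schema.

```python
# Illustrative VSR-style records and evaluation (hypothetical field names).
examples = [
    {"image": "img_001.jpg", "caption": "The cat is under the table.", "label": True},
    {"image": "img_002.jpg", "caption": "The bottle is left of the plate.", "label": False},
]

def accuracy(model, examples):
    # `model(image, caption)` stands in for any VLM returning True/False.
    hits = sum(model(ex["image"], ex["caption"]) == ex["label"] for ex in examples)
    return hits / len(examples)
```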

69 papers · 1 benchmark · Images, Texts

UBFC-rPPG (Univ. Bourgogne Franche-Comté Remote PhotoPlethysmoGraphy)

We introduce here a new database called UBFC-rPPG (Univ. Bourgogne Franche-Comté Remote PhotoPlethysmoGraphy) comprising two datasets focused specifically on rPPG analysis. The UBFC-rPPG database was created using a custom C++ application for video acquisition with a simple low-cost webcam (Logitech C920 HD Pro) at 30 fps, with a resolution of 640×480 in uncompressed 8-bit RGB format. A CMS50E transmissive pulse oximeter was used to obtain the ground-truth PPG data, comprising the PPG waveform as well as the PPG heart rates. During the recording, the subject sits in front of the camera (about 1 m away) with his/her face visible. All experiments are conducted indoors with a varying amount of sunlight and indoor illumination. The link to download the complete video dataset is available on request. A basic Matlab implementation can also be provided to read the ground-truth data acquired with the pulse oximeter.
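As a hedged illustration (not the dataset's own tooling), heart rate can be estimated from a PPG waveform like the ground truth described above using simple peak detection:

```python
# Sketch: estimate heart rate in bpm from a PPG waveform sampled at `fs` Hz.
import numpy as np
from scipy.signal import find_peaks

def heart_rate_bpm(ppg: np.ndarray, fs: float) -> float:
    # Require peaks at least 0.5 s apart (caps the estimate at 120 bpm,
    # a crude assumption for a resting subject).
    peaks, _ = find_peaks(ppg, distance=int(0.5 * fs))
    if len(peaks) < 2:
        return float("nan")
    mean_interval = np.diff(peaks).mean() / fs  # seconds between beats
    return 60.0 / mean_interval

# e.g. heart_rate_bpm(waveform, fs=60.0) for a 60 Hz oximeter recording
```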

69 papers · 20 benchmarks · Images, Medical

Bank Marketing

The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
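A minimal baseline sketch for the stated task, assuming the common UCI distribution (`bank-full.csv`, semicolon-separated, target column `y` with values "yes"/"no"):

```python
# Sketch: train a simple classifier to predict term-deposit subscription.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("bank-full.csv", sep=";")     # UCI file layout assumed
X = pd.get_dummies(df.drop(columns="y"))       # one-hot the categorical fields
y = (df["y"] == "yes").astype(int)             # 1 = client subscribed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```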

69 papers · 0 benchmarks · Tabular

Recipe1M+

Recipe1M+ is a dataset containing one million structured cooking recipes with 13 million associated images.

68 papers · 6 benchmarks · Images, Texts
Page 44 of 1000