Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

NL-Drive (Nonlinear Autonomous Driving Dataset)

A challenging multi-frame interpolation dataset for autonomous driving scenarios. Based on the principle of hard-sample selection and scenario diversity, the NL-Drive dataset contains point cloud sequences with large nonlinear movements from three public large-scale autonomous driving datasets: KITTI, Argoverse and nuScenes. The overall dataset contains more than 20,000 LiDAR point cloud frames, and the frame rate of each point cloud sequence is 10 Hz. NL-Drive is split into training, validation and test sets in the ratio 14:3:3. For the point cloud interpolation task, input frames are selected at a given frame interval, and the remaining point clouds serve as the ground truth for the interpolated frames. In particular, when there are 3 interpolation frames to predict between the middle two input frames, each NL-Drive sample consists of 4 input point cloud frames at 2.5 Hz.

2 papers · 2 benchmarks · Point cloud
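
As a rough illustration of the frame selection described for NL-Drive, the sketch below picks 4 input frames at an interval of 4 frames from a 10 Hz sequence (giving 2.5 Hz inputs) and treats the 3 frames between the middle two inputs as interpolation targets. The function name `make_sample` and its parameters are hypothetical and are not part of any NL-Drive tooling.

```python
SEQUENCE_RATE_HZ = 10.0  # frame rate of the point cloud sequences, per the description


def make_sample(start, interval=4, num_inputs=4):
    """Return (input_indices, target_indices) for one interpolation sample.

    Hypothetical helper: inputs are taken every `interval` frames, and the
    frames strictly between the two middle inputs are the ground truth.
    """
    inputs = [start + i * interval for i in range(num_inputs)]
    mid_lo = inputs[num_inputs // 2 - 1]
    mid_hi = inputs[num_inputs // 2]
    targets = list(range(mid_lo + 1, mid_hi))
    return inputs, targets


inputs, targets = make_sample(0)
input_rate_hz = SEQUENCE_RATE_HZ / 4  # 2.5 Hz input frames
# inputs  -> [0, 4, 8, 12]
# targets -> [5, 6, 7]
```

With an interval of 4, exactly 3 frames fall between the middle two inputs, which matches the sample layout in the description.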

ActionBench

ActionBench contains two carefully designed probing tasks, Action Antonym and Video Reversal, which target the model's multimodal alignment capabilities and temporal understanding skills, respectively. Action knowledge involves the understanding of textual, visual, and temporal aspects of actions. The benchmark is constructed by leveraging two existing open-domain video-language datasets, Ego4D and Something-Something v2 (SSv2), which provide fine-grained action annotations for each video clip.

2 papers · 0 benchmarks · Videos

MultiTACRED

MultiTACRED is a multilingual version of the large-scale TAC Relation Extraction Dataset. It covers 12 typologically diverse languages from 9 language families, and was created by the Speech & Language Technology group of DFKI by machine-translating the instances of the original TACRED dataset and automatically projecting their entity annotations. For details of the original TACRED's data collection and annotation process, see the Stanford paper. Translations are syntactically validated by checking the correctness of the XML tag markup. Any translations with an invalid tag structure, e.g. missing or invalid head or tail tag pairs, are discarded (on average, 2.3% of the instances).

2 papers · 0 benchmarks · Texts
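
The tag-structure check described for MultiTACRED might look roughly like the sketch below. The tag names `<H>` and `<T>` for head and tail entities are assumptions made for illustration; the description does not say which markup the authors used, and `has_valid_entity_tags` is a hypothetical helper, not their validation code.

```python
import re


def has_valid_entity_tags(text, tags=("H", "T")):
    """Check that each entity tag appears as exactly one well-ordered pair.

    Assumed markup: <H>...</H> for the head entity, <T>...</T> for the tail.
    """
    for tag in tags:
        opens = [m.start() for m in re.finditer(f"<{tag}>", text)]
        closes = [m.start() for m in re.finditer(f"</{tag}>", text)]
        # Require one opening tag, one closing tag, in that order.
        if len(opens) != 1 or len(closes) != 1 or opens[0] > closes[0]:
            return False
    return True


translations = [
    "<H>Anna</H> arbeitet bei <T>DFKI</T>.",  # both pairs intact
    "<H>Anna arbeitet bei <T>DFKI</T>.",      # head closing tag lost
]
kept = [t for t in translations if has_valid_entity_tags(t)]
```

Translations that fail the check would be discarded, as the description says happens for invalid tag structures.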

SIDAR

SIDAR is a dataset designed to be a training and evaluation set for a multitude of tasks involving image alignment and artifact removal, such as deep homography estimation, dense image matching, 2D bundle adjustment, inpainting, shadow removal, denoising, content retrieval, and background subtraction.

2 papers · 0 benchmarks

ENRICH (Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry)

ENRICH is a new synthetic, multi-purpose dataset for testing photogrammetric and computer vision algorithms. Compared to existing datasets, ENRICH offers higher-resolution images rendered under different lighting conditions, camera orientations, scales, and fields of view. Specifically, ENRICH is composed of three sub-datasets: ENRICH-Aerial, ENRICH-Square, and ENRICH-Statue, each exhibiting different characteristics. The dataset is useful for several photogrammetry and computer vision tasks, such as evaluating hand-crafted and deep learning-based local features, assessing the effect of ground control point (GCP) configuration on 3D accuracy, and monocular depth estimation.

2 papers · 0 benchmarks · Images

SheetCopilot

The SheetCopilot dataset contains 28 evaluation workbooks and 221 spreadsheet manipulation tasks that are applied to these workbooks. These tasks involve diverse atomic actions related to six task categories (i.e. Entry and manipulation, Formatting, Management, Charts, Pivot Table, and Formula).

2 papers · 1 benchmark · Tables

RUGD (Robot Unstructured Ground Driving)

A Video Dataset for Visual Perception and Autonomous Navigation in Unstructured Environments. Website: http://rugd.vision/

2 papers · 4 benchmarks · Images

Drone-Action (An Outdoor Recorded Drone Video Dataset for Action Recognition)

Website: https://asankagp.github.io/droneaction/

2 papers · 4 benchmarks · Images, Videos

InspiRe (Inspiring and non-inspiring posts from Reddit)

We analyze social media posts to tease out what makes a post inspiring and which topics are inspiring. We release a dataset of unique IDs for 5,800 inspiring and 5,800 non-inspiring English-language public posts, collected from a dump of Reddit public posts made available by a third party, and use linguistic heuristics to automatically detect which English-language social media posts are inspiring.

2 papers · 0 benchmarks · Texts

Watkins Marine Mammal Sounds (Watkins Marine Mammal Sound Database)

One of the founding fathers of marine mammal bioacoustics, William Watkins, carried out pioneering work with William Schevill at the Woods Hole Oceanographic Institution for more than four decades, laying the groundwork for our field today. One of the lasting achievements of his career was the Watkins Marine Mammal Sound Database, a resource that contains approximately 2000 unique recordings of more than 60 species of marine mammals (Table 1). Recordings were made by Watkins and Schevill as well as many others, including G. C. Ray, D. Wartzok, D. and M. Caldwell, K. Norris, and T. Poulter. Most of these have been digitized, along with approximately 15,000 annotated digital sound clips.

2 papers · 2 benchmarks · Audio

Speech Accent Archive (The Speech Accent Archive)

The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph and are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers.

2 papers · 2 benchmarks · Audio

Law Stack Exchange

Dataset from the Law Stack Exchange, as used in "Parameter-Efficient Legal Domain Adaptation" (Li et al., 2022). The dataset is composed of questions from the Law Stack Exchange, a community forum-based website where users ask and answer legal questions. We link the questions with their associated tags (e.g., "copyright" or "criminal-law") and perform a multi-label classification task.

2 papers · 0 benchmarks · Texts

CVB (Video Dataset of Cattle Visual Behaviors)

Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, we aim to develop improved models capable of recognizing all important cattle behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is presented in the form of the following three sub-directories. 1. raw_frames: contains 450 frames in each sub-folder, representing a 15-second video taken at a frame rate of 30 FPS. 2. annotations: contains the json file

2 papers · 0 benchmarks · Actions, Images, Tracking, Videos

IRFL: Image Recognition of Figurative Language

The IRFL dataset consists of idioms, similes, and metaphors with matching figurative and literal images, as well as two novel tasks of multimodal figurative understanding and preference.

2 papers · 2 benchmarks · Images, Texts

E-ReDial (Explainable Recommendation Dialogues)

E-ReDial is a conversational recommender system dataset with high-quality explanations. It consists of 756 dialogues with 12,003 utterances, each with 15.9 turns on average. 2,058 high-quality explanations are included, each with 79.2 tokens on average.

2 papers · 0 benchmarks · Texts

TAP (Traffic Accident Prediction data repository)

The Traffic Accident Prediction (TAP) data repository offers extensive coverage for 1,000 US cities (TAP-city) and 49 states (TAP-state), providing real-world road structure data that can be easily used for graph-based machine learning methods such as Graph Neural Networks. Additionally, it features multi-dimensional geospatial attributes, including angular and directional features, that are useful for analyzing transportation networks. The TAP repository has the potential to benefit the research community in various applications, including traffic crash prediction, road safety analysis, and traffic crash mitigation. The datasets can be accessed in the TAP-city and TAP-state directories.

2 papers · 0 benchmarks · Tabular

BabySLM

BabySLM is a language-acquisition-friendly benchmark to probe speech-based LMs at the lexical and syntactic levels, both of which are compatible with the vocabulary typical of children's language experiences.

2 papers · 0 benchmarks · Texts

PanCollection

Pansharpening datasets from the WorldView 2, WorldView 3, QuickBird, and Gaofen 2 sensors.

2 papers · 3 benchmarks

Krapivin

A dataset for benchmarking keyphrase extraction and generation techniques on long English scientific papers. The dataset is of high quality and consists of 2,000 scientific papers from the Computer Science domain published by ACM. Each paper has its keyphrases assigned by the authors and verified by the reviewers. Different parts of papers, such as title and abstract, are separated, enabling extraction based on the part of an article's text. The content of each paper is converted from PDF to plain text. Formulae, tables, figures and LaTeX markup were removed automatically. Link: https://huggingface.co/datasets/midas/krapivin

2 papers · 1 benchmark

NUS

The dataset was constructed by first finding suitable publications and then collecting keyphrases from manual annotators. The Google SOAP API was used to find documents using variants of the query “keywords general terms filetype:pdf”. Over 250 of these PDF documents were downloaded for further processing. Documents were then manually restricted to scientific conference papers with a length of 4-12 pages. The PDF documents were converted to plain text using the PDF995 software suite (as it handled two-columned text better than other programs tried). At the end of this process, 211 plain-text documents that had been converted without problems were selected. The authors then recruited student volunteers from their department to participate in manual keyphrase assignment. Each volunteer was given three PDF files (with author-assigned keyphrases hidden) to assign keyphrases to.

2 papers · 1 benchmark