ScreenSpot is an evaluation benchmark for GUI grounding, comprising over 1,200 instructions from various environments, including iOS, Android, macOS, Windows, and Web. Each data point includes an annotated element type (Text or Icon/Widget). For more details and examples, refer to the accompanying paper.
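As a rough illustration of how GUI-grounding benchmarks like ScreenSpot are typically scored, a prediction counts as a hit if the predicted click point falls inside the target element's bounding box. This is a minimal sketch; the (x0, y0, x1, y1) bbox convention and the function names are assumptions, not taken from the official release.

```python
# Hedged sketch: grounding accuracy as point-in-bounding-box.
def is_hit(pred_xy, bbox):
    x, y = pred_xy
    x0, y0, x1, y1 = bbox  # assumed corner convention
    return x0 <= x <= x1 and y0 <= y <= y1

def accuracy(preds, bboxes):
    # Fraction of instructions where the predicted point lands in the target element.
    return sum(is_hit(p, b) for p, b in zip(preds, bboxes)) / len(bboxes)
```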
Criteo contains 7 days of click-through data and is widely used for CTR prediction benchmarking. The dataset has 26 anonymous categorical fields and 13 continuous fields.
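A minimal parsing sketch, assuming the tab-separated layout of the public Criteo release (label, then 13 integer fields, then 26 hashed categorical fields); verify this against the copy you download, as layouts can differ slightly between releases.

```python
def parse_criteo_row(line: str):
    fields = line.rstrip("\n").split("\t")
    label = int(fields[0])                                 # 1 = click, 0 = no click
    dense = [int(x) if x else None for x in fields[1:14]]  # 13 continuous fields
    sparse = [x or None for x in fields[14:40]]            # 26 categorical fields
    return label, dense, sparse
```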
MAMS is a challenge dataset for aspect-based sentiment analysis (ABSA), in which each sentence contains at least two aspects with different sentiment polarities. The MAMS dataset comes in two versions: one for aspect-term sentiment analysis (ATSA) and one for aspect-category sentiment analysis (ACSA).
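A constructed ATSA-style example (illustrative, not taken from the dataset, with assumed field names) of what such a record looks like:

```python
example = {
    "sentence": "The food was great but the service was painfully slow.",
    "aspects": [
        {"term": "food",    "polarity": "positive"},
        {"term": "service", "polarity": "negative"},
    ],
}
```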
The FLIC dataset contains 5003 images from popular Hollywood movies. The images were obtained by running a state-of-the-art person detector on every tenth frame of 30 movies. People detected with high confidence (roughly 20K candidates) were then sent to the crowdsourcing marketplace Amazon Mechanical Turk to obtain ground-truth labelling. Each image was annotated by five Turkers to label 10 upper-body joints. For each image, the median of the five labellings was taken to be robust to outlier annotations. Finally, images were manually rejected if the person was occluded or severely non-frontal.
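A minimal sketch of the median-of-five aggregation described above, assuming the five annotations for one image are stored as a (5, 10, 2) array of five Turkers × ten joints × (x, y); names and shapes are illustrative.

```python
import numpy as np

def aggregate_joint_labels(annotations: np.ndarray) -> np.ndarray:
    # Per-coordinate median across the five annotators is robust
    # to a single outlier labelling.
    return np.median(annotations, axis=0)  # -> (10, 2) consensus joints
```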
The Stanford Background dataset contains 715 RGB images and the corresponding label images. Images are approximately 240×320 pixels in size, and each pixel is classified into one of eight categories.
A large dataset of over 100,000 examples consisting of Java classes from online code repositories, introduced together with an encoder-decoder architecture that models the interaction between the method documentation and the class environment.
CoS-E consists of human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations.
A new large-scale dataset for referring expressions, based on MS-COCO.
Synscapes is a synthetic dataset for street scene parsing, created using photorealistic rendering techniques. The accompanying paper shows state-of-the-art results for training and validation, as well as new types of analysis.
Ubuntu Dialogue Corpus (UDC) is a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter.
GeoQA is a dataset for automatic geometric problem solving containing 5,010 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.
FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) is a fact verification dataset which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict.
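An illustrative sketch of a FEVEROUS-style record, assuming the JSON-lines layout of the public release; the exact field names and evidence-ID formats are assumptions and may differ from the actual files.

```python
claim = {
    "claim": "Example claim text.",
    "label": "SUPPORTS",  # or "REFUTES" / "NOT ENOUGH INFO"
    "evidence": [{
        "content": [
            "Page_title_sentence_3",  # a Wikipedia sentence ID
            "Page_title_cell_0_2_4",  # a table cell ID (table, row, column)
        ],
    }],
}
```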
The Legal General Language Understanding Evaluation (LexGLUE) benchmark is a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way.
The AliMeeting corpus consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session involves 2-4 speakers with different speaker overlap ratios and was recorded in rooms of different sizes.
AbdomenCT-1K is a large and diverse abdominal CT organ segmentation dataset with more than 1,000 (1K) CT scans from 12 medical centers, including multi-phase, multi-vendor, and multi-disease cases. The accompanying large-scale study of liver, kidney, spleen, and pancreas segmentation reveals unsolved segmentation problems of SOTA methods, such as limited generalization to distinct medical centers, phases, and unseen diseases. To advance these problems, the dataset includes four organ segmentation benchmarks for fully supervised, semi-supervised, weakly supervised, and continual learning, which are currently challenging and active research topics, each paired with a simple and effective method that can serve as an out-of-the-box baseline. AbdomenCT-1K aims to promote future in-depth research towards clinically applicable abdominal organ segmentation methods.
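For reference, organ segmentation on benchmarks like this is commonly scored with the Dice similarity coefficient; a minimal sketch, scoring one organ at a time on boolean masks:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    # pred, gt: boolean masks of the same shape
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)  # epsilon avoids 0/0
```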
OpenLane is the first real-world 3D lane dataset and the largest to date. It builds on the public Waymo Open Dataset and provides lane and closest-in-path object (CIPO) annotations for 1,000 segments. In short, OpenLane contains 200K frames and over 880K carefully annotated lanes. The OpenLane dataset is publicly released to help the research community advance 3D perception and autonomous driving technology.
TAP-Vid is a benchmark containing both real-world videos with accurate human annotations of point tracks and synthetic videos with perfect ground-truth point tracks. It is designed for a new task called tracking any point (TAP).
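A hedged sketch of one ingredient of TAP-Vid-style evaluation: position accuracy over visible frames, assuming tracks stored as (num_points, num_frames, 2) arrays with boolean visibility flags. The pixel threshold here is illustrative; the benchmark averages over several thresholds.

```python
import numpy as np

def position_accuracy(pred, gt, visible, thresh=8.0):
    # Fraction of visible ground-truth points whose prediction lies
    # within `thresh` pixels of the annotated location.
    err = np.linalg.norm(pred - gt, axis=-1)  # (num_points, num_frames)
    return float((err[visible] < thresh).mean())
```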
InternVid is a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions totaling 4.1B words.
MeViS is a large-scale dataset for motion-expression-guided video segmentation, which focuses on segmenting objects in video content based on a sentence describing the objects' motion. The dataset contains numerous motion expressions indicating target objects in complex environments.
Questions is an interaction graph of users of a question-answering website based on data provided by Yandex Q.
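A minimal loading sketch for an interaction graph like this, assuming it ships as a plain edge list of user-ID pairs; the file name is a placeholder.

```python
import networkx as nx

# Each line of the assumed input: "<user_id> <user_id>" for one interaction.
G = nx.read_edgelist("questions_edges.txt", nodetype=int)
print(G.number_of_nodes(), "users,", G.number_of_edges(), "interactions")
```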