ScreenSpot is an evaluation benchmark for GUI grounding, comprising over 1,200 instructions from various environments, including iOS, Android, macOS, Windows, and Web. Each data point includes an annotated element type (Text or Icon/Widget). For more details and examples, refer to the accompanying paper.
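As a rough illustration of how GUI-grounding benchmarks like ScreenSpot are typically scored, a prediction counts as a hit if the predicted click point falls inside the target element's bounding box. This is a minimal sketch; the (x0, y0, x1, y1) bbox convention and the function names are assumptions, not taken from the official release.

```python
# Hedged sketch: grounding accuracy as point-in-bounding-box.
def is_hit(pred_xy, bbox):
    x, y = pred_xy
    x0, y0, x1, y1 = bbox  # assumed corner convention
    return x0 <= x <= x1 and y0 <= y <= y1

def accuracy(preds, bboxes):
    # Fraction of instructions where the predicted point lands in the target element.
    return sum(is_hit(p, b) for p, b in zip(preds, bboxes)) / len(bboxes)
```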
Criteo contains 7 days of click-through data and is widely used for CTR prediction benchmarking. The dataset has 26 anonymous categorical fields and 13 continuous fields.
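A minimal parsing sketch, assuming the tab-separated layout of the public Criteo release (label, then 13 integer fields, then 26 hashed categorical fields); verify this against the copy you download, as layouts can differ slightly between releases.

```python
def parse_criteo_row(line: str):
    fields = line.rstrip("\n").split("\t")
    label = int(fields[0])                                 # 1 = click, 0 = no click
    dense = [int(x) if x else None for x in fields[1:14]]  # 13 continuous fields
    sparse = [x or None for x in fields[14:40]]            # 26 categorical fields
    return label, dense, sparse
```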
MAMS is a challenge dataset for aspect-based sentiment analysis (ABSA), in which each sentence contains at least two aspects with different sentiment polarities. The MAMS dataset comes in two versions: one for aspect-term sentiment analysis (ATSA) and one for aspect-category sentiment analysis (ACSA).
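A constructed ATSA-style example (illustrative, not taken from the dataset, with assumed field names) of what such a record looks like:

```python
example = {
    "sentence": "The food was great but the service was painfully slow.",
    "aspects": [
        {"term": "food",    "polarity": "positive"},
        {"term": "service", "polarity": "negative"},
    ],
}
```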
The FLIC dataset contains 5003 images from popular Hollywood movies. The images were obtained by running a state-of-the-art person detector on every tenth frame of 30 movies. People detected with high confidence (roughly 20K candidates) were then sent to the crowdsourcing marketplace Amazon Mechanical Turk to obtain ground-truth labelling. Each image was annotated by five Turkers to label 10 upper-body joints. For each image, the median of the five labellings was taken to be robust to outlier annotations. Finally, images were manually rejected if the person was occluded or severely non-frontal.
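A minimal sketch of the median-of-five aggregation described above, assuming the five annotations for one image are stored as a (5, 10, 2) array of five Turkers × ten joints × (x, y); names and shapes are illustrative.

```python
import numpy as np

def aggregate_joint_labels(annotations: np.ndarray) -> np.ndarray:
    # Per-coordinate median across the five annotators is robust
    # to a single outlier labelling.
    return np.median(annotations, axis=0)  # -> (10, 2) consensus joints
```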
The Stanford Background dataset contains 715 RGB images and the corresponding label images. Images are approximately 240×320 pixels in size, and each pixel is classified into one of eight categories.
A large dataset of over 100,000 examples consisting of Java classes from online code repositories, introduced together with an encoder-decoder architecture that models the interaction between the method documentation and the class environment.
CoS-E consists of human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations.
A new large-scale dataset for referring expressions, based on MS-COCO.
Synscapes is a synthetic dataset for street scene parsing, created using photorealistic rendering techniques. The accompanying paper shows state-of-the-art results for training and validation, as well as new types of analysis.
Ubuntu Dialogue Corpus (UDC) is a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter.
GeoQA is a dataset for automatic geometric problem solving containing 5,010 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.
FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) is a fact verification dataset which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict.
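An illustrative sketch of a FEVEROUS-style record, assuming the JSON-lines layout of the public release; the exact field names and evidence-ID formats are assumptions and may differ from the actual files.

```python
claim = {
    "claim": "Example claim text.",
    "label": "SUPPORTS",  # or "REFUTES" / "NOT ENOUGH INFO"
    "evidence": [{
        "content": [
            "Page_title_sentence_3",  # a Wikipedia sentence ID
            "Page_title_cell_0_2_4",  # a table cell ID (table, row, column)
        ],
    }],
}
```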
The Legal General Language Understanding Evaluation (LexGLUE) benchmark is a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way.
The AliMeeting corpus consists of 120 hours of recorded Mandarin meeting data, including far-field data collected by an 8-channel microphone array as well as near-field data collected by headset microphones. Each meeting session involves 2-4 speakers with different speaker overlap ratios and was recorded in rooms of different sizes.
AbdomenCT-1K is a large and diverse abdominal CT organ segmentation dataset with more than 1,000 (1K) CT scans from 12 medical centers, including multi-phase, multi-vendor, and multi-disease cases. The accompanying large-scale study of liver, kidney, spleen, and pancreas segmentation reveals unsolved segmentation problems of SOTA methods, such as limited generalization to distinct medical centers, phases, and unseen diseases. To advance these problems, the dataset includes four organ segmentation benchmarks for fully supervised, semi-supervised, weakly supervised, and continual learning, which are currently challenging and active research topics, each paired with a simple and effective method that can serve as an out-of-the-box baseline. AbdomenCT-1K aims to promote future in-depth research towards clinically applicable abdominal organ segmentation methods.
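For reference, organ segmentation on benchmarks like this is commonly scored with the Dice similarity coefficient; a minimal sketch, scoring one organ at a time on boolean masks:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    # pred, gt: boolean masks of the same shape
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)  # epsilon avoids 0/0
```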
OpenLane is the first real-world 3D lane dataset and the largest to date. It builds on the public Waymo Open Dataset and provides lane and closest-in-path object (CIPO) annotations for 1,000 segments. In short, OpenLane contains 200K frames and over 880K carefully annotated lanes. The OpenLane dataset is publicly released to help the research community advance 3D perception and autonomous driving technology.
TAP-Vid is a benchmark containing both real-world videos with accurate human annotations of point tracks and synthetic videos with perfect ground-truth point tracks. It is designed for a new task called tracking any point (TAP).
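A hedged sketch of one ingredient of TAP-Vid-style evaluation: position accuracy over visible frames, assuming tracks stored as (num_points, num_frames, 2) arrays with boolean visibility flags. The pixel threshold here is illustrative; the benchmark averages over several thresholds.

```python
import numpy as np

def position_accuracy(pred, gt, visible, thresh=8.0):
    # Fraction of visible ground-truth points whose prediction lies
    # within `thresh` pixels of the annotated location.
    err = np.linalg.norm(pred - gt, axis=-1)  # (num_points, num_frames)
    return float((err[visible] < thresh).mean())
```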
InternVid is a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions totaling 4.1B words.
MeViS is a large-scale dataset for motion-expression-guided video segmentation, which focuses on segmenting objects in video content based on a sentence describing the objects' motion. The dataset contains numerous motion expressions indicating target objects in complex environments.
Questions is an interaction graph of users of a question-answering website based on data provided by Yandex Q.
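A minimal loading sketch for an interaction graph like this, assuming it ships as a plain edge list of user-ID pairs; the file name is a placeholder.

```python
import networkx as nx

# Each line of the assumed input: "<user_id> <user_id>" for one interaction.
G = nx.read_edgelist("questions_edges.txt", nodetype=int)
print(G.number_of_nodes(), "users,", G.number_of_edges(), "interactions")
```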