RepoEval is a benchmark specifically designed for evaluating repository-level code auto-completion systems. While existing benchmarks mainly focus on single-file tasks, RepoEval addresses the assessment gap for more complex, real-world, multi-file programming scenarios.
PU1K is nearly 8 times larger than the largest publicly available dataset collected by PU-GAN. PU1K consists of 1,147 3D models split into 1,020 training samples and 127 testing samples. The training set contains 120 3D models compiled from PU-GAN’s dataset, in addition to 900 different models collected from ShapeNetCore. The testing set contains 27 models from PU-GAN and 100 more models from ShapeNetCore.
Many existing datasets for lidar place recognition are solely representative of structured urban environments, and deep learning based approaches have recently saturated performance on them. Natural and unstructured environments present many additional challenges for long-term localisation, but these environments are not represented in currently available datasets. To address this we introduce Wild-Places, a challenging large-scale dataset for lidar place recognition in unstructured, natural environments. Wild-Places contains eight lidar sequences collected with a handheld sensor payload over the course of fourteen months, comprising a total of 63K undistorted lidar submaps along with accurate 6DoF ground truth. The dataset contains multiple revisits both within and between sequences, allowing for both intra-sequence (i.e., loop closure detection) and inter-sequence (i.e., re-localisation) tasks. We also benchmark several state-of-the-art approaches to demonstrate the challenges this dataset introduces.
Have you wondered how autonomous mobile robots should share space with humans in public spaces? Are you interested in developing autonomous mobile robots that can navigate within human crowds in a socially compliant manner? Do you want to analyze human reactions and behaviors in the presence of mobile robots of different morphologies?
The CoIR (Code Information Retrieval) benchmark is designed to evaluate code retrieval capabilities. CoIR includes 10 curated code datasets, covering 8 retrieval tasks across 7 domains. In total, it encompasses two million documents. It also provides a common and easy Python framework, installable via pip, and shares the same data schema as benchmarks like MTEB and BEIR for easy cross-benchmark evaluations.
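Because CoIR reuses the BEIR/MTEB data schema, each task reduces to a corpus, a set of queries, and relevance judgements (qrels) keyed by string ids. The snippet below is a minimal, self-contained sketch of that schema together with a toy recall@k computation; the ids, documents, and the `recall_at_k` helper are invented for illustration and are not part of the CoIR package itself.

```python
# Sketch of the BEIR/MTEB-style data schema that CoIR shares.
# All ids and documents below are invented for illustration only.

corpus = {  # doc_id -> document fields
    "doc1": {"title": "binary search", "text": "def bsearch(a, x): ..."},
    "doc2": {"title": "quick sort", "text": "def qsort(a): ..."},
}
queries = {  # query_id -> query text (natural language or code)
    "q1": "find an element in a sorted list",
}
qrels = {  # query_id -> {doc_id: relevance grade}
    "q1": {"doc1": 1},
}

def recall_at_k(ranked, qrels, k=1):
    """Fraction of relevant documents retrieved in the top-k, averaged over queries."""
    scores = []
    for qid, relevant in qrels.items():
        top_k = set(ranked.get(qid, [])[:k])
        scores.append(len(top_k & set(relevant)) / len(relevant))
    return sum(scores) / len(scores)

# A retriever would produce a ranking per query; here it is hard-coded.
ranked = {"q1": ["doc1", "doc2"]}
print(recall_at_k(ranked, qrels, k=1))  # -> 1.0
```

Any retriever that emits per-query rankings over this corpus/queries/qrels layout can be scored the same way across CoIR, BEIR, and MTEB-style tasks.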
We propose a large-scale benchmark containing a total of 6,461 mirror images with ground-truth annotations.
TuringBench is a benchmark environment that contains:
We create a benchmark dataset named ReVOS. This dataset comprises 35,074 pairs of instruction-mask sequences derived from 1,042 diverse videos. In contrast to traditional referring video segmentation datasets, such as Ref-YouTube-VOS and MeViS, which primarily contain explicit short phrases, ReVOS includes text instructions that necessitate a sophisticated understanding of both video content and general world knowledge.
MuirBench is a benchmark containing 11,264 images and 2,600 multiple-choice questions, providing robust evaluation on 12 multi-image understanding tasks.
The DroneVehicle dataset consists of a total of 56,878 drone-collected images, half of which are RGB images and the rest infrared images. Rich annotations with oriented bounding boxes are provided for five categories:

| Category | RGB annotations | Infrared annotations |
|---|---|---|
| Car | 389,779 | 428,086 |
| Truck | 22,123 | 25,960 |
| Bus | 15,333 | 16,590 |
| Van | 11,935 | 12,708 |
| Freight car | 13,400 | 17,173 |

This dataset is available on the download page.
In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount. To meet this crucial need, we propose *SALAD-Bench*, a safety benchmark specifically designed for evaluating LLMs, attack methods, and defense methods. Distinguished by its breadth, SALAD-Bench transcends conventional benchmarks through its large scale, rich diversity, intricate taxonomy spanning three levels, and versatile functionality. SALAD-Bench is crafted with a meticulous array of questions, from standard queries to complex ones enriched with attack and defense modifications as well as multiple-choice variants. To effectively manage the inherent complexity, we introduce an innovative evaluator: the LLM-based MD-Judge for QA pairs, with a particular focus on attack-enhanced queries, ensuring a seamless and reliable evaluation. These components extend SALAD-Bench from standard LLM safety evaluation to the evaluation of both LLM attack and defense methods, ensuring its joint-purpose utility.
TID2013 is a dataset for image quality assessment that contains 25 reference images and 3000 distorted images (25 reference images x 24 types of distortions x 5 levels of distortions).
ApolloCar3DT is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is more than 20 times larger than PASCAL3D+ and KITTI, the current state-of-the-art.
The Kumar dataset contains 30 image tiles of 1,000×1,000 pixels from seven organs (6 breast, 6 liver, 6 kidney, 6 prostate, 2 bladder, 2 colon and 2 stomach) of The Cancer Genome Atlas (TCGA) database, acquired at 40× magnification. Within each image, the boundary of each nucleus is fully annotated.
The New College Data is a freely available dataset collected from a robot completing several loops outdoors around the New College campus in Oxford. The data includes odometry, laser scan, and visual information. The dataset URL is not working anymore.
SceneNet is a dataset of labelled synthetic indoor scenes. There are several labelled indoor scenes, including: