3,275 machine learning datasets
We present two multi-modal datasets, one for Main Board IPOs and the other for Small and Medium Enterprise (SME) IPOs. Each consists of various features relating to the company going for an IPO, along with other macroeconomic factors. The objective is to estimate the direction and underpricing with respect to the opening, high, and closing prices of the stock on the IPO listing day.
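A minimal sketch of how listing-day underpricing targets of this kind are commonly derived; the column names (issue_price, open, high, close) are assumed for illustration and are not necessarily the datasets' actual schema.

```python
import pandas as pd

# Toy listing-day prices; real values would come from the dataset.
df = pd.DataFrame({
    "issue_price": [100.0, 250.0],
    "open": [112.0, 240.0],
    "high": [118.0, 255.0],
    "close": [109.0, 245.0],
})

# Underpricing relative to each listing-day reference price, as a fraction
# of the issue price; positive values indicate the IPO was underpriced.
for ref in ("open", "high", "close"):
    df[f"underpricing_{ref}"] = (df[ref] - df["issue_price"]) / df["issue_price"]
    df[f"direction_{ref}"] = (df[f"underpricing_{ref}"] > 0).astype(int)

print(df.round(3))
```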
Two versions of the dataset are offered: the full dataset used to train the models in our paper, and a mini dataset for easier examination. Both versions include raw and post-processed subsets of peeling, wiping, and lifting. The raw videos of the tactile dataset used to generate the PCA embedding are also provided.
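A minimal sketch of deriving a PCA embedding from raw video frames, in the spirit of the tactile embedding mentioned above; the frame shapes and component count are placeholders, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for grayscale tactile frames: (n_frames, height, width).
frames = rng.random((200, 64, 64))

# Flatten each frame into a feature vector and project to a low-dim space.
X = frames.reshape(len(frames), -1)
pca = PCA(n_components=8)
embedding = pca.fit_transform(X)
print(embedding.shape)  # (200, 8)
```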
FBIS-22M is the largest field boundary instance segmentation dataset to date, featuring over 22 million labeled field instances across more than 672,000 high-resolution satellite image patches. It includes imagery from 0.25 m to 10 m resolution, sourced from multiple satellites and covering diverse geographic regions, enabling robust training for scalable agricultural vision models.
SemanticSugarBeets, a novel and high-quality dataset containing 953 monocular RGB images and 2920 annotations of sugar beets, enables a wide range of learning tasks including object detection, semantic segmentation, instance segmentation and mass estimation for post-harvest and post-storage analysis.
This repository contains documentation for the dataset that accompanies our ICPE 2025 paper, "Shaved Ice: Optimal Compute Resource Commitments for Dynamic Multi-Cloud Workloads". It also includes example R and Python notebooks to read and visualize the data, including scripts to reproduce the figures and analysis results in the paper.
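A minimal Python sketch in the spirit of the repository's example notebooks; the columns and values here are hypothetical stand-ins for the actual data layout documented in the repository.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the repository's actual workload data files.
usage = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=96, freq="h"),
    "vcpu_hours": range(96),
})

# Plot compute usage over time, as the example notebooks do for the paper's figures.
usage.set_index("timestamp")["vcpu_hours"].plot(title="Compute usage over time")
plt.tight_layout()
plt.show()
```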
Smartphone cameras are ubiquitous in daily life, yet their performance can be severely impacted by dirty lenses, leading to degraded image quality. This issue is often overlooked in image restoration research, which assumes ideal or controlled lens conditions. To address this gap, we introduce SIDL (Smartphone Images with Dirty Lenses), a novel dataset designed for restoring images captured through contaminated smartphone lenses. SIDL contains diverse real-world images taken under various lighting conditions and environments. These images feature a wide range of lens contaminants, including water drops, fingerprints, and dust. Each contaminated image is paired with a clean reference image, enabling supervised learning approaches for restoration tasks. To evaluate the challenge posed by SIDL, various state-of-the-art restoration models were trained and compared on this dataset. They achieved some level of restoration but did not adequately address the diverse and realistic contaminations in SIDL.
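A minimal sketch of a paired (contaminated, clean) dataset class for supervised restoration training of the kind SIDL enables; the directory layout and file format are assumptions, not SIDL's published structure.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedRestorationDataset(Dataset):
    """Pairs each contaminated image with its clean reference by filename order."""

    def __init__(self, root: str):
        self.dirty = sorted(Path(root, "dirty").glob("*.png"))   # assumed layout
        self.clean = sorted(Path(root, "clean").glob("*.png"))   # assumed layout
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.dirty)

    def __getitem__(self, i):
        x = self.to_tensor(Image.open(self.dirty[i]).convert("RGB"))
        y = self.to_tensor(Image.open(self.clean[i]).convert("RGB"))
        return x, y  # (input, supervision target)
```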
Daily Activity Recordings for Artificial Intelligence (DARai, pronounced "Dahr-ree") is a multimodal, hierarchically annotated dataset constructed to understand human activities in real-world settings. DARai consists of continuous scripted and unscripted recordings of 50 participants in 10 different environments, totaling over 200 hours of data from 20 sensors including multiple camera views, depth and radar sensors, wearable inertial measurement units (IMUs), electromyography (EMG), insole pressure sensors, biomonitor sensors, and a gaze tracker. To capture the complexity in human activities, DARai is annotated at three levels of hierarchy: (i) high-level activities (L1) that are independent tasks, (ii) lower-level actions (L2) that are patterns shared between activities, and (iii) fine-grained procedures (L3) that detail the exact execution steps for actions. The dataset annotations and recordings are designed so that 22.7% of L2 actions are shared between L1 activities and 14.2% of L3 procedures are shared between L2 actions.
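A minimal sketch of the hierarchical label structure and of how a sharing statistic like "22.7% of L2 actions are shared between L1 activities" can be computed; the labels below are illustrative only, not DARai's taxonomy.

```python
from collections import defaultdict

# (L1 activity, L2 action) pairs as they might appear in annotations.
annotations = [
    ("cooking", "open_cabinet"), ("cooking", "stir"),
    ("cleaning", "open_cabinet"), ("cleaning", "wipe_surface"),
]

# Which L1 activities does each L2 action occur under?
activities_per_action = defaultdict(set)
for l1, l2 in annotations:
    activities_per_action[l2].add(l1)

# An L2 action is "shared" if it appears under more than one L1 activity.
shared = [a for a, acts in activities_per_action.items() if len(acts) > 1]
print(f"{len(shared) / len(activities_per_action):.1%} of L2 actions are shared")
```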
The mwBTFreddy dataset is a resource developed to support flash flood damage assessment in urban Malawi, specifically focusing on the impacts of Cyclone Freddy in 2023. The dataset comprises paired pre- and post-disaster satellite images sourced from Google Earth Pro, accompanied by JSON files containing labelled building annotations with geographic coordinates and damage levels (no damage, minor, major, or destroyed). Developed by the Kuyesera AI Lab at the Malawi University of Business and Applied Sciences, this dataset is intended to facilitate the development of machine learning models tailored to building detection and damage classification in African urban contexts. It also supports flood damage visualisation and spatial analysis to inform decisions on relocation, infrastructure planning, and emergency response in climate-vulnerable regions.
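A minimal sketch of reading building annotations of the kind described (geographic coordinates plus a damage level); the JSON schema shown is an assumption for illustration, not the dataset's documented format.

```python
import json
from collections import Counter

record = json.loads("""
{"buildings": [
  {"lat": -15.79, "lon": 35.01, "damage": "minor"},
  {"lat": -15.80, "lon": 35.02, "damage": "destroyed"}
]}
""")

# Tally damage levels across annotated buildings.
counts = Counter(b["damage"] for b in record["buildings"])
print(counts)  # Counter({'minor': 1, 'destroyed': 1})
```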
This dataset builds upon the SpaGBOL dataset, a graph-based dataset covering numerous cities across the globe for structured city-scale Cross-View Geo-Localisation (CVGL).
The AneuX morphology database includes data from three different sources: AneuX, @neurIST, and Aneurisk. The AneuX data consists of two portions, AneuX1 and AneuX2, which were extracted by two different teams of data curators.
This dataset includes 3D point-cloud and 2D imagery from a flash LiDAR...
A small-scale real-world dataset containing hazy/dusty industrial images and their clean ground truth counterparts. Designed for evaluating deep learning models for dust removal and image dehazing in industrial environments. Collected and fine-tuned by Moshtaghioun et al., 2025.
The CAD-EdgeTune dataset is acquired using a Husarion ROSbot 2.0 and ROSbot 2.0 Pro, with the collection speed set to 5 frames per second, in a suburban university environment. We split the data into subgroups for noon, dusk, and dawn in order to depict the surroundings under various lighting conditions. We assembled 17 sequences totaling 8,080 frames, of which 1,619 have been manually annotated using an open-source pixel annotation program. Since nearby frames are highly similar to one another, we annotate only every fifth image. Because the annotation procedure can be highly time-consuming, we employ soft labeling while annotating the CAD-EdgeTune dataset, which enables us to proceed through the frames more quickly. The annotation method enables us to place small annotations inside an image's objects, with the classification extended to cover the related pixels. This approach may result in less-than-perfect annotations and some loss of accuracy, but the loss is offset by the substantial reduction in annotation time.
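A minimal sketch of the frame-subsampling idea: because consecutive frames are highly similar, only every fifth frame is selected for manual annotation. The frame count mirrors the dataset's 8,080 frames; the indexing scheme itself is an assumption.

```python
frames = list(range(8080))
to_annotate = frames[::5]  # keep every fifth frame for manual annotation
print(len(to_annotate))    # 1616, close to the 1,619 frames annotated
```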
Dataset Overview: 998 images and 4,208 annotations focusing on interaction with in-vehicle infotainment (IVI) systems.
A dataset consisting of high-quality, synthetic chest X-rays from Sana (0.6B), the leading model in the CheXGenBench benchmark. The dataset has been filtered with HealthGPT to retain only high-quality samples.
On Sunday, August 29, 2021, Hurricane Ida struck parts of Louisiana and Mississippi with wind gusts reaching up to 172 mph, leaving more than a million customers without electricity, including the entire New Orleans area. During the disaster, Maxar captured high-spatial-resolution satellite imagery (at 0.4 m/pixel), which was subsequently made publicly available. The original images were segmented into 512×512-pixel patches to maintain spatial context while enabling detailed analysis. From this process, we generated a dataset of 2,135 triplets, each containing a pre-disaster image, a post-disaster image, and a manually annotated damage categorical mask.
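A minimal sketch of tiling a large satellite scene into non-overlapping 512×512-pixel patches, as done to build the triplets; the array shape below is a placeholder, not the original Maxar imagery dimensions.

```python
import numpy as np

image = np.zeros((2048, 3072, 3), dtype=np.uint8)  # stand-in scene
patch = 512

# Slide a non-overlapping 512x512 window over the scene.
patches = [
    image[r:r + patch, c:c + patch]
    for r in range(0, image.shape[0] - patch + 1, patch)
    for c in range(0, image.shape[1] - patch + 1, patch)
]
print(len(patches))  # 4 rows x 6 cols = 24 patches for this stand-in scene
```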
The first everyday-task dataset featuring chain-of-thought (CoT) outputs, diverse task designs, and detailed re-planning processes, along with SFT and DPO sub-datasets.