3,275 machine learning datasets
Overview: This is a dataset of blood cell photos.
PTVD is a plot-oriented multimodal dataset in the TV domain. It is also the first non-English dataset of its kind. Additionally, PTVD contains more than 26 million bullet screen comments (BSCs), powering large-scale pre-training.
The RoseBlooming dataset is a stage-specific flower dataset for detection. It consists of overhead images of two rose cultivars (‘Samourai 08’ and ‘Blossom Pink’) captured over a period of months under various weather conditions. The dataset contains 519 images, most of which include several bounding boxes, for a total of over 7,000 bounding boxes. The developmental stages of the flowering branches were visually classified and annotated into two stages: rose_small and rose_large.
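A two-stage detection dataset like this is typically consumed as per-image box lists. The sketch below counts boxes per developmental stage, assuming a COCO-style annotation layout; the field names and sample values are illustrative, not the dataset's actual schema.

```python
# Hypothetical sketch: counting RoseBlooming boxes per stage.
# The annotation layout (COCO-style dicts) is an assumption for illustration.
from collections import Counter

annotations = [
    {"image_id": 1, "category": "rose_small", "bbox": [34, 50, 18, 22]},
    {"image_id": 1, "category": "rose_large", "bbox": [120, 80, 40, 46]},
    {"image_id": 2, "category": "rose_small", "bbox": [10, 12, 15, 17]},
]

def boxes_per_stage(anns):
    """Count bounding boxes for each developmental stage."""
    return Counter(a["category"] for a in anns)

counts = boxes_per_stage(annotations)
print(counts["rose_small"], counts["rose_large"])  # 2 1
```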
The UTRSet-Real dataset is a comprehensive, manually annotated dataset specifically curated for printed Urdu OCR research. It contains over 11,000 printed text line images, each of which has been meticulously annotated. One of the standout features of this dataset is its remarkable diversity, which includes variations in fonts, text sizes, colours, orientations, lighting conditions, noise, styles, and backgrounds. This diversity closely mirrors real-world scenarios, making the dataset highly suitable for training and evaluating models that aim to excel in real-world Urdu text recognition tasks.
The UTRSet-Synth dataset is introduced as a complementary training resource to the UTRSet-Real Dataset, specifically designed to enhance the effectiveness of Urdu OCR models. It is a high-quality synthetic dataset comprising 20,000 lines that closely resemble real-world representations of Urdu text.
The UrduDoc Dataset is a benchmark dataset for Urdu text line detection in scanned documents. It is created as a byproduct of the UTRSet-Real dataset generation process. Comprising 478 diverse images collected from various sources such as books, documents, manuscripts, and newspapers, it offers a valuable resource for research in Urdu document analysis. It includes 358 pages for training and 120 pages for validation, featuring a wide range of styles, scales, and lighting conditions. It serves as a benchmark for evaluating printed Urdu text detection models, and the benchmark results of state-of-the-art models are provided. The Contour-Net model demonstrates the best performance in terms of h-mean.
A dataset of 3D image data and their embeddings for testing TomoSAM.
This dataset consists of 32x32-pixel images of shapes with multiple attributes (size, location, rotation, color). Each image is paired with its ground-truth attributes and a natural language description (English) of the image.
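One sample in such a paired dataset bundles the image, its attribute annotations, and the text description. The record below is an illustrative sketch; the field names and attribute vocabulary are assumptions, not the dataset's actual keys.

```python
# Illustrative sketch of one paired sample (keys and values are assumed,
# not taken from the dataset's actual format).
sample = {
    "image": [[0] * 32 for _ in range(32)],  # 32x32 pixel grid placeholder
    "attributes": {
        "shape": "ellipse",
        "size": "small",
        "location": (10, 21),
        "rotation": 45,
        "color": "red",
    },
    "description": "A small red ellipse near the top, rotated 45 degrees.",
}

def short_caption(s):
    """Render a compact caption from the ground-truth attributes."""
    a = s["attributes"]
    return f'{a["size"]} {a["color"]} {a["shape"]}'

print(short_caption(sample))  # small red ellipse
```

The ground-truth attributes make it easy to check whether a generated description mentions the right properties.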
Dataset of >200 synthetic cardboard texture images rendered with DoubeGum's cardboard shader in Blender. Used to generate Parcel3D, the dataset for our paper on single-image 3D reconstruction of potentially damaged parcels.
Synthetic humans generated by the RePoGen method.
The LLNeRF Dataset is a real-world benchmark for model learning and evaluation. To obtain real low-illumination images with real noise distributions, photos were taken of nighttime outdoor scenes and low-light indoor scenes containing diverse objects. Since ISP operations are device-dependent and noise distributions differ across devices, the data was collected with both a mobile phone camera and a DSLR camera to enrich the diversity of the dataset.
Replay is a collection of multi-view, multi-modal videos of humans interacting socially. Each scene is filmed in high production quality, from different viewpoints with several static cameras, as well as wearable action cameras, and recorded with a large array of microphones at different positions in the room. The full Replay dataset consists of 68 scenes of social interactions between people, such as playing board games, exercising, or unwrapping presents. Each scene is about 5 minutes long and filmed with 12 cameras, static and dynamic. Audio is captured separately by 12 binaural microphones and additional near-range microphones for each actor and for each egocentric video. All sensors are temporally synchronized, undistorted, geometrically calibrated, and color calibrated.
Persian Font Recognition (PFR)
Persian Text Image Segmentation (PTI SEG)
Different types of cells play a vital role in the initiation, development, invasion, metastasis, and therapeutic response of tumors of various organs. For example, (1) most carcinomas originate from epithelial cells, (2) the spatial arrangement of tumor-infiltrating lymphocytes (TILs) is associated with clinical outcome in several cancers, including those of the breast, prostate, and lung (Fridman et al., Nature Reviews Cancer, 2012), and (3) tumor-associated macrophages (TAMs) influence diverse processes such as angiogenesis, neoplastic cell mitogenesis, antigen presentation, matrix degradation, and cytotoxicity in various tumors (Ruffel and Coussens, Cancer Cell, 2015). Thus, accurate identification and segmentation of nuclei of multiple cell types is important for AI-enabled characterization of the tumor and its microenvironment.
We present a comprehensive dataset comprising a vast collection of raw mineral samples for the purpose of mineral recognition. The dataset encompasses more than 5,000 distinct mineral species and incorporates subsets for zero-shot and few-shot learning. In addition to the samples themselves, some entries in the dataset are accompanied by supplementary natural language descriptions, size measurements, and segmentation masks. For detailed information on each sample, please refer to the minerals_full.csv file.
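Per-sample metadata in a file like minerals_full.csv is straightforward to load with the standard csv module. The sketch below uses an in-memory mock file because the real file is not reproduced here, and the column names are assumptions for illustration.

```python
# Minimal sketch of reading per-sample metadata in the style of
# minerals_full.csv. Column names and rows are assumed, not the
# dataset's actual schema.
import csv
import io

mock_csv = io.StringIO(
    "sample_id,species,description,size_mm\n"
    "0001,quartz,Transparent prismatic crystal,42\n"
    "0002,pyrite,Cubic metallic crystal cluster,15\n"
)

rows = list(csv.DictReader(mock_csv))
species_by_id = {r["sample_id"]: r["species"] for r in rows}
print(species_by_id)  # {'0001': 'quartz', '0002': 'pyrite'}
```

For the real dataset, the mock buffer would be replaced with `open("minerals_full.csv")`.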
VisAlign is a dataset for measuring AI-human visual alignment in image classification, a fundamental task in machine perception. To evaluate AI-human visual alignment, a dataset should encompass samples covering the various scenarios that may arise in the real world and provide gold human perception labels. The dataset consists of three groups of samples, namely Must-Act (i.e., Must-Classify), Must-Abstain, and Uncertain, based on the quantity and clarity of visual information in an image, and is further divided into eight categories.
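The three-group design implies a simple evaluation protocol: a model should classify Must-Act samples, abstain on Must-Abstain samples, and may do either on Uncertain ones. The group names come from the description above; the scoring function itself is an illustrative sketch, not VisAlign's official metric.

```python
# Hedged sketch of scoring a model's act/abstain decision against the
# three VisAlign sample groups. The scoring rule is an assumption for
# illustration, not the dataset's official evaluation metric.
def decision_is_aligned(group, model_abstained):
    """Return True when the model's decision matches the group's intent."""
    if group == "Must-Act":
        return not model_abstained  # the model must classify
    if group == "Must-Abstain":
        return model_abstained      # the model must abstain
    if group == "Uncertain":
        return True                 # either decision is acceptable
    raise ValueError(f"unknown group: {group}")

print(decision_is_aligned("Must-Abstain", model_abstained=True))  # True
```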
MGPFD is a dataset for the multi-goal path-finding problem, comprising a training dataset and a simulation dataset.
The "Microbundle Time-lapse Dataset" contains 24 experimental time-lapse movies of cardiac microbundles, collected across three distinct types of experimental testbeds of beating, lab-grown hiPSC-based cardiac microbundles. Of the 24 movies, 23 are brightfield videos and one is a phase-contrast video. We categorize the experimental testbeds into three types: "Type 1" includes movies obtained from standard experimental microbundle platforms termed microbundle strain gauges [1,2,3]; "Type 2" refers to data collected from non-standard platforms termed FibroTUGs [4]; and "Type 3" represents a highly versatile and diverse nanofabricated experimental platform [5,6].
Dataset release for the BMVC 2021 Paper "Few-Shot Domain Adaptation for Low Light RAW Image Enhancement"