This project presents two new datasets that extend the capability of the Learning to See in the Dark low-light enhancement CNN to the Canon 6D DSLR, and explores how the network performs when modified in various ways, both pruned and made deeper.
Overview: This dataset comprises 6,700 executed scoops (excavations), recorded across a wide range of materials, terrain topographies, and compositions.
This repository contains the dataset for a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.
We build a new evaluation set by adding spotted words to the images of the ImageNet 2012 evaluation set. There are 1,000 categories in ImageNet. For each category c, we find its most confusing category c*, and spot the category name onto every evaluation image.
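The word-spotting step above can be sketched with Pillow. This is a minimal illustration, not the authors' exact procedure; the overlay position, color, and font are assumptions.

```python
from PIL import Image, ImageDraw

def spot_word(image, word, position=(10, 10), color=(255, 255, 255)):
    """Overlay a category name onto a copy of an image.

    Illustrative sketch of the spotted-evaluation-set construction;
    position/color defaults are assumptions, and the default PIL
    bitmap font is used.
    """
    img = image.copy()
    draw = ImageDraw.Draw(img)
    draw.text(position, word, fill=color)
    return img
```

In practice one would loop over the 1,000 categories, look up each category's most confusing counterpart c*, and apply `spot_word` with c*'s name to every evaluation image of c.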
The ISEKAI dataset’s images are generated by Midjourney’s text-to-image model using carefully crafted instructions, and were manually selected to ensure core-concept consistency. The dataset currently comprises 20 groups and 40 categories in total (and continues to grow). Each group pairs a new concept with a related real-world concept, such as "octopus vacuum" and "octopus"; these serve as challenging negative samples for each other. Each concept has at least 32 images, supporting multi-shot examples.
The data consists of 21 images of microtubules in PFA-fixed NIH 3T3 mouse embryonic fibroblasts (DSMZ: ACC59) labeled with a mouse anti-alpha-tubulin monoclonal IgG1 antibody (Thermofisher A11126, primary antibody) and visualized by a blue-fluorescent Alexa Fluor 405 goat anti-mouse IgG antibody (Thermofisher A-31553, secondary antibody). Acquisition of the images was performed using a confocal microscope (Olympus IX81).
We introduce USPTO-30K, a large-scale benchmark dataset of annotated molecule images, which overcomes these limitations. It is created from pairs of images and MolFiles published by the United States Patent and Trademark Office. Each molecule was independently selected from all available documents from 2001 to 2020. The set consists of three subsets that decouple the study of clean molecules, molecules with abbreviations, and large molecules.
The set is created using molecule SMILES strings retrieved from the PubChem database. Images are then generated from the SMILES using the molecule-drawing library RDKit. The synthetic set is augmented at multiple levels:
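The SMILES-to-image generation step can be sketched with RDKit's drawing API. This is a minimal sketch, not the dataset's actual generation pipeline; the output size is an assumption, and the augmentations mentioned above are not reproduced here.

```python
from rdkit import Chem
from rdkit.Chem import Draw

def smiles_to_image(smiles, size=(300, 300)):
    """Render a 2D molecule depiction from a SMILES string.

    Minimal sketch of image generation with RDKit; the size
    default is an illustrative assumption.
    """
    # Parse the SMILES string into an RDKit molecule object.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    # Render the molecule as a PIL image.
    return Draw.MolToImage(mol, size=size)
```

A real synthetic pipeline would then apply augmentations on top of the rendered image (e.g. varying drawing styles or adding noise), which RDKit's drawing options also support.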
We introduce the trapped yeast cell (TYC) dataset, a novel dataset for understanding instance-level semantics and motions of cells in microstructures. We release 105 densely annotated high-resolution brightfield microscopy images, including about 19k instance masks. We also release 261 curated video clips composed of 1293 high-resolution microscopy images to facilitate unsupervised understanding of cell motions and morphology.
SD7K is currently the only large-scale, high-resolution dataset that satisfies all of the important data characteristics for document shadow removal, covering a large number of document shadow images. The mean resolution is 2462 × 3699.
We introduce a new AI-ready computational pathology dataset containing restained and co-registered digitized images from eight head-and-neck squamous cell carcinoma patients. Specifically, the same tumor sections were stained first with the expensive multiplex immunofluorescence (mIF) assay and then restained with cheaper multiplex immunohistochemistry (mIHC). This is the first public dataset demonstrating the equivalence of these two staining methods, which in turn enables several use cases: because of the equivalence, our cheaper mIHC staining protocol can offset the need for expensive mIF staining/scanning, which requires highly skilled lab technicians. As opposed to subjective and error-prone immune cell annotations from individual pathologists (disagreement > 50%) used to drive state-of-the-art deep learning approaches, this dataset provides objective immune and tumor cell annotations via mIF/mIHC restaining for more reproducible and accurate characterization of the tumor immune microenvironment.
The CLPD dataset comprises 1200 images that encompass various regions within mainland China. These images were sourced from diverse origins, including the internet, mobile devices, and in-car recording devices. While the majority of the images were recorded during daylight hours, a portion of them were captured at nighttime. The dataset predominantly features passenger cars, with a limited number of images depicting trucks and buses.
CD-HARD comprises 102 images featuring vehicles with oblique license plates sourced from the Cars dataset. Each image within this dataset exclusively depicts a single vehicle and was captured during daylight hours. While the dataset encompasses images from diverse geographic regions, it predominantly consists of images seemingly taken in European locales.
In this dataset, an upper-torso humanoid robot with a 7-DOF arm explored 100 different objects belonging to 20 categories using 10 behaviors: Look, Crush, Grasp, Hold, Lift, Drop, Poke, Push, Shake, and Tap.
This dataset contains the ground truth for urban changes that occurred in Mariupol, Ukraine, over the time frame 2017-2020. It is useful for transferring the urban change monitoring network ERCNN-DRS (https://github.com/It4innovations/ERCNN-DRS_urban_change_monitoring) to that region.
The CapMIT1003 database contains captions and clicks collected for images from the MIT1003 database, for which reference eye scanpaths are available. The database is distributed as a single SQLite3 database named capmit1003.db. For convenience, a lightweight Python class to access the database is provided in the official repository.
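Since the data ships as a single SQLite3 file, it can also be read directly with Python's standard library. The sketch below is illustrative only: the table and column names (`captions`, `image`, `caption`) are assumptions, and the actual schema should be taken from the official repository's access class.

```python
import sqlite3

class CapMIT1003Reader:
    """Minimal reader for the capmit1003.db SQLite file.

    Table and column names used here are illustrative assumptions;
    consult the official repository for the actual schema.
    """

    def __init__(self, path="capmit1003.db"):
        self.conn = sqlite3.connect(path)
        # Return rows as dict-like objects keyed by column name.
        self.conn.row_factory = sqlite3.Row

    def captions_for(self, image_name):
        # Parameterized query avoids SQL injection and quoting issues.
        cur = self.conn.execute(
            "SELECT caption FROM captions WHERE image = ?", (image_name,)
        )
        return [row["caption"] for row in cur.fetchall()]

    def close(self):
        self.conn.close()
```

The official lightweight access class is the authoritative interface; this reader only shows that no extra dependencies are strictly required.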
Robot@Home2 is an enhanced version aimed at improving usability and functionality for developing and testing mobile robotics and computer vision algorithms. Robot@Home2 consists of three main components. Firstly, a relational database that stores the contextual information and data links, compatible with Structured Query Language (SQL). Secondly, a Python package for managing the database, including downloading, querying, and interfacing functions. Finally, learning resources in the form of Jupyter notebooks, runnable locally or on the Google Colab platform, enabling users to explore the dataset without local installations. These freely available tools are expected to make the Robot@Home dataset easier to exploit and to accelerate research in computer vision and robotics.
InfraParis is a novel and versatile dataset supporting multiple tasks across three modalities: RGB, depth, and infrared. From the city center to the suburbs, it captures a variety of scene styles across the greater Paris area, providing rich semantic information. InfraParis contains 7301 images with bounding boxes and full semantic (19-class) annotations. We assess various state-of-the-art baseline techniques, encompassing models for semantic segmentation, object detection, and depth estimation.
Two separate datasets of calibration runs in front of a calibration board:
- 4IMUs+3Cams
- 4IMUs+4Cams