Datasets

19,997 machine learning datasets

Shelf&Tote Benchmark Dataset (MIT-Princeton Amazon Picking Challenge 2016 Shelf&Tote Benchmark Dataset)

Shelf&Tote Training Dataset (MIT-Princeton Amazon Picking Challenge 2016 Shelf&Tote Training Dataset)

Kvasir-Capsule

The Kvasir-Capsule dataset is the largest publicly released video capsule endoscopy (VCE) dataset. In total, it contains 47,238 labeled images and 117 videos that capture anatomical landmarks as well as pathological and normal findings. The result is more than 4,741,621 images and video frames altogether.

2 papers | 0 benchmarks | Biomedical, Images, Medical

Windows PE Malware

This dataset covers PE (Portable Executable) malware for the Windows operating system. Its samples are classified into 8 main malware families: Trojan, Backdoor, Downloader, Worms, Spyware, Adware, Dropper, Virus.

2 papers | 0 benchmarks

Semantic Trails

The Semantic Trails Datasets (STDs) are two datasets of semantically annotated trails built from check-ins performed on the Foursquare social network.

2 papers | 0 benchmarks

RePack

RePack is a dataset to study the detection of repackaged Android apps.

2 papers | 0 benchmarks

CinemAirSim

CinemAirSim extends the well-known drone simulator AirSim with a cinematic camera and extends its API to control all of the camera's parameters in real time, including various filming lenses and common cinematographic properties.

2 papers | 0 benchmarks | Environment

ICDCN2019

This is a dataset consisting of complete network traces comprising benign and malicious traffic, which is feature-rich and publicly available.

2 papers | 0 benchmarks

Online Cryptocurrency-topic diffusion on Twitter, Telegram, and Discord

This dataset is described in "Charting the Landscape of Online Cryptocurrency Manipulation" (IEEE Access, 2020), a study that aims to map and assess the extent of cryptocurrency manipulations within and across the online ecosystems of Twitter, Telegram, and Discord. Starting from tweets mentioning cryptocurrencies, we followed invite URLs from platform to platform and built the invite-link network in order to study the invite-link diffusion process.

2 papers | 0 benchmarks

Visual Servoing

Dataset for visual servoing (VS) and camera pose estimation. The images were obtained by a manipulator robot with an eye-in-hand camera in different poses. The labels represent the camera pose. It is possible to obtain the absolute pose of the camera without any pre-processing of the dataset, as well as the relative pose between images through matrix transformations. One may also use the dataset to get the camera's 6DoF speeds so that the visual servo control between two images can be performed.

2 papers | 0 benchmarks
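
The Visual Servoing entry mentions obtaining the relative pose between two images through matrix transformations. A minimal sketch of that computation follows, assuming each label can be converted to a 4x4 homogeneous transform (rotation plus translation) expressed in a common base frame. The dataset's actual label format is not stated here, so the helper names and the pose layout are hypothetical.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians (used only for the demo)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    return [R[i][:] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def invert_pose(T):
    """Invert a rigid transform: inv([R t; 0 1]) = [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t_inv)

def relative_pose(T_base_a, T_base_b):
    """Pose of camera b expressed in the frame of camera a (assumed convention)."""
    return matmul(invert_pose(T_base_a), T_base_b)

# Two hypothetical absolute camera poses with the same orientation,
# offset by 2 units along the base-frame y-axis.
T_a = make_pose(rot_z(math.pi / 2), [1.0, 0.0, 0.0])
T_b = make_pose(rot_z(math.pi / 2), [1.0, 2.0, 0.0])
T_rel = relative_pose(T_a, T_b)
```

Composing `T_a` with `T_rel` recovers `T_b` up to floating-point error, which is a quick sanity check on whichever pose convention the labels actually use.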

ExpMRC

ExpMRC is a benchmark for evaluating the explainability of Machine Reading Comprehension (MRC) systems. It contains four subsets of popular MRC datasets with additionally annotated evidence, including SQuAD, CMRC 2018, RACE+ (similar to RACE), and C3, covering span-extraction and multiple-choice MRC tasks in both English and Chinese.

2 papers | 0 benchmarks | Texts

HLGD (Headline Grouping Dataset)

The Headline Grouping dataset is a binary classification dataset on pairs of news headlines. For each pair, the binary label indicates whether the two headlines are part of the same group (and describe the same underlying event) or belong to distinct groups. The dataset contains a total of 20k annotated headline pairs, split into train, validation, and test portions.

2 papers | 0 benchmarks | Texts

Caltech Cars

The Caltech Cars dataset consists of 126 rear-view photographs captured within parking lots. These images possess a resolution of 896 × 592 pixels, featuring a solitary vehicle as the primary subject. The acquisitions were made during daylight hours employing a handheld camera at roughly equivalent distances for all instances.

2 papers | 1 benchmark | Images

UCSD-Stills

2 papers | 1 benchmark

OpenALPR-EU

2 papers | 1 benchmark

UFPR-ADMR-v1

This dataset contains 2,000 dial meter images obtained on-site by employees of the Energy Company of Paraná (Copel), which serves more than 4 million consuming units in the Brazilian state of Paraná. The images were acquired with many different cameras and are available in the JPG format with 320×640 or 640×320 pixels (depending on the camera orientation).

2 papers | 1 benchmark | Images

LFM-BeyMS

This dataset is based on the LFM-1b [1] and the Cultural LFM-1b [2] datasets. LFM-BeyMS includes equally sized groups of beyond-mainstream and mainstream music listeners and can thus be used to study the characteristics of beyond-mainstream music listeners in recommendation experiments. For more details, we refer to our publication.

2 papers | 0 benchmarks

Darpa OpTC (Darpa Operationally Transparent Cyber (OpTC) Dataset)

Operationally Transparent Cyber (OpTC) was a technology transition pilot study funded under Boston Fusion Corp.'s Cyber APT Scenarios for Enterprise Systems (CASES) project. Its primary objective was to determine if DARPA Transparent Computing (TC) program technologies could scale without loss of detection performance to address cyber defense capability gaps identified in USTRANSCOM's Joint Deployment Distribution Enterprise (JDDE) solicitation for the government fiscal years 2019-2023. Boston Fusion along with two performers from the TC program (Five Directions providing endpoint telemetry (TA1) and BAE providing analysis over the data (TA2)) worked to scale their systems from two machines to one thousand machines. The OpTC team conducted scaling and detection tests in the fall of 2019. A third performer (Provatek), not originally associated with the TC program, acted as a red team and test coordinator. This data set represents a subset of that activity.

2 papers | 0 benchmarks

TabLeX

TabLeX is a large-scale benchmark dataset comprising table images generated from scientific articles. TabLeX consists of two subsets, one for table structure extraction and the other for table content extraction. Each table image is accompanied by its corresponding LaTeX source code. To facilitate the development of robust table information extraction (IE) tools, TabLeX contains images in different aspect ratios and in a variety of fonts.

2 papers | 0 benchmarks | Images

TrackML challenge Throughput phase dataset (Tracking Machine Learning Challenge)

The dataset comprises multiple independent events, where each event contains simulated measurements (essentially 3D points) of particles generated in a collision between proton bunches at the Large Hadron Collider at CERN. The goal of the tracking machine learning challenge is to group the recorded measurements, or hits, for each event into tracks, that is, sets of hits that belong to the same initial particle. A solution must uniquely associate each hit to one track. The training dataset contains the recorded hits, their ground-truth counterparts, their association to particles, and the initial parameters of those particles. The test dataset contains only the recorded hits.

2 papers | 0 benchmarks
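
The TrackML entry requires that a solution associate each hit with exactly one track. The sketch below shows one way such a constraint could be checked for a single event. The input layout (a list of hit identifiers and a list of `(hit_id, track_id)` pairs) and both function names are hypothetical and are not taken from the challenge's official tooling.

```python
from collections import Counter, defaultdict

def check_unique_assignment(hit_ids, assignments):
    """Return True if every hit in hit_ids is assigned to exactly one track.

    hit_ids: hit identifiers of one event (hypothetical layout).
    assignments: (hit_id, track_id) pairs proposed by a solution (hypothetical layout).
    """
    counts = Counter(hit_id for hit_id, _ in assignments)
    covers_all_hits = set(counts) == set(hit_ids)
    no_duplicates = all(n == 1 for n in counts.values())
    return covers_all_hits and no_duplicates

def group_into_tracks(assignments):
    """Collect the hits of each track as {track_id: [hit_id, ...]}."""
    tracks = defaultdict(list)
    for hit_id, track_id in assignments:
        tracks[track_id].append(hit_id)
    return dict(tracks)

# Toy event with three hits grouped into two tracks.
hits = [1, 2, 3]
solution = [(1, 10), (2, 10), (3, 11)]
is_valid = check_unique_assignment(hits, solution)
tracks = group_into_tracks(solution)
```

A solution that leaves a hit unassigned, or assigns the same hit to two tracks, fails the check.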
