Datasets

19,997 machine learning datasets

3D Datasets of Broccoli in the Field

This work was undertaken by members of the Lincoln Centre for Autonomous Systems, University of Lincoln, UK. The four data collection sessions were conducted at three different sites in Lincolnshire, UK, and one in Murcia, Spain (see Fig. 1). The UK sessions took place at the beginning and towards the end of the harvesting season, and the Spanish session at the end of the harvest. The broccoli variety grown in the UK is Iron Man, whilst the variety grown in Spain is Titanium. The weather during UK data capture included a mixture of conditions, including sunny, overcast, and rainy, with broccoli varying in maturity from small to large, and some already harvested. Conditions for data capture in Spain included strong sunlight and mature plants at the very end of the harvesting season. The tractor was driven through the broccoli field at a slow walking speed, with two rows of broccoli plants being imaged by the RGB-D sensor.

2 papers | 0 benchmarks | Biology, Images, RGB-D

PHANTOM (Physical Anomalous Trajectory or Motion)

To evaluate the presented approaches, we created the Physical Anomalous Trajectory or Motion (PHANTOM) dataset, consisting of six classes featuring everyday objects or physical setups and showing nine different kinds of anomalies. We designed our classes to evaluate the detection of various modes of video abnormality that are generally excluded from video anomaly detection (AD) settings.

2 papers | 6 benchmarks | RGB Video

CPPE-5 (Medical Personal Protective Equipment Dataset)

CPPE-5 (Medical Personal Protective Equipment) is a new, challenging dataset intended to enable the study of subordinate categorization of medical personal protective equipment, which is not possible with other popular datasets that focus on broad-level categories.

2 papers | 30 benchmarks | Images

Grasping dataset: suction-based (suction-based-grasping-dataset)

A small and simple dataset featuring RGB-D images and heightmaps of various objects in a bin, with manually annotated suctionable regions.

2 papers | 0 benchmarks

Phishing and Benign Websites

An annotated dataset of 38,800 phishing and benign websites.

2 papers | 0 benchmarks | Texts

Moon Phases (moon phases and derived)

Dates annotated with Moon phases, extended with the number of days until the next phase (covering 1992/1/4 to 2027/12/20).

2 papers | 0 benchmarks

IMS Bearing Dataset

Bearing acceleration data from three run-to-failure experiments on a loaded shaft. The data set was provided by the Center for Intelligent Maintenance Systems (IMS), University of Cincinnati.

2 papers | 0 benchmarks | Time series

VGG-Sound Sync

VGG-Sound Sync is an audio-visual synchronisation benchmark based on videos collected from YouTube. It contains over 100k video clips spanning 160 classes.

2 papers | 0 benchmarks | Videos

MetaVD (Meta Video Dataset)

MetaVD is a Meta Video Dataset for enhancing human action recognition datasets. It provides human-annotated relationship labels between action classes across human action recognition datasets. MetaVD is proposed in the following paper: Yuya Yoshikawa, Yutaro Shigeto, and Akikazu Takeuchi. "MetaVD: A Meta Video Dataset for enhancing human action recognition datasets." Computer Vision and Image Understanding 212 (2021): 103276.

2 papers | 0 benchmarks | Graphs, Videos

ProSLU (Profile-based Spoken Language Understanding)

In the paper, to bridge this research gap, we propose a new and important task, Profile-based Spoken Language Understanding (ProSLU), which requires a model to rely not only on the text but also on the given supporting profile information. We further introduce a Chinese human-annotated dataset with over 5K utterances annotated with intents and slots, together with the corresponding supporting profile information. In total, we provide three types of supporting profile information: (1) Knowledge Graph (KG), consisting of entities with rich attributes; (2) User Profile (UP), composed of user settings and information; and (3) Context Awareness (CA), describing user state and environmental information.

2 papers | 2 benchmarks | Texts

DIDI Dataset (The DIDI dataset: Digital Ink Diagram data)

The dataset contains digital ink drawings of diagrams with dynamic drawing information. It aims to foster research in interactive graphical symbolic understanding and was obtained through a prompted data collection effort.

2 papers | 0 benchmarks

RLD (Responsive Listener Dataset)

RLD (Responsive Listener Dataset) is a conversational video corpus collected from public resources, featuring 67 speakers and 76 listeners with three different attitudes. Through real-time non-verbal responses to the speakers' words, intonation, or behavior, listeners show how they engage in the dialogue.

2 papers | 0 benchmarks | Videos

EMDS-6

EMDS-6 contains 21 classes of environmental microorganisms (EMs). Each class contains 40 original EM images and their corresponding binary ground truth images (840 original images in total). In the ground truth images, the foreground is white and the background is black.

2 papers | 0 benchmarks | Images
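To make the EMDS-6 mask convention (white foreground, black background) concrete, here is a minimal sketch that turns a binary ground-truth mask into a boolean foreground map and measures how much of the image is microorganism. It is not part of the dataset's own tooling: it assumes the mask has already been loaded as a 2D list of 8-bit grayscale values, and the function names and the threshold of 128 are illustrative assumptions.

```python
# Minimal sketch for EMDS-6-style binary masks.
# Assumption: the mask is already loaded as a 2D list of 8-bit grayscale
# values (0 = black background, 255 = white foreground). File loading and
# the choice of image library are deliberately left out.

def mask_to_foreground(mask, threshold=128):
    """Return a 2D list of booleans: True where the pixel is foreground (white)."""
    return [[value >= threshold for value in row] for row in mask]


def foreground_fraction(mask, threshold=128):
    """Fraction of pixels labelled as microorganism (foreground)."""
    fg = mask_to_foreground(mask, threshold)
    total = sum(len(row) for row in fg)
    if total == 0:
        return 0.0  # empty mask: no pixels, so report no foreground
    return sum(sum(row) for row in fg) / total


example = [
    [0, 0, 255, 255],
    [0, 255, 255, 0],
]
print(foreground_fraction(example))  # 4 of 8 pixels are white -> 0.5
```

Thresholding at the midpoint, rather than testing for exactly 255, keeps the sketch tolerant of masks that were resized or saved with lossy compression, where edge pixels may no longer be pure black or white.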

ITB (Informative Tracking Benchmark)

Informative Tracking Benchmark (ITB) is a small and informative tracking benchmark containing 7% of the 1.2M frames in existing and newly collected datasets, which enables efficient evaluation while ensuring effectiveness. Specifically, the authors designed a quality assessment mechanism to select the most informative sequences from existing benchmarks, taking into account 1) challenge level, 2) discriminative strength, and 3) density of appearance variations. Furthermore, they collected additional sequences to ensure the diversity and balance of tracking scenarios, leading to a total of 20 sequences for each scenario.

2 papers | 2 benchmarks | Videos

PerCQA

PerCQA is the first Persian dataset for Community Question Answering (CQA). It contains questions and answers crawled from the most well-known Persian forum.

2 papers | 0 benchmarks | Texts

VocBench

VocBench is a framework that benchmarks the performance of state-of-the-art neural vocoders. It systematically evaluates different neural vocoders in a shared environment, enabling a fair comparison between them.

2 papers | 0 benchmarks | Speech

CD&S (Corn Disease and Severity)

The Corn Disease and Severity (CD&S) dataset consists of 511, 524, and 562 field-acquired raw images (1,597 in total), corresponding respectively to three common foliar corn diseases: Northern Leaf Blight (NLB), Gray Leaf Spot (GLS), and Northern Leaf Spot.

2 papers | 0 benchmarks | Images

Curlie

The Curlie dataset contains more than 1M websites in 92 languages, with corresponding labels collected from Curlie, the largest multilingual crowdsourced Web directory. The dataset contains 14 website categories aligned across languages. It is used for language-agnostic website embedding and classification.

2 papers | 0 benchmarks

Semantic Question Similarity in Arabic (NSURL-2019 Shared Task 8: Semantic Question Similarity in Arabic)

NSURL-2019 Shared Task 8: Semantic Question Similarity in Arabic

2 papers | 0 benchmarks

DeepCom-Java

The Java dataset introduced in DeepCom (Deep Code Comment Generation), commonly used to evaluate automated code summarization.

2 papers | 2 benchmarks
