19,997 machine learning datasets
HO-3D v3 is version 3 of the HO-3D dataset, with more accurate hand-object poses. HO-3D v3 provides more accurate annotations for both hand and object poses, resulting in better estimates of the contact regions between the hand and the object. The table below compares the statistics of the HO-3D v2 and HO-3D v3 datasets.
Abstract: A number of benchmarks and techniques have emerged for the detection of deepfakes. However, very few works study the detection of incrementally appearing deepfakes in real-world scenarios. To simulate in-the-wild conditions, this paper proposes a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models. CDDB defines multiple evaluations over easy, hard, and long sequences of deepfake detection tasks, with a set of appropriate measures. In addition, we exploit multiple approaches to adapt multiclass incremental learning methods, commonly used in continual visual recognition, to the continual deepfake detection problem. We evaluate existing methods, including their adapted variants, on the proposed CDDB. Within the proposed benchmark, we explore some commonly known essentials of standard continual learning. Our study provides new insights on these essentials in the context of continual deepfake detection.
CVCS is a synthetic multi-view people dataset containing 31 scenes, of which 23 are used for training and the remaining 8 for testing. Scene sizes range from roughly 10 m × 20 m to 90 m × 80 m. Each scene contains 100 multi-view frames. The ground-plane map resolution is 900 × 800, where each grid cell corresponds to 0.1 m in the real world. During training, 5 views are randomly selected 5 times per scene frame in each iteration; during evaluation, the same number of views is randomly selected 21 times.
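The view-sampling protocol described above can be sketched as a few lines of Python. This is an illustrative sketch, not code from the CVCS release; the function and argument names are assumptions.

```python
import random

def sample_views(num_views: int, k: int = 5, repeats: int = 5):
    """Randomly draw k distinct camera views, `repeats` times, for one
    scene frame (5 views, 5 draws in training; 21 draws in evaluation)."""
    return [random.sample(range(num_views), k) for _ in range(repeats)]

# Training-style draws for a frame with 10 available views:
train_draws = sample_views(10, k=5, repeats=5)
# Evaluation-style draws use the same view count but 21 repeats:
eval_draws = sample_views(10, k=5, repeats=21)
```

Each draw uses sampling without replacement, so no view is repeated within a single draw.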
DiDi is a distractor-distilled tracking dataset created to address the limitation of low distractor presence in current visual object tracking benchmarks. To enhance the evaluation and analysis of tracking performance amidst distractors, we have semi-automatically distilled several existing benchmarks into the DiDi dataset. The dataset is available for download at this URL: https://go.vicos.si/didi
The AI City Challenge, hosted at CVPR 2024, focuses on harnessing AI to enhance operational efficiency in physical settings such as retail and warehouse environments, and Intelligent Traffic Systems (ITS). It aims to utilize AI for actionable insights from sensor data, like camera feeds, to improve traffic safety and transportation outcomes. This year, the challenge spotlights two key areas with significant potential: retail business and ITS.
SST-5 is the Stanford Sentiment Treebank 5-way classification dataset (positive, somewhat positive, neutral, somewhat negative, negative). To create SST-3 (positive, neutral, negative), the 'somewhat positive' class was merged into 'positive', and, similarly, the 'somewhat negative' class was merged into 'negative'.
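The 5-way to 3-way label collapse described above amounts to a small lookup table. A minimal sketch, assuming the plain-text label names used in the description (the dataset's actual label encoding may differ):

```python
# Map SST-5 labels onto the SST-3 scheme: the two "somewhat" classes
# are folded into their corresponding polar classes.
SST5_TO_SST3 = {
    "positive": "positive",
    "somewhat positive": "positive",
    "neutral": "neutral",
    "somewhat negative": "negative",
    "negative": "negative",
}

def to_sst3(label: str) -> str:
    """Collapse an SST-5 label to its SST-3 equivalent."""
    return SST5_TO_SST3[label]
```

Applied over an SST-5 label column, this yields the three-class variant directly.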
The PLAsTiCC dataset is a collection of simulated light curves from the Photometric LSST Astronomical Time-Series Classification Challenge. This diverse dataset contains 14 types of astronomical time-varying objects, simulated using the expected instrument characteristics and survey strategy of the upcoming Legacy Survey of Space and Time (LSST) conducted at the Vera C. Rubin Observatory.
Significant progress has been made in building generalist robot manipulation policies, yet their scalable and reproducible evaluation remains challenging, as real-world evaluation is operationally expensive and inefficient. We propose employing physical simulators as efficient, scalable, and informative complements to real-world evaluations. These simulation evaluations offer valuable quantitative metrics for checkpoint selection, insights into potential real-world policy behaviors or failure modes, and standardized setups to enhance reproducibility.
This repository contains the released respiratory sound database for IEEE BioCAS Respiratory Sound Track Challenges. Please refer to this link (Respiratory Sound Challenge) for more information about the challenges.
Open-source dataset
ImgEdit is a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which contain both novel and complex single-turn edits, as well as challenging multi-turn tasks.
The dataset was created using high-resolution (8 m) satellite imagery from the Gaofen series (Gaofen-2 and Gaofen-6), captured in 2019 over Maduo County, China, located in the Yellow River source area. This region is known for its high-altitude, alpine grasslands, and complex terrain, with coordinates between 33°50'–35°40' N latitude and 96°50'–99°20' E longitude.
The SciTail dataset is an entailment dataset created from multiple-choice science exams and web sentences. Each question and its correct answer choice are converted into an assertive statement to form the hypothesis. We use information retrieval to obtain relevant text from a large corpus of web sentences and use these sentences as the premise P. We crowdsource the annotation of each premise-hypothesis pair as supports (entails) or not (neutral) to create the SciTail dataset. The dataset contains 27,026 examples: 10,101 with the entails label and 16,925 with the neutral label.
The Collective Activity Dataset contains 5 different collective activities (crossing, walking, waiting, talking, and queueing) across 44 short video sequences, some of which were recorded with a consumer hand-held digital camera and feature varying viewpoints.
CCGbank is a translation of the Penn Treebank into a corpus of Combinatory Categorial Grammar derivations. It pairs syntactic derivations with sets of word-word dependencies which approximate the underlying predicate-argument structure. The dataset contains 99.44% of the sentences in the Penn Treebank, for which it corrects a number of inconsistencies and errors in the original annotation.
Tai-Chi-HD is a high-resolution dataset that can be used as a reference benchmark for evaluating frameworks for image animation and video generation. It consists of cropped videos of full human bodies performing Tai Chi actions.
The Ecoli dataset is a dataset for protein localization. It contains 336 E.coli proteins split into 8 different classes.
Leonardo Filipe Rodrigues Ribeiro, Pedro H. P. Savarese, and Daniel R. Figueiredo. struc2vec: Learning node representations from structural identity.
Multi-exposure image fusion (MEF) is considered an effective quality enhancement technique widely adopted in consumer electronics, but little work has been dedicated to the perceptual quality assessment of multi-exposure fused images. In this paper, we first build an MEF database and carry out a subjective user study to evaluate the quality of images generated by different MEF algorithms. There are several useful findings. First, considerable agreement has been observed among human subjects on the quality of MEF images. Second, no single state-of-the-art MEF algorithm produces the best quality for all test images. Third, the existing objective quality models for general image fusion are very limited in predicting the perceived quality of MEF images. Motivated by the lack of appropriate objective models, we propose a novel objective image quality assessment (IQA) algorithm for MEF images based on the principle of the structural similarity approach and a novel measure of patch structural consistency.