19,997 machine learning datasets
Large-scale Anomaly Detection (LAD) is a database for benchmarking anomaly detection in video sequences. It has two main features. 1) It contains 2000 video sequences, including normal and abnormal clips, covering 14 anomaly categories (crash, fire, violence, etc.) with large scene variety, which its authors describe as the largest anomaly analysis database to date. 2) It provides annotation data, including video-level labels (abnormal/normal video, anomaly type) and frame-level labels (abnormal/normal video frame), to facilitate anomaly detection.
The full IFCNet dataset currently consists of 19,000 CAD models distributed over 65 classes according to the taxonomy of the Industry Foundation Classes (IFC) standard. The IFC standard provides an open data exchange format for projects in the Architecture, Engineering and Construction (AEC) domain. Due to high imbalances with respect to the number of objects in each class, a subset of 8,000 objects from 20 classes is selected to form the IFCNetCore dataset, providing a more balanced distribution. Apart from the geometric information of the CAD model, most objects also have semantic information in the form of key-value pairs, enums or lists, which are relevant to different stages of the construction process.
The Kinships dataset describes relationships between members of the Australian tribe Alyawarra and consists of 10,686 triples. It contains 104 entities representing members of the tribe and 26 relationship types that represent kinship terms such as Adiadya or Umbaidya.
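As a minimal sketch of how relational triples like these are often handled, the snippet below stores each fact as a (head, relation, tail) tuple and groups the pairs by relation. The triple ordering, the `person_N` entity names, and the three example facts are assumptions made up for illustration; only the relation names Adiadya and Umbaidya come from the description above, and none of these facts are taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity); ordering is an assumption

# Hypothetical facts for illustration only; NOT actual Kinships data.
triples: List[Triple] = [
    ("person_1", "Adiadya", "person_2"),
    ("person_2", "Umbaidya", "person_3"),
    ("person_1", "Umbaidya", "person_3"),
]


def index_by_relation(facts: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (head, tail) pairs under their relation name, preserving input order."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in facts:
        index[relation].append((head, tail))
    return dict(index)


by_relation = index_by_relation(triples)

# Collect the distinct entities that appear as head or tail.
entities = sorted({e for head, _, tail in triples for e in (head, tail)})
```

On the full dataset, the same indexing would yield 26 relation keys and 104 distinct entities, matching the counts stated above.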
Continuous control tasks in the Box2D simulator.
Golos is a Russian speech dataset suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours.
The dataset contains 578,731 structures for methane combustion, together with their energies and forces computed at the MN15/6-31G** level of theory.
EchoCP is an echocardiography dataset in contrast transthoracic echocardiography (cTTE) targeting patent foramen ovale (PFO) diagnosis. EchoCP consists of 30 patients, each with both rest and Valsalva maneuver videos, which cover various PFO grades.
Quality, diversity, and size of training dataset are critical factors for learning-based gaze estimators. We create two datasets satisfying these criteria for near-eye gaze estimation under infrared illumination: a synthetic dataset using anatomically-informed eye and face models with variations in face shape, gaze direction, pupil and iris, skin tone, and external conditions (two million images at 1280x960), and a real-world dataset collected with 35 subjects (2.5 million images at 640x480). Using our datasets, we train a neural network for gaze estimation, achieving 2.06 (+/- 0.44) degrees of accuracy across a wide 30 x 40 degrees field of view on real subjects excluded from training and 0.5 degrees best-case accuracy (across the same field of view) when explicitly trained for one real subject. We also train a variant of our network to perform pupil estimation, showing higher robustness than previous methods. Our network requires fewer convolutional layers than previous networks, ach
The dataset consists of 20 videos, each 5.5 minutes long. The videos are captured at a resolution of 1024x1024 and at 30 frames per second. Each video contains only one pig performing the Novel Object Recognition task.
OpenSLR is a repository of open speech and language resources, including large-scale transcribed audio corpora and related software. It serves as a central platform for researchers and practitioners to access and share datasets used in speech recognition (ASR), text-to-speech (TTS), and linguistic research.
Morph Call is a suite of 46 probing tasks for four Indo-European languages that differ in morphology: Russian, French, English, and German. The tasks are designed to explore the morphosyntactic content of multilingual transformers, an aspect that is currently less studied.
The Blender Cycles Ray-tracing (BCR) dataset contains 2449 high-quality images rendered from 1463 models. The images are rendered at a range of samples-per-pixel (spp) rates: 1-8, 12, 16, 32, 64, 250, 1000, and 4000 spp. All images are rendered at 1080p resolution. Each image includes not only the final rendered result but also the intermediate render layers, such as albedo, normal, diffuse, and glossy.
Ambiguous-HOI is a challenging dataset, based on HICO-DET, that contains ambiguous human-object interaction (HOI) images for HOI detection.
This dataset is a collection of input-label pairs where each input is in the form of a numerical dataset, itself a set of input and output pairs {(x, y)}, and the corresponding label is a string encoding the symbolic expression governing the relationship between variables in the numerical dataset.
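A minimal sketch of what one such input-label pair could look like is shown below. The helper `make_example`, the expression `x**2 + 1`, the sampling range, and the number of points are all hypothetical choices for illustration; the actual dataset's expression grammar and sampling scheme are not described here.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[float, float]  # one (x, y) observation


def make_example(
    expr: str,
    fn: Callable[[float], float],
    n_points: int = 5,
    seed: int = 0,
) -> Tuple[List[Pair], str]:
    """Build one (numerical dataset, symbolic label) pair.

    `expr` is the label string; `fn` is a Python callable that must compute
    the same expression (keeping the two in sync is the caller's job).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    points = [(x, fn(x)) for x in xs]
    return points, expr


# Hypothetical example: the input is a set of (x, y) pairs,
# the label is the string encoding the governing expression.
points, label = make_example("x**2 + 1", lambda x: x**2 + 1)
```

A model trained on data of this shape would take `points` as input and be asked to predict `label`.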
Iris is considered one of the most accurate and reliable biometric modalities. It is more stable and distinctive than fingerprint, face, voice, etc., and is difficult to replicate for spoof attacks. Although an iris pattern is naturally an ideal identifier, developing a high-performance iris recognition algorithm and transferring it from the laboratory to field application is still a challenging task. In practical applications, an iris recognition system must cope with various unpredictable iris image degradations. For example, recognition of low-quality iris images, non-cooperative iris images, long-range iris images, and moving iris images are all major problems in iris recognition. We believe that the first step in solving these problems is to design and develop a database of iris images that includes all of these degradations.
Amazon-PQA is a product question-answer dataset. It includes questions and their answers published on the Amazon website, along with the public product information and category (Amazon Browse Node name). It contains more than 8M questions from 1M+ products.
We present TNCR, a new table dataset with varying image quality collected from free open source websites. The TNCR dataset can be used for detecting tables in scanned document images and classifying them into 5 different classes.
This dataset is a composition of scenes taken by the SPOT sensor in 2005 over four counties in the State of Minas Gerais, Brazil: Arceburgo, Guaranesia, Guaxupé and Monte Santo. It has multispectral high-resolution scenes of coffee crops and non-coffee areas. It has high intraclass variance caused by different crop management techniques, as well as scenes with different plant ages and/or with spectral distortions caused by shadows.
ScanBank is a benchmark dataset for figure extraction from scanned electronic theses and dissertations. It contains 10 thousand scanned page images, manually labeled for the presence of the 3.3 thousand figures or tables found therein.
This dataset includes all music sources, background noises, impulse response (IR) samples, and conversational speech used in the work "Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning", ICASSP 2021 (https://arxiv.org/abs/2010.11910).