19,997 machine learning datasets
19,997 dataset results
The MarKG dataset has 11,292 entities, 192 relations and 76,424 images, including 2,063 analogy entities and 27 analogy relations. The original intention of MarKG is to provide prior knowledge of analogy entities and relations for better multimodal analogical reasoning.
We present the VIS30K dataset, a collection of 29,689 images that represents 30 years of figures and tables from each track of the IEEE Visualization conference series (Vis, SciVis, InfoVis, VAST). VIS30K’s comprehensive coverage of the scientific literature in visualization not only reflects the progress of the field but also enables researchers to study the evolution of the state-of-the-art and to find relevant work based on graphical content. We describe the dataset and our semi-automatic collection process, which couples convolutional neural networks (CNN) with curation. Extracting figures and tables semi-automatically allows us to verify that no images are overlooked or extracted erroneously. To improve quality further, we engaged in a peer-search process for high-quality figures from early IEEE Visualization papers.
FluidLab is a simulation environment with a diverse set of manipulation tasks involving complex fluid dynamics. These tasks address interactions between solid and fluid as well as among multiple fluids.
This dataset is a collection of labelled PCAP files, both encrypted and unencrypted, across 10 applications, as well as a pandas dataframe in HDF5 format containing detailed metadata summarizing the connections from those files. It was created to assist the development of machine learning tools that would allow operators to see the traffic categories of both encrypted and unencrypted traffic flows. In particular, features of the network packet traffic timing and size information (both inside of and outside of the VPN) can be leveraged to predict the application category that generated the traffic.
BIWI 3D corpus comprises a total of 1109 sentences uttered by 14 native English speakers (6 males and 8 females). A real time 3D scanner and a professional microphone were used to capture the facial movements and the speech of the speakers. The dense dynamic face scans were acquired at 25 frames per second and the RMS error in the 3D reconstruction is about 0.5 mm. In order to ease automatic speech segmentation, we carried out the recordings in a anechoic room, with walls covered by sound wave-absorbing materials.
NIH-CXR-LT. NIH ChestXRay14 contains over 100,000 chest X-rays labeled with 14 pathologies, plus a “No Findings” class. We construct a single-label, long-tailed version of the NIH ChestXRay14 dataset by introducing five new disease findings described above. The resulting NIH-CXR-LT dataset has 20 classes, including 7 head classes, 10 medium classes, and 3 tail classes. NIH-CXR-LT contains 88,637 images labeled with one of 19 thorax diseases, with 68,058 training and 20,279 test images. The validation and balanced test sets contain 15 and 30 images per class, respectively.
🤖 Robo3D - The WOD-C Benchmark WOD-C is an evaluation benchmark heading toward robust and reliable 3D perception in autonomous driving. With it, we probe the robustness of 3D detectors and segmentors under out-of-distribution (OoD) scenarios against corruptions that occur in the real-world environment. Specifically, we consider natural corruptions happen in the following cases:
WEAR is an outdoor sports dataset for both vision- and inertial-based human activity recognition (HAR). The dataset comprises data from 22 participants performing a total of 18 different workout activities with untrimmed inertial (acceleration) and camera (egocentric video) data recorded at 11 different outside locations. Unlike previous egocentric datasets, WEAR provides a challenging prediction scenario marked by purposely introduced activity variations as well as an overall small information overlap across modalities.
This data set is collected for the ERC project: The Hands that Wrote the Bible: Digital Palaeography and Scribal Culture of the Dead Sea Scrolls PI: Mladen Popović Grant agreement ID: 640497
ImageNet-Hard is a new benchmark that comprises 10,980 images collected from various existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their ability to classify images correctly. As a result, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle to perform well on this dataset, achieving a mere 2.02% accuracy.
Year after year, the demand for ever-better smartphone photos continues to grow, in particular in the domain of portrait photography. Manufacturers thus use perceptual quality criteria throughout the development of smartphone cameras. This costly procedure can be partially replaced by automated learning-based methods for image quality assessment (IQA). Due to its subjective nature, it is necessary to estimate and guarantee the consistency of the IQA process, a characteristic lacking in the mean opinion scores (MOS) widely used for crowdsourcing IQA. In addition, existing blind IQA (BIQA) datasets pay little attention to the difficulty of cross-content assessment, which may degrade the quality of annotations. This paper introduces PIQ23, a portrait-specific IQA dataset of 5116 images of 50 predefined scenarios acquired by 100 smartphones, covering a high variety of brands, models, and use cases. The dataset includes individuals of various genders and ethnicities who have given explicit
OntoEvent is a new ED dataset with event correlations. It contains 13 supertypes with 100 subtypes, derived from 4,115 documents with 60,546 event instances.
Dataset Description: Toward making use of the complementary information captured by the various bioactivity types, including IC50, K(i), and K(d), Tang et al. introduces a model-based integration approach, termed KIBA to generate an integrated drug-target bioactivity matrix.
RoCoG-v2 (Robot Control Gestures) is a dataset intended to support the study of synthetic-to-real and ground-to-air video domain adaptation. It contains over 100K synthetically-generated videos of human avatars performing gestures from seven (7) classes. It also provides videos of real humans performing the same gestures from both ground and air perspectives
With the advance of AI, road object detection has been a prominent topic in computer vision, mostly using perspective cameras. Fisheye lens provides omnidirectional wide coverage for using fewer cameras to monitor road intersections, however with view distortions. The dataset will be available on the GitHub (https://github.com/MoyoG/FishEye8K) with PASCAL VOC, MS COCO, and YOLO annotation formats.
Maximilian B. Kiss, Sophia B. Coban, K. Joost Batenburg, Tristan van Leeuwen, and Felix Lucka "2DeteCT - A large 2D expandable, trainable, experimental Computed Tomography dataset for machine learning", Sci Data 10, 576 (2023) or arXiv:2306.05907 (2023)
UruDendro is a database of wood cross section images of commercially grown Pinus taeda trees from northern Uruguay. It is form by 64 RGB wood images, their tree rings delineations and pith location.
IoT devices captures
CHI3D is a lab-based accurate 3D motion capture dataset with 631 sequences containing 2,525 contact events,728,664 ground truth 3d poses, as well as FlickrCI3D, a dataset of 11,216 images, with 14,081 processed pairs of people, and 81,233 facet-level surface correspondences.
Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 10143-10152