A Benchmark for Robust Multi-Hop Spatial Reasoning in Texts
Counting repetitive actions is a common need in human activities such as physical exercise. Existing methods focus on repetitive action counting in short videos and struggle with the longer videos that arise in more realistic scenarios. In the data-driven era, this loss of generalization capability is mainly attributed to the lack of long-video datasets. To fill this gap, we introduce RepCount, a new large-scale repetitive action counting dataset covering a wide variety of video lengths, along with more realistic situations in which action interruptions or action inconsistencies occur. We also provide fine-grained annotations of the action cycles rather than only a single count per video. The dataset contains 1,451 videos with about 20,000 annotations, making it more challenging. It consists of two subsets, Part-A and Part-B. The videos in Part-A are fetched from YouTube, while
MuCGEC is a multi-reference, multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three different Chinese-as-a-Second-Language (CSL) learner sources. Each sentence has been corrected by three annotators, and their corrections are meticulously reviewed by an expert, resulting in an average of 2.3 references per sentence.
RTMV is a large-scale synthetic dataset for novel view synthesis consisting of ∼300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 × 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Using 4 distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures.
We construct a dataset named CPED from 40 Chinese TV shows. CPED consists of multi-source knowledge related to empathy and personal characteristics. This knowledge covers 13 emotions, gender, Big Five personality traits, 19 dialogue acts, and other knowledge.
We introduce HaGRID (HAnd Gesture Recognition Image Dataset), a large image dataset for hand gesture recognition (HGR) systems. You can use it for image classification or image detection tasks. The proposed dataset enables building HGR systems that can be used in video conferencing services (Zoom, Skype, Discord, Jazz, etc.), home automation systems, the automotive sector, and more.
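For image classification, a dataset like this is often indexed as (image path, label) pairs. The sketch below assumes a hypothetical class-per-folder layout; HaGRID's actual distribution ships annotations in separate files, so this is an illustration of the general pattern, not the dataset's real format.

```python
from pathlib import Path

def index_gesture_images(root):
    """Build (image_path, label) pairs from a class-per-folder layout.

    Assumption: each subdirectory of `root` is named after a gesture
    class and contains that class's .jpg images.
    """
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            for img in sorted(class_dir.glob("*.jpg")):
                samples.append((img, class_dir.name))
    return samples
```

Such an index can then feed any standard image-classification training loop.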
ToxCast is an initiative by the U.S. Environmental Protection Agency (EPA) aimed at predicting the potential toxicity of various chemical compounds. It involves high-throughput screening assays that evaluate thousands of chemicals across multiple biological endpoints. These endpoints cover a wide range of effects, including cell cycle disruptions, interactions with steroid receptors, and cytotoxicity.
Pile of Law is a ∼256GB (and growing) dataset of legal and administrative data which can be used for assessing norms on data sanitization across legal and administrative settings.
ArSarcasm-v2 is an extension of the original ArSarcasm dataset published along with the paper From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset. ArSarcasm-v2 consists of ArSarcasm along with portions of the DAICT corpus and some new tweets. Each tweet was annotated for sarcasm, sentiment, and dialect. The final dataset consists of 15,548 tweets divided into 12,548 training tweets and 3,000 testing tweets. ArSarcasm-v2 was used and released as part of the shared task on sarcasm detection and sentiment analysis in Arabic.
VGMIDI is a dataset of piano arrangements of video game soundtracks. It contains 200 MIDI pieces labeled according to emotion and 3,850 unlabeled pieces. Each labeled piece was annotated by 30 human subjects according to the Circumplex (valence-arousal) model of emotion using a custom web tool.
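Under the Circumplex model, an emotion annotation is a point on a valence (pleasantness) and arousal (intensity) plane, which is often coarsened into four quadrants. The helper below is a hypothetical illustration of that mapping; the quadrant labels are an assumption, not VGMIDI's actual label scheme.

```python
def circumplex_quadrant(valence, arousal):
    """Map a (valence, arousal) pair to a coarse emotion quadrant.

    Labels are illustrative assumptions; VGMIDI's annotations are
    continuous valence-arousal values, not these strings.
    """
    if valence >= 0 and arousal >= 0:
        return "happy/excited"      # pleasant, high energy
    if valence < 0 and arousal >= 0:
        return "angry/tense"        # unpleasant, high energy
    if valence < 0:
        return "sad/depressed"      # unpleasant, low energy
    return "calm/relaxed"           # pleasant, low energy
```

Quadrant labels like these are a common way to turn continuous valence-arousal annotations into a four-class classification target.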
The development of safety-oriented research ideas and applications requires fine-grained vehicle trajectory data that not only has high accuracy but also captures a substantial number of critical safety events. This paper introduces the CitySim Dataset, which was devised with the core objective of facilitating safety-based research and applications. CitySim contains vehicle trajectories extracted from 1,140 minutes of drone videos recorded at 12 different locations. It covers a variety of road geometries, including freeway basic segments, weaving segments, expressway merge/diverge segments, signalized intersections, stop-controlled intersections, and intersections without sign/signal control. CitySim trajectories were generated through a five-step procedure that ensured trajectory accuracy. Furthermore, the dataset provides rotated vehicle bounding box information, which is demonstrated to improve safety evaluation. Compared to other video-based trajectory datasets, the CitySim Dataset has
Breaking Bad is a large-scale dataset of fractured objects. The dataset contains around 10k meshes from PartNet and Thingi10k. For each mesh, 20 fracture modes are pre-computed, and 80 fractures are then simulated from them, resulting in a total of 1M breakdown patterns. This dataset serves as a benchmark that enables the study of fractured object reassembly and presents new challenges for geometric shape understanding.
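The per-mesh counts above multiply out to the quoted total. As a quick sanity check, taking "around 10k meshes" as exactly 10,000:

```python
meshes = 10_000              # meshes from PartNet and Thingi10k
modes_per_mesh = 20          # pre-computed fracture modes
simulated_per_mesh = 80      # fractures simulated from those modes

patterns_per_mesh = modes_per_mesh + simulated_per_mesh
total = meshes * patterns_per_mesh
print(total)  # 1000000 breakdown patterns
```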
Node classification on Wisconsin with the fixed 48%/32%/20% splits provided by Geom-GCN.
Node classification on Cora with the fixed 48%/32%/20% splits provided by Geom-GCN.
Node classification on Citeseer with the fixed 48%/32%/20% splits provided by Geom-GCN.
Node classification on PubMed with the fixed 48%/32%/20% splits provided by Geom-GCN.
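The four benchmarks above all use the same fixed 48%/32%/20% train/validation/test node splits. The sketch below shows how such a split can be derived from a seeded shuffle; this is illustrative only, since the actual Geom-GCN splits are fixed files shipped with the benchmark, and the function name and seed are assumptions.

```python
import random

def fixed_node_splits(num_nodes, train_frac=0.48, val_frac=0.32, seed=0):
    """Partition node indices into train/val/test index lists.

    Illustrative sketch of a 48%/32%/20% split; not a reproduction
    of the official Geom-GCN split files.
    """
    rng = random.Random(seed)
    idx = list(range(num_nodes))
    rng.shuffle(idx)
    n_train = int(num_nodes * train_frac)
    n_val = int(num_nodes * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # remaining ~20%
    return train, val, test

# Example: Cora has 2,708 nodes.
train, val, test = fixed_node_splits(2708)
```

Fixing the splits (rather than resampling them per run) is what makes results on these benchmarks directly comparable across papers.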
The N-ImageNet dataset is an event-camera counterpart to the ImageNet dataset. The dataset is obtained by moving an event camera around a monitor displaying images from ImageNet. N-ImageNet contains approximately 1,300k training samples and 50k validation samples. In addition, the dataset also contains variants of the validation set recorded under a wide range of lighting conditions or camera trajectories. Additional details are explained in the accompanying paper; please cite it if you make use of the dataset.
Large-scale American Sign Language (ASL) - English dataset collected from online video sites (e.g., YouTube). OpenASL contains 288 hours of ASL videos in multiple domains from over 200 signers.
Existing hate speech datasets contain only textual data. We create a new manually annotated multimodal hate speech dataset formed by 150,000 tweets, each one of them containing text and an image. We call the dataset MMHS150K.
CelebV-Text comprises 70,000 in-the-wild face video clips with diverse visual content, each paired with 20 texts generated using the proposed semi-automatic text generation strategy. The provided texts describe both static and dynamic attributes precisely.