Papers With Code 2 | ML Benchmarks, SotA Results & Code

AirLetters is a large collection of over 161,000 video-label pairs showing humans drawing letters and digits in the air. It is used to evaluate a model’s ability to classify articulated motions correctly. Unlike existing video datasets, AirLetters requires a model to discern motion patterns and integrate information over time (i.e., across many frames of video) to make accurate predictions. The accompanying study found that, although the task is trivial for humans, learning accurate representations of complex articulated motions remains an open problem for end-to-end video understanding models.
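Why temporal integration matters can be shown with a toy example. This is a hypothetical sketch, not AirLetters data or any published baseline. Each "video" is reduced to one fingertip position per frame. An order-invariant summary such as averaging over frames gives identical features for two clips that visit the same positions in opposite order. An order-aware summary, such as summing frame-to-frame displacements, tells them apart. The names `mean_pool` and `net_displacement` are illustrative only.

```python
# Hypothetical toy setting (not AirLetters data): each frame is reduced to
# a single (x, y) fingertip position, so a clip is a list of positions.

def mean_pool(track):
    """Order-invariant summary: average position over all frames."""
    n = len(track)
    return (sum(x for x, _ in track) / n, sum(y for _, y in track) / n)

def net_displacement(track):
    """Order-aware summary: sum of frame-to-frame motion vectors."""
    dx = sum(b[0] - a[0] for a, b in zip(track, track[1:]))
    dy = sum(b[1] - a[1] for a, b in zip(track, track[1:]))
    return (dx, dy)

# The same vertical stroke drawn downward, and the same frames played in reverse.
down = [(0.0, 1.0), (0.0, 0.5), (0.0, 0.0)]
up = list(reversed(down))

print(mean_pool(down) == mean_pool(up))              # True: pooling loses frame order
print(net_displacement(down), net_displacement(up))  # (0.0, -1.0) (0.0, 1.0)
```

Real video models learn far richer temporal features than a displacement sum. The point of the sketch is only that any representation which discards frame order cannot separate such clips.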