Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Rescaling Egocentric Vision

Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Published: 2020-06-23

Tasks: Action Detection, Cross-Modal Retrieval, Action Anticipation, Action Recognition, Retrieval, Unsupervised Domain Adaptation, Domain Adaptation

Abstract

This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments). This collection enables new challenges such as action detection and evaluating the "test of time" - i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task, provide baselines and evaluation metrics.
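The headline figures above (100 hours, 90K action segments) imply the annotation density the abstract advertises. As a back-of-the-envelope check, a minimal sketch using only the rounded numbers quoted in the abstract:

```python
# Rounded figures quoted in the abstract (not exact dataset statistics)
hours = 100
actions = 90_000

minutes = hours * 60
actions_per_minute = actions / minutes

print(actions_per_minute)  # 15.0 actions per minute on average
```

Fifteen labelled actions per minute, on average, is what "denser annotations" means concretely here.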

Results

| Task                | Dataset           | Metric   | Value | Model    |
|---------------------|-------------------|----------|-------|----------|
| Action Recognition  | EPIC-KITCHENS-100 | Action@1 | 37.39 | TSM      |
| Action Recognition  | EPIC-KITCHENS-100 | Action@1 | 36.81 | SlowFast |
| Action Recognition  | EPIC-KITCHENS-100 | Action@1 | 35.55 | TBN      |
| Action Recognition  | EPIC-KITCHENS-100 | Action@1 | 35.28 | TRN      |
| Action Recognition  | EPIC-KITCHENS-100 | Action@1 | 33.57 | TSN      |
| Action Anticipation | EPIC-KITCHENS-100 | Recall@5 | 13.94 | RU-LSTM  |
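The Action@1 numbers above are top-1 action accuracies. In the EPIC-KITCHENS benchmarks an "action" is a (verb, noun) pair, and an action prediction typically counts as correct only when both components match the ground truth; the sketch below assumes that convention (function name and toy data are illustrative, not from the paper's released code):

```python
def action_at_1(preds, labels):
    """Top-1 action accuracy over (verb, noun) pairs, as a percentage.

    Assumes the usual EPIC-KITCHENS convention: a clip is correct only
    if both the predicted verb and the predicted noun are correct.
    """
    correct = sum(
        1
        for (pred_verb, pred_noun), (true_verb, true_noun) in zip(preds, labels)
        if pred_verb == true_verb and pred_noun == true_noun
    )
    return 100.0 * correct / len(labels)

# Toy usage: three clips, two fully correct, one with a wrong noun.
preds = [(3, 7), (1, 2), (3, 9)]
labels = [(3, 7), (1, 2), (3, 7)]
print(action_at_1(preds, labels))  # prints 66.66666666666667
```

This pairing requirement is why Action@1 sits well below the separate verb and noun top-1 accuracies reported for the same models.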

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)