Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, Limin Wang, Yu Qiao

Date: 2022-11-17
Tasks: Short-term Object Interaction Anticipation, State Change Object Detection, Moment Queries, Future Hand Prediction, Video Understanding, Natural Language Queries, Object Detection

Abstract

In this report, we present our champion solutions to five tracks of the Ego4D challenge. We apply InternVideo, a video foundation model we developed, to five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting a strong foundation model to downstream egocentric video understanding tasks with simple head designs. On all five tasks, InternVideo-Ego4D surpasses both the baseline methods and the CVPR 2022 champions, demonstrating the strong representation ability of InternVideo as a video foundation model. Our code will be released at https://github.com/OpenGVLab/ego4d-eccv2022-solutions

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| State Change Object Detection | Ego4D | AP | 37.19 | InternVideo |
| State Change Object Detection | Ego4D | AP50 | 55.97 | InternVideo |
| State Change Object Detection | Ego4D | AP75 | 38.44 | InternVideo |
| Short-term Object Interaction Anticipation | Ego4D | Noun (Top5 mAP) | 24.6 | InternVideo |
| Short-term Object Interaction Anticipation | Ego4D | Noun+TTC (Top5 mAP) | 7.64 | InternVideo |
| Short-term Object Interaction Anticipation | Ego4D | Noun+Verb (Top5 mAP) | 9.18 | InternVideo |
| Short-term Object Interaction Anticipation | Ego4D | Overall (Top5 mAP) | 3.4 | InternVideo |
| Future Hand Prediction | Ego4D | C.Disp (Left) | 53.33 | InternVideo |
| Future Hand Prediction | Ego4D | C.Disp (Right) | 53.37 | InternVideo |
| Future Hand Prediction | Ego4D | Disp (Total) | 196.8 | InternVideo |
| Future Hand Prediction | Ego4D | M.Disp (Left) | 43.25 | InternVideo |
| Future Hand Prediction | Ego4D | M.Disp (Right) | 46.25 | InternVideo |
| Natural Language Queries | Ego4D | R@1 IoU=0.3 | 16.45 | InternVideo |
| Natural Language Queries | Ego4D | R@1 IoU=0.5 | 10.06 | InternVideo |
| Natural Language Queries | Ego4D | R@1 Mean (0.3 and 0.5) | 13.26 | InternVideo |
| Natural Language Queries | Ego4D | R@5 IoU=0.3 | 22.95 | InternVideo |
| Natural Language Queries | Ego4D | R@5 IoU=0.5 | 16.1 | InternVideo |
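
The Natural Language Queries rows report recall at rank k under a temporal IoU threshold. This page does not define the metric, so the following is only a minimal sketch of the definition commonly used for temporal grounding, not the official Ego4D evaluation code. The function names `temporal_iou` and `recall_at_k` and the `(start, end)` interval format are assumptions made for illustration. The last lines check that the reported R@1 mean is consistent with the two R@1 values listed above.

```python
from decimal import Decimal, ROUND_HALF_UP


def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) intervals (hypothetical helper)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_preds, gts, k, iou_threshold):
    """Percentage of queries with at least one top-k prediction whose
    temporal IoU with the ground truth reaches the threshold.
    Assumes one ground-truth interval per query."""
    hits = sum(
        any(temporal_iou(p, gt) >= iou_threshold for p in preds[:k])
        for preds, gt in zip(ranked_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Consistency check on the reported numbers: the "Mean" row should be the
# arithmetic mean of R@1 at IoU=0.3 and IoU=0.5, rounded to two decimals.
r1_iou03 = Decimal("16.45")
r1_iou05 = Decimal("10.06")
mean_r1 = ((r1_iou03 + r1_iou05) / 2).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
print(mean_r1)
```

`Decimal` is used instead of floats so that the half-way value rounds the same way as in the reported table.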
