Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BIKE

Reported on 12 benchmarks across 4 tasks · 1 paper · 4 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 6 results

  • Zero-Shot Action Recognition on Kinetics
    Top-5 Accuracy · 2022-12-31
    91.1
    best: 95.7 (TC-CLIP)
    SOTA
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Zero-Shot Action Recognition on ActivityNet
    Top-1 Accuracy · 2022-12-31
    86.2
    SOTA
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Video on Charades
    mAP · 2022-12-31
    50.7
    best: 66.3 (TokenLearner)
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Zero-Shot Action Recognition on UCF101
    Top-1 Accuracy · 2022-12-31
    86.6
    best: 92.8 (OTI (ViT-L/14))
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Zero-Shot Action Recognition on Kinetics
    Top-1 Accuracy · 2022-12-31
    68.5
    best: 78.1 (TC-CLIP)
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Zero-Shot Action Recognition on HMDB51
    Top-1 Accuracy · 2022-12-31
    61.4
    best: 64.7 (MOV (ViT-L/14))
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182

Robots · 3 results

  • Activity Recognition on UCF101
    3-fold Accuracy · uses extra data · 2022-12-31
    98.8
    best: 99.7 (FTP-UniFormerV2-L/14)
    SOTA
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Activity Recognition on HMDB-51
    Average accuracy of 3 splits · uses extra data · 2022-12-31
    83.1
    best: 88.7 (VideoMAE V2-g)
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Activity Recognition on ActivityNet
    mAP · 2022-12-31
    96.1
    best: 96.9 (Text4Vis (w/ ViT-L))
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182

Time Series · 3 results

  • Action Recognition on UCF101
    3-fold Accuracy · uses extra data · 2022-12-31
    98.8
    best: 99.7 (FTP-UniFormerV2-L/14)
    SOTA
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Action Recognition on HMDB-51
    Average accuracy of 3 splits · uses extra data · 2022-12-31
    83.1
    best: 88.7 (VideoMAE V2-g)
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182
  • Action Recognition on ActivityNet
    mAP · 2022-12-31
    96.1
    best: 96.9 (Text4Vis (w/ ViT-L))
    Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models · arXiv:2301.00182