Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Text4Vis

Reported on 7 benchmarks across 3 tasks · 1 paper · 5 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
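
As a minimal sketch of what exact-name matching implies, the snippet below groups result rows by the literal model-name string. The row layout, the `group_by_exact_name` helper, and the `Text4Vis-variant` label are hypothetical and used only for illustration; they are not the site's actual schema or code.

```python
from collections import defaultdict

# Hypothetical result rows: (model_name, benchmark, metric, value).
# The first two values come from the listing on this page; the third row
# is an invented placeholder showing a differently named variant.
rows = [
    ("Text4Vis", "UCF101", "Top-1 Accuracy", 85.8),
    ("Text4Vis", "HMDB51", "Top-1 Accuracy", 58.4),
    ("Text4Vis-variant", "UCF101", "Top-1 Accuracy", 0.0),  # placeholder value
]


def group_by_exact_name(rows):
    """Group rows by the exact model-name string, with no normalization."""
    grouped = defaultdict(list)
    for name, benchmark, metric, value in rows:
        grouped[name].append((benchmark, metric, value))
    return dict(grouped)


results = group_by_exact_name(rows)
print(sorted(results))           # the two distinct name strings
print(len(results["Text4Vis"]))  # rows sharing the exact name "Text4Vis"
```

Under this kind of matching, two papers that both report a model called "Text4Vis" land in the same group even if the underlying variants differ, while any difference in the name string splits the results apart.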

Computer Vision · 5 results

  • Zero-Shot Action Recognition on UCF101
    Top-1 Accuracy · 2022-07-04
    85.8
    best: 92.8 (OTI (ViT-L/14))
    SOTA
    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition · arXiv:2207.01297
  • Zero-Shot Action Recognition on Kinetics
    Top-1 Accuracy · 2022-07-04
    68.9
    best: 78.1 (TC-CLIP)
    SOTA
    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition · arXiv:2207.01297
  • Zero-Shot Action Recognition on Kinetics
    Top-5 Accuracy · 2022-07-04
    90.3
    best: 95.7 (TC-CLIP)
    SOTA
    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition · arXiv:2207.01297
  • Zero-Shot Action Recognition on HMDB51
    Top-1 Accuracy · 2022-07-04
    58.4
    best: 64.7 (MOV (ViT-L/14))
    SOTA
    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition · arXiv:2207.01297
  • Zero-Shot Action Recognition on ActivityNet
    Top-1 Accuracy · 2022-07-04
    84.6
    best: 86.2 (BIKE)
    SOTA
    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition · arXiv:2207.01297

Robots · 1 result

  • Activity Recognition on UCF101
    3-fold Accuracy · 2022-07-04
    98.2
    best: 99.7 (FTP-UniFormerV2-L/14)
    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition · arXiv:2207.01297

Time Series · 1 result

  • Action Recognition on UCF101
    3-fold Accuracy · 2022-07-04
    98.2
    best: 99.7 (FTP-UniFormerV2-L/14)
    Revisiting Classifier: Transferring Vision-Language Models for Video Recognition · arXiv:2207.01297