Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ViT

Reported on 30 benchmarks across 13 tasks · 6 papers · 9 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
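
As a rough illustration of what exact-name matching implies, the sketch below groups result rows by the literal model-name string. The row layout, the `group_by_exact_name` helper, and the `ViT-B/16` row are hypothetical and only serve the example; the two `ViT` rows reuse values listed on this page. It is not the site's actual matching code.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group result rows by model name using plain string equality.

    Hypothetical helper: no normalization, no fuzzy matching, so "ViT"
    and "ViT-B/16" end up in separate groups.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["model"]].append(row)
    return dict(groups)


# Assumed row layout; the first two values are taken from this page.
rows = [
    {"model": "ViT", "paper": "arXiv:2110.13214",
     "benchmark": "IconQA, Reasoning (Sce.)", "value": 68.8},
    {"model": "ViT", "paper": "arXiv:2306.04288",
     "benchmark": "Action-Camera Parking, F1", "value": 0.8152},
    # Hypothetical row: a differently named variant is not merged into "ViT".
    {"model": "ViT-B/16", "paper": "hypothetical",
     "benchmark": "hypothetical", "value": None},
]

groups = group_by_exact_name(rows)
vit_papers = sorted({row["paper"] for row in groups["ViT"]})
```

Under these assumptions, the `ViT` group collects rows from two unrelated papers, which is why the same name on this page may refer to different model variants.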

Natural Language Processing · 16 results

  • Visual Question Answering (VQA) on IconQA
    Reasoning (Sce.) · 2021-10-25
    68.8
    SOTA
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Alg.) · 2021-10-25
    51.1
    best: 56.73 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Com.) · 2021-10-25
    82.12
    best: 87 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Cou.) · 2021-10-25
    70.84
    best: 77.81 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Est.) · 2021-10-25
    98.95
    best: 99.54 (Top-Down)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Fra.) · 2021-10-25
    77.41
    best: 82.13 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Geo.) · 2021-10-25
    82.6
    best: 82.61 (ViLT)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Mea.) · 2021-10-25
    98.76
    best: 99.46 (Top-Down)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Pat.) · 2021-10-25
    58.46
    best: 68.75 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Pro.) · 2021-10-25
    86.07
    best: 95.73 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Sen.) · 2021-10-25
    84.72
    best: 92.49 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Spa.) · 2021-10-25
    54.64
    best: 55.62 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Tim.) · 2021-10-25
    68.66
    best: 77.98 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Blank) · 2021-10-25
    78.92
    best: 83.62 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Img.) · 2021-10-25
    79.15
    best: 82.66 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Txt.) · 2021-10-25
    72.34
    best: 75.19 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214

Computer Vision · 7 results

  • Parking Space Occupancy on Action-Camera Parking
    F1 · 2023-06-07
    0.8152
    SOTA
    Revising deep learning methods in parking lot occupancy detection · arXiv:2306.04288
  • Face Reconstruction on JAFFE
    Accuracy · 2021-07-07
    94.83
    best: 99.52 (TL)
    SOTA
    Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition · arXiv:2107.03107
  • Facial Expression Recognition (FER) on JAFFE
    Accuracy · 2021-07-07
    94.83
    best: 99.52 (TL)
    SOTA
    Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition · arXiv:2107.03107
  • 3D Face Reconstruction on JAFFE
    Accuracy · 2021-07-07
    94.83
    best: 99.52 (TL)
    SOTA
    Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition · arXiv:2107.03107
  • Parking Space Occupancy on SPKL
    F1-score · 2023-06-07
    0.7335
    best: 0.7393 (EfficientNet-P)
    Revising deep learning methods in parking lot occupancy detection · arXiv:2306.04288
  • Parking Space Occupancy on CNRPark+EXT
    F1-score · 2023-06-07
    0.9176
    best: 0.9683 (EfficientNet-P)
    Revising deep learning methods in parking lot occupancy detection · arXiv:2306.04288
  • Image Classification on ObjectNet
    Top-1 Accuracy · uses extra data · 2021-11-30
    17.36
    best: 82.7 (CoCa)
    Pyramid Adversarial Training Improves ViT Performance · arXiv:2111.15121

Music · 1 result

  • Facial Recognition and Modelling on JAFFE
    Accuracy · 2021-07-07
    94.83
    best: 99.52 (TL)
    SOTA
    Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition · arXiv:2107.03107

Methodology · 1 result

  • 3D on JAFFE
    Accuracy · 2021-07-07
    94.83
    best: 99.52 (TL)
    SOTA
    Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition · arXiv:2107.03107

Medical · 1 result

  • 3D Face Modelling on JAFFE
    Accuracy · 2021-07-07
    94.83
    best: 99.52 (TL)
    SOTA
    Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition · arXiv:2107.03107

Time Series · 1 result

  • Time Series Analysis on Speech Commands
    % Test Accuracy · 2021-04-05
    98.11
    best: 98.51 (SepTr)
    SOTA
    AST: Audio Spectrogram Transformer · arXiv:2104.01778

Reasoning · 1 result

  • Visual Reasoning on VASR
    1:1 Accuracy · 2022-12-08
    50.3
    best: 52.9 (Swin)
    VASR: Visual Analogies of Situation Recognition · arXiv:2212.04542

Audio · 1 result

  • Emotion Recognition on CREMA-D
    Accuracy · 2021-04-05
    67.81
    best: 94.07 (Vertically long patch ViT)
    AST: Audio Spectrogram Transformer · arXiv:2104.01778

Speech · 1 result

  • Speech Emotion Recognition on CREMA-D
    Accuracy · 2021-04-05
    67.81
    best: 94.07 (Vertically long patch ViT)
    AST: Audio Spectrogram Transformer · arXiv:2104.01778