Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Baseline

Reported on 64 benchmarks across 32 tasks · 10 papers · 26 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
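
A minimal sketch of what exact-name matching implies, assuming each result is a flat record with a `model` string. The field names, the helper functions, and the third record are hypothetical and are not the site's actual schema or code; the first two records reuse values listed on this page.

```python
from collections import defaultdict

# Illustrative records. Field names are assumptions, not the site's schema.
records = [
    {"model": "Baseline", "paper": "arXiv:1506.01497",
     "benchmark": "Cityscapes", "metric": "mPC [AP]", "value": 15.4},
    {"model": "Baseline", "paper": "arXiv:1911.01255",
     "benchmark": "ETAPE", "metric": "DER(%)", "value": 7.7},
    # Hypothetical record: a different spelling is a different key.
    {"model": "baseline", "paper": "hypothetical-paper",
     "benchmark": "hypothetical-benchmark", "metric": "F1", "value": 0.5},
]


def group_by_exact_name(rows):
    """Group rows by the exact model string (case-sensitive, no normalization)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append(row)
    return dict(grouped)


def papers_for(rows, name):
    """Return the distinct papers that report a model under exactly this name."""
    return sorted({row["paper"] for row in rows if row["model"] == name})


grouped = group_by_exact_name(records)
print(len(grouped["Baseline"]))         # rows pooled under one name
print(papers_for(records, "Baseline"))  # unrelated papers sharing that name
```

Under this assumption, a generic name such as "Baseline" pools results from unrelated papers, so each entry is best read together with the paper it cites.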

Natural Language Processing · 27 results

  • Dialogue Understanding on rt-inod-jailbreaking
    Best-of · 2024-04-15
    0.92
    SOTA
    Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations · arXiv:2404.09785
  • Dialogue Understanding on Timers and Such
    Accuracy (%) · uses extra data · 2021-04-04
    81.6
    best: 95.4 (Finstreder (Conformer))
    SOTA
    Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers · arXiv:2104.01604
  • Relation Extraction on WNUT 2020
    F1 · 2020-10-27
    72.5
    SOTA
    WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols · arXiv:2010.14576
  • Relation Extraction on WNUT 2020
    Precision · 2020-10-27
    80.1
    SOTA
    WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols · arXiv:2010.14576
  • Relation Extraction on WNUT 2020
    Recall · 2020-10-27
    66.21
    SOTA
    WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols · arXiv:2010.14576
  • Named Entity Recognition (NER) on WNUT 2020
    F1 · 2020-10-27
    65.73
    best: 76.6 (mgsohrab)
    SOTA
    WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols · arXiv:2010.14576
  • Named Entity Recognition (NER) on WNUT 2020
    Precision · 2020-10-27
    70.06
    SOTA
    WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols · arXiv:2010.14576
  • Named Entity Recognition (NER) on WNUT 2020
    Recall · 2020-10-27
    61.91
    SOTA
    WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols · arXiv:2010.14576
  • Abuse Detection on DKhate
    F1 · 2019-08-13
    0.7
    SOTA
    Offensive Language and Hate Speech Detection for Danish · arXiv:1908.04531
  • Hate Speech Detection on DKhate
    F1 · 2019-08-13
    0.7
    SOTA
    Offensive Language and Hate Speech Detection for Danish · arXiv:1908.04531
  • Dialogue Understanding on Spoken-SQuAD
    F1 score · 2018-04-01
    58.71
    best: 77.1 (ALBERT)
    SOTA
    Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension · arXiv:1804.00320
  • Bias Detection on rt-inod-bias
    Best-of · 2024-04-15
    0.41
    best: 0.5 (GPT-4)
    Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations · arXiv:2404.09785
  • Aspect Category Detection on AWARE
    F1-score
    0.32
  • Argument Mining on ValNov Subtask B
    JOINT-F1
    21.46
    best: 41.5 (NLP@UIT)
  • Argument Mining on ValNov Subtask B
    NOV-F1
    23.09
    best: 38.39 (NLP@UIT)
  • Argument Mining on ValNov Subtask B
    VAL-F1
    19.82
    best: 44.6 (NLP@UIT)
  • Argument Mining on ValNov Subtask A
    JOINT-F1
    23.9
    best: 45.16 (CLTeamL-3)
  • Argument Mining on ValNov Subtask A
    NOV-F1
    36.12
    best: 70 (ACCEPT-1)
  • Argument Mining on ValNov Subtask A
    VAL-F1
    59.96
    best: 74.64 (CLTeamL-3)
  • Aspect Category Polarity on AWARE
    Accuracy (%)
    67
  • Term Extraction on AWARE
    F1-Score
    0.82
  • ValNov on ValNov Subtask B
    JOINT-F1
    21.46
    best: 41.5 (NLP@UIT)
  • ValNov on ValNov Subtask B
    NOV-F1
    23.09
    best: 38.39 (NLP@UIT)
  • ValNov on ValNov Subtask B
    VAL-F1
    19.82
    best: 44.6 (NLP@UIT)
  • ValNov on ValNov Subtask A
    JOINT-F1
    23.9
    best: 45.16 (CLTeamL-3)
  • ValNov on ValNov Subtask A
    NOV-F1
    36.12
    best: 70 (ACCEPT-1)
  • ValNov on ValNov Subtask A
    VAL-F1
    59.96
    best: 74.64 (CLTeamL-3)

Methodology · 18 results

  • 3D on Cityscapes
    mPC [AP] · 2015-06-04
    15.4
    best: 27.4 (FGT (SD-1.5 Backbone))
    SOTA
    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks · arXiv:1506.01497
  • 2D Classification on Cityscapes
    mPC [AP] · 2015-06-04
    15.4
    best: 27.4 (FGT (SD-1.5 Backbone))
    SOTA
    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks · arXiv:1506.01497
  • 2D Object Detection on Cityscapes
    mPC [AP] · 2015-06-04
    15.4
    best: 27.4 (FGT (SD-1.5 Backbone))
    SOTA
    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks · arXiv:1506.01497
  • 16k on Cityscapes
    mPC [AP] · 2015-06-04
    15.4
    best: 27.4 (FGT (SD-1.5 Backbone))
    SOTA
    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks · arXiv:1506.01497
  • Multi-Label Classification on CheXpert
    NUM RADS BELOW CURVE · 2019-11-04
    0.2
    best: 3 (inisis)
    pyannote.audio: neural building blocks for speaker diarization · arXiv:1911.01255
  • Data Mining on ValNov Subtask B
    JOINT-F1
    21.46
    best: 41.5 (NLP@UIT)
  • Data Mining on ValNov Subtask B
    NOV-F1
    23.09
    best: 38.39 (NLP@UIT)
  • Data Mining on ValNov Subtask B
    VAL-F1
    19.82
    best: 44.6 (NLP@UIT)
  • Data Mining on ValNov Subtask A
    JOINT-F1
    23.9
    best: 45.16 (CLTeamL-3)
  • Data Mining on ValNov Subtask A
    NOV-F1
    36.12
    best: 70 (ACCEPT-1)
  • Data Mining on ValNov Subtask A
    VAL-F1
    59.96
    best: 74.64 (CLTeamL-3)
  • Multi-Label Classification on CheXpert
    AVERAGE AUC ON 14 LABEL
    0.848
    best: 0.933 (CFT (ensemble) Macao Polytechnic University)
  • Interpretable Machine Learning on ValNov Subtask B
    JOINT-F1
    21.46
    best: 41.5 (NLP@UIT)
  • Interpretable Machine Learning on ValNov Subtask B
    NOV-F1
    23.09
    best: 38.39 (NLP@UIT)
  • Interpretable Machine Learning on ValNov Subtask B
    VAL-F1
    19.82
    best: 44.6 (NLP@UIT)
  • Interpretable Machine Learning on ValNov Subtask A
    JOINT-F1
    23.9
    best: 45.16 (CLTeamL-3)
  • Interpretable Machine Learning on ValNov Subtask A
    NOV-F1
    36.12
    best: 70 (ACCEPT-1)
  • Interpretable Machine Learning on ValNov Subtask A
    VAL-F1
    59.96
    best: 74.64 (CLTeamL-3)

Speech · 8 results

  • Dialogue on rt-inod-jailbreaking
    Best-of · 2024-04-15
    0.92
    SOTA
    Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations · arXiv:2404.09785
  • Dialogue on Timers and Such
    Accuracy (%) · uses extra data · 2021-04-04
    81.6
    best: 95.4 (Finstreder (Conformer))
    SOTA
    Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers · arXiv:2104.01604
  • Spoken Language Understanding on Timers and Such
    Accuracy (%) · uses extra data · 2021-04-04
    81.6
    best: 95.4 (Finstreder (Conformer))
    SOTA
    Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers · arXiv:2104.01604
  • Speaker Diarization on ETAPE
    DER(%) · 2019-11-04
    7.7
    SOTA
    pyannote.audio: neural building blocks for speaker diarization · arXiv:1911.01255
  • Speaker Diarization on ETAPE
    FA · 2019-11-04
    7.5
    SOTA
    pyannote.audio: neural building blocks for speaker diarization · arXiv:1911.01255
  • Dialogue on Spoken-SQuAD
    F1 score · 2018-04-01
    58.71
    best: 77.1 (ALBERT)
    SOTA
    Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension · arXiv:1804.00320
  • Spoken Language Understanding on Spoken-SQuAD
    F1 score · 2018-04-01
    58.71
    best: 77.1 (ALBERT)
    SOTA
    Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension · arXiv:1804.00320
  • Speaker Diarization on ETAPE
    Miss · 2019-11-04
    0.2
    best: 0.7 (pyannote (waveform))
    pyannote.audio: neural building blocks for speaker diarization · arXiv:1911.01255

Computer Vision · 6 results

  • Multi-Object Tracking on Synthehicle
    MOTA · 2022-08-30
    59.05
    SOTA
    Synthehicle: Multi-Vehicle Multi-Camera Tracking in Virtual Cities · arXiv:2208.14167
  • Object Tracking on Synthehicle
    MOTA · 2022-08-30
    59.05
    SOTA
    Synthehicle: Multi-Vehicle Multi-Camera Tracking in Virtual Cities · arXiv:2208.14167
  • Object Detection on Cityscapes
    mPC [AP] · 2015-06-04
    15.4
    best: 27.4 (FGT (SD-1.5 Backbone))
    SOTA
    Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks · arXiv:1506.01497
  • Scene Parsing on SoccerNet-v2
    mIoU · 2020-11-26
    35.8
    best: 47.3 (CALF (Cioppa et al.))
    SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos · arXiv:2011.13367
  • Video Semantic Segmentation on SoccerNet-v2
    mIoU · 2020-11-26
    35.8
    best: 47.3 (CALF (Cioppa et al.))
    SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos · arXiv:2011.13367
  • Scene Understanding on SoccerNet-v2
    mIoU · 2020-11-26
    35.8
    best: 47.3 (CALF (Cioppa et al.))
    SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos · arXiv:2011.13367

Audio · 3 results

  • Hearing Aid and device processing on FMA
    HAAQI · 2023-10-09
    0.1256
    SOTA
    The First Cadenza Signal Processing Challenge: Improving Music for Those With a Hearing Loss · arXiv:2310.05799
  • 2D Semantic Segmentation on SoccerNet-v2
    mIoU · 2020-11-26
    35.8
    best: 47.3 (CALF (Cioppa et al.))
    SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos · arXiv:2011.13367
  • Sound Event Detection on DESED
    event-based F1 score
    25.8
    best: 63.4 (ATST-SED)

Robots · 1 result

  • Activity Recognition on ActionNet-VE
    F-measure (%)
    90.27

Time Series · 1 result

  • Action Recognition on ActionNet-VE
    F-measure (%)
    90.27