Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CRNN

Reported on 41 benchmarks across 7 tasks · 4 papers · 20 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
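
The effect of exact-name matching can be illustrated with a minimal sketch. The record layout and the helper names (`group_by_exact_name`, `papers_for`) are assumptions made for illustration, not the site's actual implementation; the three records are copied from the listing on this page.

```python
from collections import defaultdict

# Illustrative records copied from this page.
# The tuple layout (name, paper, benchmark, metric, value) is an assumption.
RESULTS = [
    ("CRNN", "arXiv:2110.03427", "YouTube News dataset (No Noise)", "Accuracy", 0.967),
    ("CRNN", "arXiv:1708.04811", "YouTube News dataset (No Noise)", "Accuracy", 0.91),
    ("Inception-v3 CRNN", "arXiv:1708.04811", "YouTube News dataset (No Noise)", "F1 Score", 0.96),
]


def group_by_exact_name(results):
    """Group results by the exact, case-sensitive model-name string."""
    grouped = defaultdict(list)
    for name, paper, benchmark, metric, value in results:
        grouped[name].append((paper, benchmark, metric, value))
    return dict(grouped)


def papers_for(grouped, name):
    """Return the sorted set of papers that reported results under `name`."""
    return sorted({paper for paper, _, _, _ in grouped.get(name, [])})


grouped = group_by_exact_name(RESULTS)

# Two different papers land under the same "CRNN" key,
# while "Inception-v3 CRNN" stays a separate model entry.
print(papers_for(grouped, "CRNN"))
print(sorted(grouped))
```

Under this sketch, results from arXiv:2110.03427 and arXiv:1708.04811 share one "CRNN" entry even if the underlying architectures differ, which is why the per-paper attribution on each result matters when comparing numbers.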

Speech · 22 results

  • Dialogue on YouTube News dataset (No Noise)
    Accuracy · 2021-10-05
    0.967
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Dialogue on IndicTTS
    Classification Accuracy · 2021-10-05
    0.987
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Dialogue on YouTube News dataset (White Noise)
    Accuracy · 2021-10-05
    0.912
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Spoken Language Understanding on YouTube News dataset (No Noise)
    Accuracy · 2021-10-05
    0.967
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Spoken Language Understanding on IndicTTS
    Classification Accuracy · 2021-10-05
    0.987
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Spoken Language Understanding on YouTube News dataset (White Noise)
    Accuracy · 2021-10-05
    0.912
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Dialogue on YouTube News dataset (Background Music)
    Accuracy · 2017-08-16
    0.7
    best: 0.89 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue on YouTube News dataset (Background Music)
    F1 Score · 2017-08-16
    0.7
    best: 0.89 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue on YouTube News dataset (No Noise)
    Accuracy · 2017-08-16
    0.91
    best: 0.967
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue on YouTube News dataset (No Noise)
    F1 Score · 2017-08-16
    0.91
    best: 0.96 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue on YouTube News dataset (Crackling Noise)
    Accuracy · 2017-08-16
    0.82
    best: 0.93 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue on YouTube News dataset (Crackling Noise)
    F1 Score · 2017-08-16
    0.83
    best: 0.93 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue on YouTube News dataset (White Noise)
    Accuracy · 2017-08-16
    0.63
    best: 0.912
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue on YouTube News dataset (White Noise)
    F1 Score · 2017-08-16
    0.63
    best: 0.91 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Spoken Language Understanding on YouTube News dataset (Background Music)
    Accuracy · 2017-08-16
    0.7
    best: 0.89 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Spoken Language Understanding on YouTube News dataset (Background Music)
    F1 Score · 2017-08-16
    0.7
    best: 0.89 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Spoken Language Understanding on YouTube News dataset (No Noise)
    Accuracy · 2017-08-16
    0.91
    best: 0.967
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Spoken Language Understanding on YouTube News dataset (No Noise)
    F1 Score · 2017-08-16
    0.91
    best: 0.96 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Spoken Language Understanding on YouTube News dataset (Crackling Noise)
    Accuracy · 2017-08-16
    0.82
    best: 0.93 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Spoken Language Understanding on YouTube News dataset (Crackling Noise)
    F1 Score · 2017-08-16
    0.83
    best: 0.93 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Spoken Language Understanding on YouTube News dataset (White Noise)
    Accuracy · 2017-08-16
    0.63
    best: 0.912
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Spoken Language Understanding on YouTube News dataset (White Noise)
    F1 Score · 2017-08-16
    0.63
    best: 0.91 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811

Natural Language Processing · 11 results

  • Dialogue Understanding on YouTube News dataset (No Noise)
    Accuracy · 2021-10-05
    0.967
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Dialogue Understanding on IndicTTS
    Classification Accuracy · 2021-10-05
    0.987
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Dialogue Understanding on YouTube News dataset (White Noise)
    Accuracy · 2021-10-05
    0.912
    SOTA
    Is Attention always needed? A Case Study on Language Identification from Speech · arXiv:2110.03427
  • Dialogue Understanding on YouTube News dataset (Background Music)
    Accuracy · 2017-08-16
    0.7
    best: 0.89 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue Understanding on YouTube News dataset (Background Music)
    F1 Score · 2017-08-16
    0.7
    best: 0.89 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue Understanding on YouTube News dataset (No Noise)
    Accuracy · 2017-08-16
    0.91
    best: 0.967
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue Understanding on YouTube News dataset (No Noise)
    F1 Score · 2017-08-16
    0.91
    best: 0.96 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue Understanding on YouTube News dataset (Crackling Noise)
    Accuracy · 2017-08-16
    0.82
    best: 0.93 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue Understanding on YouTube News dataset (Crackling Noise)
    F1 Score · 2017-08-16
    0.83
    best: 0.93 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue Understanding on YouTube News dataset (White Noise)
    Accuracy · 2017-08-16
    0.63
    best: 0.912
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811
  • Dialogue Understanding on YouTube News dataset (White Noise)
    F1 Score · 2017-08-16
    0.63
    best: 0.91 (Inception-v3 CRNN)
    Language Identification Using Deep Convolutional Recurrent Neural Networks · arXiv:1708.04811

Audio · 8 results

  • Sound Event Detection on WildDESED
    PSDS1 (10dB) · 2024-07-04
    0.222
    best: 0.356 (CRNN (with BEATs + Separation))
    SOTA
    WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System · arXiv:2407.03656
  • Sound Event Detection on WildDESED
    PSDS1 (Clean) · 2024-07-04
    0.348
    best: 0.5 (CRNN (with BEATs))
    SOTA
    WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System · arXiv:2407.03656
  • 2D Semantic Segmentation on SVT
    Accuracy · 2015-07-21
    80.8
    best: 99.1 (CLIP4STR-H (DFN-5B))
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717
  • 2D Semantic Segmentation on ICDAR 2003
    Accuracy · 2015-07-21
    89.4
    best: 97.1 (Yet Another Text Recognizer)
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717
  • 2D Semantic Segmentation on ICDAR2013
    Accuracy · 2015-07-21
    86.7
    best: 99.42 (CLIP4STR-L*)
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717
  • Sound Event Detection on WildDESED
    PSDS1 (-5dB) · 2024-07-04
    0.017
    best: 0.134 (CRNN (with BEATs + Separation))
    WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System · arXiv:2407.03656
  • Sound Event Detection on WildDESED
    PSDS1 (0dB) · 2024-07-04
    0.064
    best: 0.219 (CRNN (with BEATs + Separation))
    WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System · arXiv:2407.03656
  • Sound Event Detection on WildDESED
    PSDS1 (5dB) · 2024-07-04
    0.148
    best: 0.291 (CRNN (with BEATs + Separation))
    WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System · arXiv:2407.03656

Computer Vision · 6 results

  • Scene Parsing on SVT
    Accuracy · 2015-07-21
    80.8
    best: 99.1 (CLIP4STR-H (DFN-5B))
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717
  • Scene Parsing on ICDAR 2003
    Accuracy · 2015-07-21
    89.4
    best: 97.1 (Yet Another Text Recognizer)
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717
  • Scene Parsing on ICDAR2013
    Accuracy · 2015-07-21
    86.7
    best: 99.42 (CLIP4STR-L*)
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717
  • Scene Text Recognition on SVT
    Accuracy · 2015-07-21
    80.8
    best: 99.1 (CLIP4STR-H (DFN-5B))
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717
  • Scene Text Recognition on ICDAR 2003
    Accuracy · 2015-07-21
    89.4
    best: 97.1 (Yet Another Text Recognizer)
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717
  • Scene Text Recognition on ICDAR2013
    Accuracy · 2015-07-21
    86.7
    best: 99.42 (CLIP4STR-L*)
    SOTA
    An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition · arXiv:1507.05717