Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

LSTM

Reported on 65 benchmarks across 29 tasks · 22 papers · 16 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
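
To illustrate the note above, here is a minimal Python sketch of what exact-name matching might look like. The record fields (`model`, `paper`, `benchmark`, `value`) and the function `group_by_exact_name` are hypothetical and are not the site's actual schema or code; the three sample rows reuse values listed on this page.

```python
from collections import defaultdict

# Hypothetical records; the field names are assumptions, not the site's schema.
results = [
    {"model": "LSTM", "paper": "arXiv:1409.3215",
     "benchmark": "WMT2014 English-French", "value": 34.8},
    {"model": "LSTM", "paper": "arXiv:1709.08011",
     "benchmark": "BCCWJ", "value": 0.9842},
    {"model": "BiLSTM", "paper": None,
     "benchmark": "Google Dataset", "value": 0.43},
]


def group_by_exact_name(records):
    """Group records by the literal model-name string (case-sensitive, no normalization)."""
    groups = defaultdict(list)
    for record in records:
        groups[record["model"]].append(record)
    return dict(groups)


grouped = group_by_exact_name(results)

# Two unrelated papers land under "LSTM" because they share the exact name,
# while "BiLSTM" is kept apart even though it is a closely related architecture.
for name, rows in grouped.items():
    print(name, [row["benchmark"] for row in rows])
```

Under this assumption, a translation model and a word-segmentation model both appear as "LSTM", which is why the figures on this page should not be read as describing a single architecture or configuration.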

Natural Language Processing · 30 results

  • Data-to-Text Generation on Cleaned E2E NLG Challenge
    METEOR (Validation set) · 2021-02-02
    0.394
    SOTA
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics · arXiv:2102.01672
  • Sentiment Analysis on TweetEval
    Hate · 2020-10-23
    52.6
    SOTA
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Japanese Word Segmentation on BCCWJ
    F1-score (Word) · 2017-09-23
    0.9842
    best: 0.9936 (LATTE (Linguistic units, lattices, PTMs, GNNs))
    SOTA
    Long Short-Term Memory for Japanese Word Segmentation · arXiv:1709.08011
  • Sentiment Analysis on TweetEval
    ALL · 2020-10-23
    56.5
    best: 67.9 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Emoji · 2020-10-23
    24.7
    best: 33.4 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Emotion · 2020-10-23
    66
    best: 79.5 (RoB-RT)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Irony · 2020-10-23
    62.8
    best: 82.1 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Offensive · 2020-10-23
    71.7
    best: 80.5 (RoB-RT)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Sentiment · 2020-10-23
    58.3
    best: 73.4 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Sentiment Analysis on TweetEval
    Stance · 2020-10-23
    59.4
    best: 71.2 (BERTweet)
    TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification · arXiv:2010.12421
  • Question Answering on Mathematics Dataset
    Accuracy · 2019-04-02
    0.57
    best: 0.8192 (TP-Transformer)
    Analysing Mathematical Reasoning Abilities of Neural Models · arXiv:1904.01557
  • Question Answering on YahooCQA
    MRR · 2017-07-25
    0.669
    best: 0.863 (sMIM (1024) +)
    Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering · arXiv:1707.07847
  • Question Answering on YahooCQA
    P@1 · 2017-07-25
    0.465
    best: 0.757 (sMIM (1024) +)
    Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering · arXiv:1707.07847
  • Question Answering on QASent
    MAP · 2015-11-19
    0.6436
    best: 0.7339 (Attentive LSTM)
    Neural Variational Inference for Text Processing · arXiv:1511.06038
  • Question Answering on QASent
    MRR · 2015-11-19
    0.7235
    best: 0.8117 (Attentive LSTM)
    Neural Variational Inference for Text Processing · arXiv:1511.06038
  • Question Answering on WikiQA
    MAP · 2015-11-19
    0.6552
    best: 0.927 (TANDA-DeBERTa-V3-Large + ALL)
    Neural Variational Inference for Text Processing · arXiv:1511.06038
  • Question Answering on WikiQA
    MRR · 2015-11-19
    0.6747
    best: 0.939 (TANDA-DeBERTa-V3-Large + ALL)
    Neural Variational Inference for Text Processing · arXiv:1511.06038
  • Machine Translation on WMT2014 English-French
    BLEU score · 2014-09-10
    34.8
    best: 46.4 (Transformer+BT (ADMIN init))
    Sequence to Sequence Learning with Neural Networks · arXiv:1409.3215
  • Visual Question Answering (VQA) on GQA Test2019
    Accuracy
    41.07
    best: 89.3 (human)
  • Visual Question Answering (VQA) on GQA Test2019
    Binary
    61.9
    best: 91.2 (human)
  • Visual Question Answering (VQA) on GQA Test2019
    Consistency
    68.68
    best: 98.4 (human)
  • Visual Question Answering (VQA) on GQA Test2019
    Distribution
    17.93
    best: 93.08 (GlobalPrior)
  • Visual Question Answering (VQA) on GQA Test2019
    Open
    22.69
    best: 87.4 (human)
  • Visual Question Answering (VQA) on GQA Test2019
    Plausibility
    87.3
    best: 97.2 (human)
  • Visual Question Answering (VQA) on GQA Test2019
    Validity
    96.39
    best: 98.9 (human)
  • Text Simplification on WikiLargeFR
    SARI
    39.05
    best: 39.23 (mT5 (fine-tuned on MULTI-SIM))
  • Sentence Embeddings on Google Dataset
    CR
    0.38
    best: 0.43 (BiLSTM)
  • Sentence Embeddings on Google Dataset
    F1
    0.82
    best: 0.855 (SLAHAN (LSTM+syntactic-information))
  • Sentence Compression on Google Dataset
    CR
    0.38
    best: 0.43 (BiLSTM)
  • Sentence Compression on Google Dataset
    F1
    0.82
    best: 0.855 (SLAHAN (LSTM+syntactic-information))

Time Series · 16 results

  • Fault Diagnosis on Digital twin-supported deep learning for fault diagnosis
    Accuracy · 2024-11-02
    61.56
    best: 80.22 (DANN)
    SOTA
    Use Digital Twins to Support Fault Diagnosis From System-level Condition-monitoring Data · arXiv:2411.01360
  • Time Series Analysis on FinSen
    Mean MSE · 2024-08-02
    0.01
    SOTA
    Enhancing Financial Market Predictions: Causality-Driven Feature Selection · arXiv:2408.01005
  • Time Series Regression on FinSen
    Mean MSE · 2024-08-02
    0.01
    SOTA
    Enhancing Financial Market Predictions: Causality-Driven Feature Selection · arXiv:2408.01005
  • Trajectory Prediction on Vi-Fi Multi-modal Dataset
    MSE-D · 2024-04-02
    58.31
    best: 13.42 (OOSTraj)
    SOTA
    OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising · arXiv:2404.02227
  • Trajectory Prediction on Vi-Fi Multi-modal Dataset
    MSE-P · 2024-04-02
    57.7
    best: 13.83 (OOSTraj)
    SOTA
    OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising · arXiv:2404.02227
  • Time Series Forecasting on BPI challenge '12
    Accuracy · 2016-12-07
    0.76
    best: 0.79 (QuerySelector)
    SOTA
    Predictive Business Process Monitoring with LSTM Neural Networks · arXiv:1612.02130
  • Time Series Forecasting on Helpdesk
    Accuracy · 2016-12-07
    0.7123
    best: 0.743 (QuerySelector)
    SOTA
    Predictive Business Process Monitoring with LSTM Neural Networks · arXiv:1612.02130
  • Time Series Analysis on BPI challenge '12
    Accuracy · 2016-12-07
    0.76
    best: 0.79 (QuerySelector)
    SOTA
    Predictive Business Process Monitoring with LSTM Neural Networks · arXiv:1612.02130
  • Time Series Analysis on Helpdesk
    Accuracy · 2016-12-07
    0.7123
    best: 0.743 (QuerySelector)
    SOTA
    Predictive Business Process Monitoring with LSTM Neural Networks · arXiv:1612.02130
  • Multivariate Time Series Forecasting on BPI challenge '12
    Accuracy · 2016-12-07
    0.76
    best: 0.79 (QuerySelector)
    SOTA
    Predictive Business Process Monitoring with LSTM Neural Networks · arXiv:1612.02130
  • Multivariate Time Series Forecasting on Helpdesk
    Accuracy · 2016-12-07
    0.7123
    best: 0.743 (QuerySelector)
    SOTA
    Predictive Business Process Monitoring with LSTM Neural Networks · arXiv:1612.02130
  • Trajectory Prediction on Vi-Fi Multi-modal Dataset
    SUM · 2024-04-02
    116.01
    best: 200.9 (ViTag)
    OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising · arXiv:2404.02227
  • Time Series Prediction on Sunspot
    RMSE · uses extra data
    0
  • Time Series Analysis on Sunspot
    RMSE · uses extra data
    0
  • Fire Detection on NIST Report of Test FR 4016
    F1-Score
    0.86
    best: 0.93 (rTPNN)
  • Fire Detection on NIST Report of Test FR 4016
    MCC
    0.82
    best: 0.9 (rTPNN)

Medical · 5 results

  • Language Modelling on WikiText-103
    Test perplexity · 2016-12-13
    48.7
    best: 2.4 (RETRO (7.5B))
    SOTA
    Improving Neural Language Models with a Continuous Cache · arXiv:1612.04426
  • Language Modelling on WikiText-103
    Validation perplexity · 2020-05-17
    52.73
    best: 13.11 (Ensemble of All)
    How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? · arXiv:2005.08199
  • Language Modelling on enwik8
    Bit per Character (BPC) · 2019-09-04
    1.195
    best: 1.67 (LSTM (7 layers))
    Mogrifier LSTM · arXiv:1909.01792
  • Language Modelling on WikiText-103
    Test perplexity · 2018-03-27
    36.4
    best: 2.4 (RETRO (7.5B))
    Fast Parametric Learning with Activation Memorization · arXiv:1803.10049
  • Language Modelling on WikiText-103
    Validation perplexity · 2018-03-27
    36
    best: 13.11 (Ensemble of All)
    Fast Parametric Learning with Activation Memorization · arXiv:1803.10049

Speech · 4 results

  • Keyword Spotting on Google Speech Commands
    Google Speech Commands V2 20 · 2019-07-10
    93.72
    best: 97.8 (Wav2KWS)
    Multi-layer Attention Mechanism for Speech Keyword Recognition · arXiv:1907.04536
  • Keyword Spotting on Google Speech Commands
    Google Speech Commands V1 12 · uses extra data · 2017-11-20
    92.9
    best: 98.56 (TripletLoss-res15)
    Hello Edge: Keyword Spotting on Microcontrollers · arXiv:1711.07128
  • Speech Emotion Recognition on Quechua-SER
    CCC (Arousal)
    0.764
  • Speech Emotion Recognition on Quechua-SER
    CCC (Valence)
    0.648

Methodology · 3 results

  • Classification on SHD - Adding
    Accuracy (%) · 2023-06-14
    10
    best: 82 (ELM Neuron)
    The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks · arXiv:2306.16922
  • Representation Learning on Google Dataset
    CR
    0.38
    best: 0.43 (BiLSTM)
  • Representation Learning on Google Dataset
    F1
    0.82
    best: 0.855 (SLAHAN (LSTM+syntactic-information))

Audio · 3 results

  • Speech Recognition on TIMIT
    Percentage error · 2018-11-19
    16
    best: 8.3 (wav2vec 2.0)
    The PyTorch-Kaldi Speech Recognition Toolkit · arXiv:1811.07453
  • Emotion Recognition on Quechua-SER
    CCC (Arousal)
    0.764
  • Emotion Recognition on Quechua-SER
    CCC (Valence)
    0.648

Computer Vision · 2 results

  • Gesture Recognition on DVS128 Gesture
    Accuracy (%) · 2020-05-02
    86.81
    best: 100 (TENNs-PLEIADES)
    Comparing SNNs and RNNs on Neuromorphic Vision Datasets: Similarities and Differences · arXiv:2005.02183
  • Image Classification on noise padded CIFAR-10
    % Test Accuracy · 2019-02-26
    11.6
    best: 62.4 (UnICORNN)
    AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks · arXiv:1902.09689

Knowledge Base · 2 results

  • Text Summarization on Google Dataset
    CR
    0.38
    best: 0.43 (BiLSTM)
  • Text Summarization on Google Dataset
    F1
    0.82
    best: 0.855 (SLAHAN (LSTM+syntactic-information))

Adversarial · 1 result

  • Text Generation on Cleaned E2E NLG Challenge
    METEOR (Validation set) · 2021-02-02
    0.394
    SOTA
    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics · arXiv:2102.01672

Music · 1 result

  • Music Modeling on Nottingham
    NLL · 2018-03-04
    3.29
    best: 4.05 (RNN)
    An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling · arXiv:1803.01271