Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

LSTM

Reported on 65 benchmarks across 29 tasks · 22 papers · 16 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
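
Because matching uses the exact name string, the grouping behaves roughly like the minimal sketch below. It is an illustration under stated assumptions, not the site's actual code. The `Result` record, its fields, and `group_by_exact_name` are hypothetical. The rows reuse values listed on this page.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Result(NamedTuple):
    """One benchmark result (hypothetical schema, for illustration only)."""
    model_name: str
    dataset: str
    metric: str
    value: float
    source: Optional[str]  # arXiv ID of the reporting paper, if listed


def group_by_exact_name(results: list[Result]) -> dict[str, list[Result]]:
    """Group results by the model name string exactly as written.

    No normalization is applied: "LSTM" and "Attentive LSTM" end up in
    separate groups, while two papers that both call their (possibly
    different) model "LSTM" end up in the same group.
    """
    groups: dict[str, list[Result]] = defaultdict(list)
    for r in results:
        groups[r.model_name].append(r)
    return dict(groups)


# A few rows copied from this page.
rows = [
    Result("LSTM", "WikiText-103", "Test perplexity", 48.7, "arXiv:1612.04426"),
    Result("LSTM", "WikiText-103", "Test perplexity", 36.4, "arXiv:1803.10049"),
    Result("Attentive LSTM", "QASent", "MAP", 0.7339, None),
]

groups = group_by_exact_name(rows)
for name, items in groups.items():
    print(name, len(items))
# LSTM 2
# Attentive LSTM 1
```

The two `LSTM` rows come from different papers but fall into one group. This is the ambiguity the note warns about.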

Natural Language Processing · 30 results

Data-to-Text Generation on Cleaned E2E NLG Challenge
METEOR (Validation set) · 2021-02-02
0.394
SOTA
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672
Sentiment Analysis on TweetEval
Hate · 2020-10-23
52.6
SOTA
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Japanese Word Segmentation on BCCWJ
F1-score (Word) · 2017-09-23
0.9842
best: 0.9936 (LATTE (Linguistic units, lattices, PTMs, GNNs))
SOTA
Long Short-Term Memory for Japanese Word Segmentation arXiv:1709.08011
Sentiment Analysis on TweetEval
ALL · 2020-10-23
56.5
best: 67.9 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Emoji · 2020-10-23
24.7
best: 33.4 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Emotion · 2020-10-23
66
best: 79.5 (RoB-RT)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Irony · 2020-10-23
62.8
best: 82.1 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Offensive · 2020-10-23
71.7
best: 80.5 (RoB-RT)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Sentiment · 2020-10-23
58.3
best: 73.4 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Sentiment Analysis on TweetEval
Stance · 2020-10-23
59.4
best: 71.2 (BERTweet)
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification arXiv:2010.12421
Question Answering on Mathematics Dataset
Accuracy · 2019-04-02
0.57
best: 0.8192 (TP-Transformer)
Analysing Mathematical Reasoning Abilities of Neural Models arXiv:1904.01557
Question Answering on YahooCQA
MRR · 2017-07-25
0.669
best: 0.863 (sMIM (1024) +)
Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering arXiv:1707.07847
Question Answering on YahooCQA
P@1 · 2017-07-25
0.465
best: 0.757 (sMIM (1024) +)
Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering arXiv:1707.07847
Question Answering on QASent
MAP · 2015-11-19
0.6436
best: 0.7339 (Attentive LSTM)
Neural Variational Inference for Text Processing arXiv:1511.06038
Question Answering on QASent
MRR · 2015-11-19
0.7235
best: 0.8117 (Attentive LSTM)
Neural Variational Inference for Text Processing arXiv:1511.06038
Question Answering on WikiQA
MAP · 2015-11-19
0.6552
best: 0.927 (TANDA-DeBERTa-V3-Large + ALL)
Neural Variational Inference for Text Processing arXiv:1511.06038
Question Answering on WikiQA
MRR · 2015-11-19
0.6747
best: 0.939 (TANDA-DeBERTa-V3-Large + ALL)
Neural Variational Inference for Text Processing arXiv:1511.06038
Machine Translation on WMT2014 English-French
BLEU score · 2014-09-10
34.8
best: 46.4 (Transformer+BT (ADMIN init))
Sequence to Sequence Learning with Neural Networks arXiv:1409.3215
Visual Question Answering (VQA) on GQA Test2019
Accuracy
41.07
best: 89.3 (human)
Visual Question Answering (VQA) on GQA Test2019
Binary
61.9
best: 91.2 (human)
Visual Question Answering (VQA) on GQA Test2019
Consistency
68.68
best: 98.4 (human)
Visual Question Answering (VQA) on GQA Test2019
Distribution
17.93
best: 93.08 (GlobalPrior)
Visual Question Answering (VQA) on GQA Test2019
Open
22.69
best: 87.4 (human)
Visual Question Answering (VQA) on GQA Test2019
Plausibility
87.3
best: 97.2 (human)
Visual Question Answering (VQA) on GQA Test2019
Validity
96.39
best: 98.9 (human)
Text Simplification on WikiLargeFR
SARI
39.05
best: 39.23 (mT5 (fine-tuned on MULTI-SIM))
Sentence Embeddings on Google Dataset
CR
0.38
best: 0.43 (BiLSTM)
Sentence Embeddings on Google Dataset
F1
0.82
best: 0.855 (SLAHAN (LSTM+syntactic-information))
Sentence Compression on Google Dataset
CR
0.38
best: 0.43 (BiLSTM)
Sentence Compression on Google Dataset
F1
0.82
best: 0.855 (SLAHAN (LSTM+syntactic-information))

Time Series · 16 results

Fault Diagnosis on Digital twin-supported deep learning for fault diagnosis
Accuracy · 2024-11-02
61.56
best: 80.22 (DANN)
SOTA
Use Digital Twins to Support Fault Diagnosis From System-level Condition-monitoring Data arXiv:2411.01360
Time Series Analysis on FinSen
Mean MSE · 2024-08-02
0.01
SOTA
Enhancing Financial Market Predictions: Causality-Driven Feature Selection arXiv:2408.01005
Time Series Regression on FinSen
Mean MSE · 2024-08-02
0.01
SOTA
Enhancing Financial Market Predictions: Causality-Driven Feature Selection arXiv:2408.01005
Trajectory Prediction on Vi-Fi Multi-modal Dataset
MSE-D · 2024-04-02
58.31
best: 13.42 (OOSTraj)
SOTA
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising arXiv:2404.02227
Trajectory Prediction on Vi-Fi Multi-modal Dataset
MSE-P · 2024-04-02
57.7
best: 13.83 (OOSTraj)
SOTA
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising arXiv:2404.02227
Time Series Forecasting on BPI challenge '12
Accuracy · 2016-12-07
0.76
best: 0.79 (QuerySelector)
SOTA
Predictive Business Process Monitoring with LSTM Neural Networks arXiv:1612.02130
Time Series Forecasting on Helpdesk
Accuracy · 2016-12-07
0.7123
best: 0.743 (QuerySelector)
SOTA
Predictive Business Process Monitoring with LSTM Neural Networks arXiv:1612.02130
Time Series Analysis on BPI challenge '12
Accuracy · 2016-12-07
0.76
best: 0.79 (QuerySelector)
SOTA
Predictive Business Process Monitoring with LSTM Neural Networks arXiv:1612.02130
Time Series Analysis on Helpdesk
Accuracy · 2016-12-07
0.7123
best: 0.743 (QuerySelector)
SOTA
Predictive Business Process Monitoring with LSTM Neural Networks arXiv:1612.02130
Multivariate Time Series Forecasting on BPI challenge '12
Accuracy · 2016-12-07
0.76
best: 0.79 (QuerySelector)
SOTA
Predictive Business Process Monitoring with LSTM Neural Networks arXiv:1612.02130
Multivariate Time Series Forecasting on Helpdesk
Accuracy · 2016-12-07
0.7123
best: 0.743 (QuerySelector)
SOTA
Predictive Business Process Monitoring with LSTM Neural Networks arXiv:1612.02130
Trajectory Prediction on Vi-Fi Multi-modal Dataset
SUM · 2024-04-02
116.01
best: 200.9 (ViTag)
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising arXiv:2404.02227
Time Series Prediction on Sunspot
RMSE · uses extra data
0
Time Series Analysis on Sunspot
RMSE · uses extra data
0
Fire Detection on NIST Report of Test FR 4016
F1-Score
0.86
best: 0.93 (rTPNN)
Fire Detection on NIST Report of Test FR 4016
MCC
0.82
best: 0.9 (rTPNN)

Medical · 5 results

Language Modelling on WikiText-103
Test perplexity · 2016-12-13
48.7
best: 2.4 (RETRO (7.5B))
SOTA
Improving Neural Language Models with a Continuous Cache arXiv:1612.04426
Language Modelling on WikiText-103
Validation perplexity · 2020-05-17
52.73
best: 13.11 (Ensemble of All)
How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? arXiv:2005.08199
Language Modelling on enwik8
Bits per Character (BPC) · 2019-09-04
1.195
best: 1.67 (LSTM (7 layers))
Mogrifier LSTM arXiv:1909.01792
Language Modelling on WikiText-103
Test perplexity · 2018-03-27
36.4
best: 2.4 (RETRO (7.5B))
Fast Parametric Learning with Activation Memorization arXiv:1803.10049
Language Modelling on WikiText-103
Validation perplexity · 2018-03-27
36
best: 13.11 (Ensemble of All)
Fast Parametric Learning with Activation Memorization arXiv:1803.10049

Speech · 4 results

Keyword Spotting on Google Speech Commands
Google Speech Commands V2 20 · 2019-07-10
93.72
best: 97.8 (Wav2KWS)
Multi-layer Attention Mechanism for Speech Keyword Recognition arXiv:1907.04536
Keyword Spotting on Google Speech Commands
Google Speech Commands V1 12 · uses extra data · 2017-11-20
92.9
best: 98.56 (TripletLoss-res15)
Hello Edge: Keyword Spotting on Microcontrollers arXiv:1711.07128
Speech Emotion Recognition on Quechua-SER
CCC (Arousal)
0.764
Speech Emotion Recognition on Quechua-SER
CCC (Valence)
0.648

Methodology · 3 results

Classification on SHD - Adding
Accuracy (%) · 2023-06-14
10
best: 82 (ELM Neuron)
The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks arXiv:2306.16922
Representation Learning on Google Dataset
CR
0.38
best: 0.43 (BiLSTM)
Representation Learning on Google Dataset
F1
0.82
best: 0.855 (SLAHAN (LSTM+syntactic-information))

Audio · 3 results

Speech Recognition on TIMIT
Percentage error · 2018-11-19
16
best: 8.3 (wav2vec 2.0)
The PyTorch-Kaldi Speech Recognition Toolkit arXiv:1811.07453
Emotion Recognition on Quechua-SER
CCC (Arousal)
0.764
Emotion Recognition on Quechua-SER
CCC (Valence)
0.648

Computer Vision · 2 results

Gesture Recognition on DVS128 Gesture
Accuracy (%) · 2020-05-02
86.81
best: 100 (TENNs-PLEIADES)
Comparing SNNs and RNNs on Neuromorphic Vision Datasets: Similarities and Differences arXiv:2005.02183
Image Classification on noise padded CIFAR-10
% Test Accuracy · 2019-02-26
11.6
best: 62.4 (UnICORNN)
AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks arXiv:1902.09689

Knowledge Base · 2 results

Text Summarization on Google Dataset
CR
0.38
best: 0.43 (BiLSTM)
Text Summarization on Google Dataset
F1
0.82
best: 0.855 (SLAHAN (LSTM+syntactic-information))

Adversarial · 1 result

Text Generation on Cleaned E2E NLG Challenge
METEOR (Validation set) · 2021-02-02
0.394
SOTA
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics arXiv:2102.01672

Music · 1 result

Music Modeling on Nottingham
NLL · 2018-03-04
3.29
best: 4.05 (RNN)
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling arXiv:1803.01271