Dialogue
183 benchmarks
0 papers
Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation.
Benchmarks
Visual Dialog v1.0 test-std: NDCG (x 100), MRR (x 100), R@1, R@5, Mean, R@10
VisDial v0.9 val: R@1, R@10, R@5, Mean Rank, MRR
Fluent Speech Commands: Accuracy (%)
Switchboard corpus: Accuracy
KVRET: Entity F1, BLEU
Wizard-of-Oz: Joint, Request
CoSQL: question match accuracy, interaction match accuracy
LRE07: 3 sec, 10 sec, 30 sec, Average
ICSI Meeting Recorder Dialog Act (MRDA) corpus: Accuracy
Second dialogue state tracking challenge: Joint, Area, Food, Price, Request
Snips-SmartLights: Accuracy (%)
MULTIWOZ 2.0: MultiWOZ (Success), MultiWOZ (Inform), BLEU, BLEU-4, Score
Persona-Chat: Avg F1, BLEU-1, BLEU-2, Distinct-1, Distinct-2, CIDr, METEOR, ROUGE-L
SIMMC2.0: Act F1, Slot F1
Snips-SmartSpeaker: Accuracy-EN (%), Accuracy-FR (%)
VoxForge European: Accuracy (%)
YouTube News dataset (No Noise): Accuracy, F1 Score
YouTube News dataset (White Noise): Accuracy, F1 Score
irc-disentanglement: VI, P, R, F, 1-1
rt-inod-jailbreaking: Best-of
MULTIWOZ 2.1: BLEU, MultiWOZ (Inform), MultiWOZ (Success), Joint Acc, MultiWOZ (Joint Goal Acc)
OpenViDial 2.0: BLEU, Dis-1, Dis-2, Dis-3, Dis-4
Spoken-SQuAD: F1 score
VoxForge Commonwealth: Accuracy (%)
DeliData: AUC
IndicTTS: Classification Accuracy
Linux IRC (Ch2 Elsner): 1-1, Local, Shen F-1
Linux IRC (Ch2 Kummerfeld): 1-1, Local, Shen F-1
Timers and Such: Accuracy (%)
VoxForge: Accuracy
EmpatheticDialogues: BLEU, BLEU-4, F1, ROUGE-L
FusedChat: Slot Accuracy, Joint SA, Inform, Inform_mct, Success, Success_mct, BLEU, PPL, Sensibleness, Specificity, SSA
Harry Potter Dialogue Dataset: mauve, Recall 10@1
KALAKA-3: PC, EC, EO, PO
MMConv: Categorical Accuracy, Non-Categorical Accuracy, Overall
MULTIWOZ 2.2: MultiWOZ (Joint Goal Acc)
SGD: METEOR
VOXLINGUA107: 0..5sec, 5..20sec, Average
VisDial v1.0 test-std: MRR, Mean Rank, NDCG, R@1, R@10, R@5
YouTube News dataset (Background Music): Accuracy, F1 Score
YouTube News dataset (Crackling Noise): Accuracy, F1 Score
ABCD: In-domain EM, In-domain CE, Cross-domain EM, Cross-domain CE
Amazon-5: 1 in 10 R@2
BlendedSkillTalk: BLEU-4, F1, ROUGE-L
CMU-DoG: F1, Meteor, ROUGE-1, Rouge-L
ConvAI2: BLEU-4, F1, ROUGE-L
DSTC9 Track 3 - Task 2: Overall Human Rating, Coherent, Error Recovery, Consistent, Diversity, Topic Depth, Likeable, Understanding, Flexible, Informative, Inquisitive
EMOTyDA: Accuracy
GIF Reply Dataset: nDCG@10
Image-Chat: BLEU-4, F1, ROUGE-L
Kvret: Entity F1, BLEU, Embedding Average, Greedy Matching, Vector Extrema
PG-19: Perplexity
ProsocialDialog: Accuracy
Reddit (multi-ref): interest (human), relevance (human)
SSD_NAME: Dialogue Success Rate, Joint Acc, Slot Acc
Switchboard Dialog Act Corpus: Accuracy
Switchboard dialogue act corpus: Accuracy
Twitter Dialogue (Noun): F1, Precision, Recall
Ubuntu Dialogue (Activity): F1, Precision, Recall
Ubuntu Dialogue (Entity): F1, Precision, Recall
Wizard of Wikipedia: BLEU-4, F1, ROUGE-L
automata: Dialogue Success Rate
Twitter Dialogue (Tense): Accuracy
Ubuntu Dialogue (Cmd): Accuracy
Ubuntu Dialogue (Tense): Accuracy
Untranscribed mixed-speech dataset: ACC, PRC, RCL