Denoising QA
Reported on 9 benchmarks across 2 tasks
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Natural Language Processing: 9 results
- 42.2, best: 54 (Cluster-Former (#C=512))
- EM (Quasar-T): 42.2, best: 42.3 (Evidence Aggregation via R^3 Re-Ranking)
- F1 (Quasar-T): 49.3, best: 49.6 (Evidence Aggregation via R^3 Re-Ranking)
- 58.8, best: 68 (Cluster-Former (#C=512))
- 64.5, best: 84.8 (SpanBERT)
- EM (Quasar-T): 42.2, best: 42.3 (Evidence Aggregation via R^3 Re-Ranking)
- F1 (Quasar-T): 49.3, best: 49.6 (Evidence Aggregation via R^3 Re-Ranking)
- 58.8, best: 68 (Cluster-Former (#C=512))
- 64.5, best: 84.8 (SpanBERT)