Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Human Performance

Reported on 6 benchmarks across 4 tasks · 2 papers · 6 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
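A minimal sketch of what exact-name matching implies, assuming hypothetical result records with `model`, `paper`, and `benchmark` fields. This is not the site's actual implementation, and the third record is invented purely to show that a differently cased name would land in a separate group.

```python
from collections import defaultdict


def group_by_exact_name(results):
    """Group result records by their exact, case-sensitive model name."""
    groups = defaultdict(list)
    for record in results:
        groups[record["model"]].append(record)
    return dict(groups)


# Hypothetical records; the field names are assumptions for illustration.
results = [
    {"model": "Human Performance", "paper": "arXiv:2306.11167", "benchmark": "OCW"},
    {"model": "Human Performance", "paper": "arXiv:2305.03111", "benchmark": "BIRD"},
    # Invented entry: same concept, different capitalization, so no match.
    {"model": "Human performance", "paper": "hypothetical", "benchmark": "other"},
]

grouped = group_by_exact_name(results)
papers = {r["paper"] for r in grouped["Human Performance"]}
```

Under this kind of matching, the two papers listed on this page share one entry only because they use the identical string "Human Performance", even though the human baselines they report were measured independently.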

Methodology: 4 results

Clustering on OCW
# Correct Groups · uses extra data · 2023-06-19
1405
SOTA
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset arXiv:2306.11167
Clustering on OCW
# Solved Walls · uses extra data · 2023-06-19
285
SOTA
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset arXiv:2306.11167
Constrained Clustering on OCW
# Correct Groups · uses extra data · 2023-06-19
1405
SOTA
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset arXiv:2306.11167
Constrained Clustering on OCW
# Solved Walls · uses extra data · 2023-06-19
285
SOTA
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset arXiv:2306.11167

Natural Language Processing: 2 results

Semantic Parsing on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)
Execution Accuracy (Human) · 2023-05-04
92.96
SOTA
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs arXiv:2305.03111
Text-To-SQL on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)
Execution Accuracy (Human) · 2023-05-04
92.96
SOTA
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs arXiv:2305.03111