Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Human Performance

Reported on 6 benchmarks across 4 tasks · 2 papers · 6 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Methodology · 4 results

  • Clustering on OCW
    # Correct Groups · uses extra data · 2023-06-19
    1405
    SOTA
    Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset · arXiv:2306.11167
  • Clustering on OCW
    # Solved Walls · uses extra data · 2023-06-19
    285
    SOTA
    Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset · arXiv:2306.11167
  • Constrained Clustering on OCW
    # Correct Groups · uses extra data · 2023-06-19
    1405
    SOTA
    Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset · arXiv:2306.11167
  • Constrained Clustering on OCW
    # Solved Walls · uses extra data · 2023-06-19
    285
    SOTA
    Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset · arXiv:2306.11167

Natural Language Processing · 2 results

  • Semantic Parsing on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)
    Execution Accuracy (Human) · 2023-05-04
    92.96
    SOTA
    Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs · arXiv:2305.03111
  • Text-To-SQL on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)
    Execution Accuracy (Human) · 2023-05-04
    92.96
    SOTA
    Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs · arXiv:2305.03111