Probing Language Models on KAMEL

Metric: Average F1 (higher is better)

Leaderboard

#  Model    Average F1  Extra Data  Paper  Date  Code
1  OPT-13b  17.62       No          -      -     Code