Large Language Model on PubMedQA corpus with metadata

Metric: ANS-EM (higher is better)
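The page does not define ANS-EM. The sketch below assumes it means answer exact match: the share of questions where the model's final answer string equals the gold label after light normalization. PubMedQA gold answers are "yes", "no", or "maybe". The function names `normalize` and `ans_em`, the normalization rules, and the percentage scaling are all hypothetical choices made for illustration. They are not the leaderboard's official scorer.

```python
def normalize(answer: str) -> str:
    # Hypothetical normalization: trim whitespace, drop a trailing period, lowercase.
    return answer.strip().strip(".").lower()


def ans_em(predictions: list[str], references: list[str]) -> float:
    # Exact-match score as a percentage (assumed scaling); higher is better.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


preds = ["Yes", "no", "maybe", "yes"]
golds = ["yes", "no", "no", "yes"]
print(ans_em(preds, golds))  # 75.0
```

Exact match gives no partial credit. An answer of "maybe" against a gold "no" scores zero, in the same way as any other mismatch.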
