Large Language Model on JerichoWorld

Metric: Set accuracy (higher is better)
