Long-Context Understanding on Ada-LEval (BestAnswer)

Metric: 128k (higher is better)

LeaderboardDataset