Two-sample testing on CIFAR-10 vs CIFAR-10.1 (1000 samples)

Metric: Avg accuracy (higher is better)
