Metric: ICAT Score (higher is better)
| # | Model | ICAT Score | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-2 (small) | 72.97 | No | StereoSet: Measuring stereotypical bias in pretr... | 2020-04-20 | Code |
| 2 | XLNet (large) | 72.03 | No | StereoSet: Measuring stereotypical bias in pretr... | 2020-04-20 | Code |
| 3 | GPT-2 (medium) | 71.73 | No | StereoSet: Measuring stereotypical bias in pretr... | 2020-04-20 | Code |
| 4 | BERT (base) | 71.21 | No | StereoSet: Measuring stereotypical bias in pretr... | 2020-04-20 | Code |
| 5 | GPT-2 (large) | 70.54 | No | StereoSet: Measuring stereotypical bias in pretr... | 2020-04-20 | Code |
| 6 | BERT (large) | 69.89 | No | StereoSet: Measuring stereotypical bias in pretr... | 2020-04-20 | Code |
| 7 | RoBERTa (base) | 67.5 | No | StereoSet: Measuring stereotypical bias in pretr... | 2020-04-20 | Code |
| 8 | GAL 120B | 65.6 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 9 | XLNet (base) | 62.1 | No | StereoSet: Measuring stereotypical bias in pretr... | 2020-04-20 | Code |
| 10 | GPT-3 (text-davinci-002) | 60.8 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 11 | OPT 175B | 60 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |