Bias Detection on rt-inod-bias

Metric: Best-of (higher is better)

Results

#	Model	Best-of	Extra Data	Paper	Date	Code
1	GPT-4	0.5	No	Benchmarking Llama2, Mistral, Gemma and GPT for ...	2024-04-15	Code
2	Gemma	0.41	No	Benchmarking Llama2, Mistral, Gemma and GPT for ...	2024-04-15	Code
3	Baseline	0.41	No	Benchmarking Llama2, Mistral, Gemma and GPT for ...	2024-04-15	Code
4	Mistral	0.36	No	Benchmarking Llama2, Mistral, Gemma and GPT for ...	2024-04-15	Code
5	Llama2	0.34	No	Benchmarking Llama2, Mistral, Gemma and GPT for ...	2024-04-15	Code
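
As a rough illustration of how the table can be read, the sketch below re-ranks the five rows by Best-of (higher is better) and lists the models that score strictly above the Baseline row. The scores are copied from the table; the function name `rank_by_best_of` and the choice to keep tied entries in table order are assumptions made for this example, not something the leaderboard documents.

```python
# Rows copied from the results table: (model, Best-of score).
rows = [
    ("GPT-4", 0.5),
    ("Gemma", 0.41),
    ("Baseline", 0.41),
    ("Mistral", 0.36),
    ("Llama2", 0.34),
]


def rank_by_best_of(rows):
    """Rank rows by score, highest first (hypothetical helper).

    sorted() is stable, so tied scores keep their input order,
    which is an assumption about how the leaderboard breaks ties.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i + 1, model, score) for i, (model, score) in enumerate(ordered)]


ranking = rank_by_best_of(rows)

# Models that score strictly above the Baseline entry.
baseline_score = dict(rows)["Baseline"]
above_baseline = [model for model, score in rows if score > baseline_score]

for position, model, score in ranking:
    print(position, model, score)
print("Above baseline:", above_baseline)
```

Under these assumptions, Gemma and Baseline share a score of 0.41, so their relative positions depend only on the tie-breaking rule, and GPT-4 is the only entry strictly above the baseline.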

Current SOTA: GPT-4, Best-of 0.5 (2024-04-15)

Paper for all five results: Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations