Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

Samuel J. Paech

2023-12-11 | Benchmarking | Emotional Intelligence | MMLU

Abstract

We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of emotional states of characters in a dialogue. The benchmark is able to discriminate effectively between a wide range of models. We find that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks like MMLU (Hendrycks et al., 2020) (r=0.97), indicating that we may be capturing similar aspects of broad intelligence. Our benchmark produces highly repeatable results using a set of 60 English-language questions. We also provide open-source code for an automated benchmarking pipeline at https://github.com/EQ-bench/EQ-Bench and a leaderboard at https://eqbench.com.
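The r=0.97 figure in the abstract is a correlation coefficient between per-model EQ-Bench and MMLU scores. As a rough illustration of how such a figure could be computed, here is a minimal sketch of Pearson's r. It assumes the reported r is a Pearson correlation, which the abstract does not state explicitly. The function name `pearson_r` and both score lists are hypothetical placeholders, not values from the paper.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder scores, one entry per model, in the same model order.
# These are NOT the paper's data; substitute real per-model scores to reproduce r.
eq_bench_scores = [20.0, 35.0, 45.0, 50.0, 60.0]
mmlu_scores = [30.0, 45.0, 55.0, 65.0, 80.0]

r = pearson_r(eq_bench_scores, mmlu_scores)
print(f"r = {r:.2f}")
```

A value of r close to 1 means that models ranking high on one benchmark tend to rank high on the other, which is the relationship the abstract describes.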

Results

Task                    Dataset   Metric          Value  Model
Emotional Intelligence  EQ-Bench  EQ-Bench Score  62.52  OpenAI gpt-4-0613
Emotional Intelligence  EQ-Bench  EQ-Bench Score  54.83  migtissera/SynthIA-70B-v1.5
Emotional Intelligence  EQ-Bench  EQ-Bench Score  53.39  OpenAI gpt-4-0314
Emotional Intelligence  EQ-Bench  EQ-Bench Score  52.44  Qwen/Qwen-72B-Chat
Emotional Intelligence  EQ-Bench  EQ-Bench Score  52.14  Anthropic Claude2
Emotional Intelligence  EQ-Bench  EQ-Bench Score  51.56  meta-llama/Llama-2-70b-chat-hf
Emotional Intelligence  EQ-Bench  EQ-Bench Score  51.03  01-ai/Yi-34B-Chat
Emotional Intelligence  EQ-Bench  EQ-Bench Score  49.17  OpenAI gpt-3.5-0613
Emotional Intelligence  EQ-Bench  EQ-Bench Score  47.61  OpenAI gpt-3.5-turbo-0301
Emotional Intelligence  EQ-Bench  EQ-Bench Score  44.4   Open-Orca/Mistral-7B-OpenOrca
Emotional Intelligence  EQ-Bench  EQ-Bench Score  43.76  Qwen/Qwen-14B-Chat
Emotional Intelligence  EQ-Bench  EQ-Bench Score  43.73  OpenAI text-davinci-003
Emotional Intelligence  EQ-Bench  EQ-Bench Score  43.61  Intel/neural-chat-7b-v3-1
Emotional Intelligence  EQ-Bench  EQ-Bench Score  39.44  OpenAI text-davinci-002
Emotional Intelligence  EQ-Bench  EQ-Bench Score  37.08  openchat/openchat 3.5
Emotional Intelligence  EQ-Bench  EQ-Bench Score  36.52  lmsys/vicuna-33b-v1.3
Emotional Intelligence  EQ-Bench  EQ-Bench Score  33.02  meta-llama/Llama-2-13b-chat-hf
Emotional Intelligence  EQ-Bench  EQ-Bench Score  32.85  lmsys/vicuna-13b-v1.1
Emotional Intelligence  EQ-Bench  EQ-Bench Score  25.43  meta-llama/Llama-2-7b-chat-hf
Emotional Intelligence  EQ-Bench  EQ-Bench Score  24.92  Koala 13B
Emotional Intelligence  EQ-Bench  EQ-Bench Score  22.24  lmsys/vicuna-7b-v1.1
Emotional Intelligence  EQ-Bench  EQ-Bench Score  15.19  OpenAI text-davinci-001
Emotional Intelligence  EQ-Bench  EQ-Bench Score  2.25   OpenAI ADA
