TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/SocialDial: A Benchmark for Socially-Aware Dialogue Systems

SocialDial: A Benchmark for Socially-Aware Dialogue Systems

Haolan Zhan, Zhuang Li, YuFei Wang, Linhao Luo, Tao Feng, Xiaoxi Kang, Yuncheng Hua, Lizhen Qu, Lay-Ki Soon, Suraj Sharma, Ingrid Zukerman, Zhaleh Semnani-Azad, Gholamreza Haffari

2023-04-24Synthetic Data GenerationCultural Vocal Bursts Intensity Prediction
PaperPDFCode(official)

Abstract

Dialogue systems have been widely applied in many scenarios and are now more powerful and ubiquitous than ever before. With large neural models and massive available data, current dialogue systems have access to more knowledge than any people in their life. However, current dialogue systems still do not perform at a human level. One major gap between conversational agents and humans lies in their abilities to be aware of social norms. The development of socially-aware dialogue systems is impeded due to the lack of resources. In this paper, we present the first socially-aware dialogue corpus - SocialDial, based on Chinese social culture. SocialDial consists of two parts: 1,563 multi-turn dialogues between two human speakers with fine-grained labels, and 4,870 synthetic conversations generated by ChatGPT. The human corpus covers five categories of social norms, which have 14 sub-categories in total. Specifically, it contains social factor annotations including social relation, context, social distance, and social norms. However, collecting sufficient socially-aware dialogues is costly. Thus, we harness the power of ChatGPT and devise an ontology-based synthetic data generation framework. This framework is able to generate synthetic data at scale. To ensure the quality of synthetic dialogues, we design several mechanisms for quality control during data collection. Finally, we evaluate our dataset using several pre-trained models, such as BERT and RoBERTa. Comprehensive empirical results based on state-of-the-art neural models demonstrate that modeling of social norms for dialogue systems is a promising research direction. To the best of our knowledge, SocialDial is the first socially-aware dialogue dataset that covers multiple social factors and has fine-grained labels.

Related Papers

Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training2025-07-11DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations2025-07-08How Good Are Synthetic Requirements ? Evaluating LLM-Generated Datasets for AI4RE2025-06-26SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation2025-06-24PuckTrick: A Library for Making Synthetic Data More Realistic2025-06-23RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models2025-06-21Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation2025-06-19Graph-Convolutional-Beta-VAE for Synthetic Abdominal Aorta Aneurysm Generation2025-06-16