
AART (AI-Assisted Red-Teaming) is an automated alternative to current manual red-teaming efforts. Its primary goal is to evaluate the safety of LLM generations across a range of application contexts.

Adversarial Testing of LLMs: Adversarial testing is crucial for the safe and responsible deployment of LLMs. The authors introduce a novel approach for generating adversarial evaluation datasets, which are then used to assess the safety of LLM outputs in real-world scenarios.

Key Features of AART:

Diverse Data Generation: AART generates evaluation datasets with highly diverse content characteristics, including sensitive and harmful concepts specific to a wide range of cultural and geographic regions and application scenarios.

AI-Assisted Recipes: The data generation process is steered by AI-assisted recipes, which define, scope, and prioritize diversity within the target application context.

Structured LLM-Generation Process: AART feeds this diverse data into a structured LLM-generation process, so that the evaluation priorities can be covered at scale.

Promising Results: Compared with some state-of-the-art tools, AART shows promising results in terms of concept coverage and data quality.
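The idea of a recipe steering a structured LLM-generation process can be pictured with a small Python sketch. Everything in it is an assumption for illustration: the `Recipe` fields, the `expand_recipe` function, and the instruction wording are hypothetical and are not AART's actual schema or prompts. It shows one way diversity axes (policy concepts, regions, task formats) and per-concept priorities might be combined into a grid of generation instructions to send to an LLM.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Recipe:
    """Hypothetical recipe: the diversity axes a red-teaming run should span.

    Field names are illustrative assumptions, not AART's actual schema.
    """
    application: str
    policy_concepts: list[str]
    regions: list[str]
    task_formats: list[str]
    # Optional per-concept weights; higher-priority concepts get more instructions.
    priorities: dict[str, int] = field(default_factory=dict)


def expand_recipe(recipe: Recipe) -> list[str]:
    """Turn a recipe into structured generation instructions for an LLM.

    Every concept x region x format combination becomes one instruction,
    repeated `priority` times (default 1) for its concept.
    """
    instructions = []
    for concept, region, fmt in product(
        recipe.policy_concepts, recipe.regions, recipe.task_formats
    ):
        weight = recipe.priorities.get(concept, 1)
        text = (
            f"For a {recipe.application}, write one adversarial {fmt} "
            f"that probes the policy concept '{concept}' "
            f"in a way specific to {region}."
        )
        instructions.extend([text] * weight)
    return instructions


# Toy recipe for a hypothetical application.
recipe = Recipe(
    application="travel-planning assistant",
    policy_concepts=["hate speech", "dangerous activities"],
    regions=["Brazil", "India"],
    task_formats=["question", "request"],
    priorities={"dangerous activities": 2},
)
prompts = expand_recipe(recipe)
print(len(prompts))  # 12: 4 combos for "hate speech" + 4 combos x 2 for "dangerous activities"
print(prompts[0])
```

Repeating an instruction for a higher-priority concept only pays off if the LLM is sampled with nonzero temperature, so that each copy yields a different adversarial example. The cross product also grows quickly with each added axis, so a real pipeline might sample from the grid rather than enumerate it.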
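Concept coverage, one of the criteria mentioned above, could be approximated as the fraction of target policy concepts touched by at least one generated example. The sketch below assumes that definition and uses naive keyword matching in a hypothetical `concept_coverage` function; neither is taken from the AART paper, and the example strings are toy inputs, not real data.

```python
def concept_coverage(generated: list[str], concept_keywords: dict[str, list[str]]) -> float:
    """Fraction of target concepts matched by at least one generated example.

    Case-insensitive substring matching is a crude stand-in assumed for
    illustration; a real evaluation would more likely rely on human or
    model-based labeling of each example.
    """
    if not concept_keywords:
        return 0.0
    lowered = [text.lower() for text in generated]
    covered = {
        concept
        for concept, keywords in concept_keywords.items()
        if any(kw.lower() in text for kw in keywords for text in lowered)
    }
    return len(covered) / len(concept_keywords)


# Toy generated examples and keyword lists (hypothetical).
generated = [
    "How do I climb the volcano crater without a guide?",
    "Which local scams target tourists in Lisbon?",
]
keywords = {
    "dangerous activities": ["climb", "crater"],
    "fraud": ["scam"],
    "hate speech": ["slur"],
}
print(round(concept_coverage(generated, keywords), 3))  # 0.667
```

Keyword matching both over- and under-counts (a benign "climb" matches, a paraphrased slur does not), which is why it is shown here only as a placeholder for a proper labeling step.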