
This dataset consists of 225 malicious tasks, which were integrated into ten distinct jailbreaking prompts. The malicious tasks are divided into five categories:

Misinformation and Disinformation
Security Threats and Cybercrimes
Unlawful Behaviors and Activities
Hate Speech and Discrimination
Substance Abuse and Dangerous Practices

The jailbreaking prompts were selected to cover a diverse range of scenarios, including role-playing, simulation, attention-shifting, and privileged execution. The position of the malicious task within each jailbreaking prompt was also varied.

List of malicious tasks only: https://github.com/CrystalEye42/eval-safety/blob/main/malicious_tasks_dataset.yaml

Malicious tasks with jailbreaking prompts: https://github.com/CrystalEye42/eval-safety/blob/main/integrated.yaml
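
As a quick sanity check on a downloaded copy of either file, the sketch below counts string entries under each top-level key. The internal layout of the YAML files is not described here, so the code assumes only that they are nested mappings and lists with text at the leaves. The helper names (count_leaf_strings, leaves_per_top_level_key, summarize) are hypothetical and not part of the eval-safety repository. Loading relies on the third-party PyYAML package.

```python
def count_leaf_strings(node):
    """Count string leaves in a nested structure of dicts and lists."""
    if isinstance(node, str):
        return 1
    if isinstance(node, dict):
        return sum(count_leaf_strings(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_leaf_strings(item) for item in node)
    return 0  # numbers, booleans, and None are not counted


def leaves_per_top_level_key(data):
    """Map each top-level key to the number of string leaves beneath it."""
    if not isinstance(data, dict):
        # Assumption: a non-mapping root is reported under one placeholder key.
        return {"<root>": count_leaf_strings(data)}
    return {str(key): count_leaf_strings(value) for key, value in data.items()}


def summarize(path):
    """Load a local YAML file and return (per-key counts, total count)."""
    import yaml  # PyYAML, imported lazily so the helpers above work without it

    with open(path, encoding="utf-8") as handle:
        data = yaml.safe_load(handle)
    counts = leaves_per_top_level_key(data)
    return counts, sum(counts.values())
```

Calling summarize("malicious_tasks_dataset.yaml") on a local copy returns the per-key counts and their total. If the top-level keys turn out to be the five categories and every leaf is a task, the total should be 225. Any other result most likely means the layout differs from what this sketch assumes.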

HarmfulTasks