HarmfulTasks
Harmful and Malicious Tasks for LLMs in Jailbreaking Prompts
Texts · MIT · Introduced 2024-01-19
This dataset consists of 225 malicious tasks, which were integrated into ten distinct jailbreaking prompts. The malicious tasks are divided into five categories:
- Misinformation and Disinformation
- Security Threats and Cybercrimes
- Unlawful Behaviors and Activities
- Hate Speech and Discrimination
- Substance Abuse and Dangerous Practices
The jailbreaking prompts were selected to cover a diverse range of scenarios, including role-playing, simulations, attention-shifting, and privileged execution. The placement of the malicious task within each jailbreaking prompt was also varied.
List of malicious tasks only: https://github.com/CrystalEye42/eval-safety/blob/main/malicious_tasks_dataset.yaml
Malicious tasks with jailbreaking prompts: https://github.com/CrystalEye42/eval-safety/blob/main/integrated.yaml