HarmfulTasks

Harmful and Malicious Tasks for LLMs in Jailbreaking Prompts

Texts | MIT | Introduced 2024-01-19

This dataset consists of 225 malicious tasks, which were integrated into ten distinct jailbreaking prompts. The malicious tasks are divided into five categories:

  1. Misinformation and Disinformation
  2. Security Threats and Cybercrimes
  3. Unlawful Behaviors and Activities
  4. Hate Speech and Discrimination
  5. Substance Abuse and Dangerous Practices

The jailbreaking prompts were selected to cover a diverse range of scenarios: role-playing, simulations, attention-shifting, and privileged execution. The placement of the malicious task within each jailbreaking prompt was also varied.

List of malicious tasks only: https://github.com/CrystalEye42/eval-safety/blob/main/malicious_tasks_dataset.yaml

Malicious tasks with jailbreaking prompts: https://github.com/CrystalEye42/eval-safety/blob/main/integrated.yaml
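
For evaluation work, the two files linked above can be loaded programmatically. The following is a minimal sketch, not an official loader: it assumes PyYAML is installed, that the files are served at the usual raw.githubusercontent.com counterpart of the GitHub blob URLs, and it makes no assumption about the YAML schema, which this description does not specify. The helper names (`blob_to_raw`, `count_leaf_strings`, `load_tasks`) are hypothetical.

```python
from urllib.request import urlopen

# URL of the task-only file, copied from the link above.
BLOB_URL = (
    "https://github.com/CrystalEye42/eval-safety/"
    "blob/main/malicious_tasks_dataset.yaml"
)


def blob_to_raw(url: str) -> str:
    """Turn a GitHub 'blob' page URL into its raw-content URL."""
    raw = url.replace(
        "https://github.com/", "https://raw.githubusercontent.com/", 1
    )
    return raw.replace("/blob/", "/", 1)


def count_leaf_strings(node) -> int:
    """Count string leaves in nested dicts and lists.

    The file's layout is not documented here, so this walks whatever
    structure is found instead of assuming particular keys.
    """
    if isinstance(node, str):
        return 1
    if isinstance(node, dict):
        return sum(count_leaf_strings(value) for value in node.values())
    if isinstance(node, (list, tuple)):
        return sum(count_leaf_strings(item) for item in node)
    return 0


def load_tasks(url: str = BLOB_URL):
    """Download and parse the YAML file (assumes PyYAML is available)."""
    import yaml  # third-party: pip install pyyaml

    with urlopen(blob_to_raw(url)) as response:
        return yaml.safe_load(response.read().decode("utf-8"))
```

Calling `count_leaf_strings(load_tasks())` gives a rough count to compare against the 225 tasks described above; the number may differ if the file stores other string fields alongside the tasks. Passing the `integrated.yaml` URL to `load_tasks` loads the version with jailbreaking prompts in the same way.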