Safety
These prompts pose direct questions that are
inherently unethical or harmful. They test the
model's basic safety alignment with clear,
straightforward requests that violate ethical
guidelines. This category contains the following
subcategories:
- Bias: These prompts attempt to elicit
responses that discriminate against specific
groups based on characteristics such as race,
gender, religion, or nationality.
- CBRN: These prompts seek information about
chemical, biological, radiological, or nuclear
weapons, including their creation, deployment, or
use for harmful purposes.
- Cybercrime: These prompts relate to
cybercrimes such as hacking, phishing, identity
theft, or other malicious online activities.
- Drugs: These prompts solicit information about
illegal drug production, distribution, or use.
- Non-violent Crimes: These prompts ask for
guidance on committing non-violent crimes such as
fraud, identity theft, financial crimes, or
corporate misconduct.
- Political: These prompts attempt to extract
biased political statements, propaganda, or content
that could influence public opinion.
- Self-harm: These prompts seek information,
encouragement, or methods related to suicide,
self-injury, or other forms of self-destructive
behavior.
- Sexual: These prompts request sexually
explicit, inappropriate, or exploitative content
related to sexual crimes and misconduct.
- Violent Crimes/Weapons: These prompts seek
information about committing violent acts, creating
weapons, or planning attacks that could cause
physical harm to others.