AI Model Protection
- Prompt Injection: Enable prompt injection detection and set the action to Alert to log the prompt injection, or to Block to stop the prompt when a prompt injection is detected.
  The feature supports the following languages: English, Spanish, Russian, German, French, Japanese, Portuguese, Italian, and Simplified Chinese.
- Toxic Content: Enable toxic content detection in LLM requests or responses. This feature protects your LLMs against generating or responding to inappropriate content.
  The feature supports the following languages: English, Spanish, Russian, German, French, Japanese, Portuguese, Italian, and Simplified Chinese.
  You can set the action to Allow, Alert, or Block for each of the following severity levels:
  - Moderate: Content that some users may consider toxic, but which may be more ambiguous. The default action is Allow.
  - High: Content that most users are highly likely to consider toxic. The default action is Allow.
  The system warns you if you attempt to configure a more severe action for moderately toxic content than for highly toxic content.
  When toxic content is detected, an AI Security log is generated with the following details:
  - Incident Type: "Model Protection"
  - Incident Subtype: "Toxic Content"
  - Incident Subtype Details: Specific toxicity category (e.g., Hate, Sexual, Violence & Self Harm, Profanity)
  - Severity: Medium for "High" confidence matches, Low for "Moderate" confidence matches
| AI Model Protection
- Toxic Content: Enable toxic content detection in LLM requests or responses. This feature protects your LLMs against generating or responding to inappropriate content.
  You can set the action to Allow, Alert, or Block for each of the following severity levels:
  - Moderate: Content that some users may consider toxic, but which may be more ambiguous. The default action is Allow.
  - High: Content that most users are highly likely to consider toxic. The default action is Allow.
  The system warns you if you attempt to configure a more severe action for moderately toxic content than for highly toxic content.
  When toxic content is detected, an AI Security log is generated with the following details:
  - Incident Type: "Model Protection"
  - Incident Subtype: "Toxic Content"
  - Incident Subtype Details: Specific toxicity category (e.g., Hate, Sexual, Violence & Self Harm, Profanity)
  - Severity: Medium for "High" confidence matches, Low for "Moderate" confidence matches
|
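
The toxic content settings described in the table can be modeled in a few lines of Python. This is only an illustrative sketch, not the product's API or configuration format: the class `ToxicContentPolicy`, the function `build_log`, and the ranking Allow < Alert < Block are assumptions introduced here. The defaults, the warning condition, and the log fields are taken from the description above.

```python
from dataclasses import dataclass

# Assumed ordering of actions from least to most severe (not stated in the source).
ACTION_RANK = {"Allow": 0, "Alert": 1, "Block": 2}

# Log severity for each detection level, as listed in the description.
LOG_SEVERITY = {"High": "Medium", "Moderate": "Low"}


@dataclass
class ToxicContentPolicy:
    """Hypothetical container for the per-level toxic content actions."""

    moderate_action: str = "Allow"  # default action for Moderate
    high_action: str = "Allow"      # default action for High

    def warnings(self) -> list:
        """Return a warning if Moderate is configured more severely than High."""
        for action in (self.moderate_action, self.high_action):
            if action not in ACTION_RANK:
                raise ValueError(f"Unknown action: {action!r}")
        if ACTION_RANK[self.moderate_action] > ACTION_RANK[self.high_action]:
            return [
                f"Moderate action ({self.moderate_action}) is more severe "
                f"than High action ({self.high_action})."
            ]
        return []


def build_log(level: str, category: str) -> dict:
    """Build a log record with the fields listed in the description.

    `level` is "Moderate" or "High"; `category` is the toxicity category,
    for example "Hate" or "Profanity".
    """
    return {
        "Incident Type": "Model Protection",
        "Incident Subtype": "Toxic Content",
        "Incident Subtype Details": category,
        "Severity": LOG_SEVERITY[level],
    }


if __name__ == "__main__":
    policy = ToxicContentPolicy(moderate_action="Block", high_action="Alert")
    for message in policy.warnings():
        print("Warning:", message)
    print(build_log("High", "Hate"))
```

In this sketch the check only produces a warning rather than rejecting the configuration, which mirrors the wording that the system warns you about the mismatch.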