Prisma AIRS
Scans
Table of Contents
Expand All
|
Collapse All
Prisma AIRS Docs
Scans
Learn about AI Red Teaming Scans in Prisma AIRS.
| Where Can I Use This? | What Do I Need? |
|---|---|
|
|
One complete assessment of an AI system using Prisma AIRS AI Red Teaming is considered as
a scan. A scan is carried out by sending attack payloads to an AI system in the
form of attack prompts.
Prisma AIRS AI Red Teaming offers the following three types of scanning for AI systems:
- Red Teaming using Attack Library—This scan uses a curated and regularly updated list of predefined attack scenarios. These attacks are designed based on known vulnerabilities and best practices in red teaming.
- Red Teaming using Agent—This scan utilizes dynamic attack generation powered by an LLM attacker. This type allows for real-time generation of attack payloads, making it highly adaptive to the specific behavior and responses of the Target.
- Red Teaming using Custom Prompt Sets—This scan allows you to upload and run your own prompt sets against target LLM endpoints alongside AI Red Teaming's built-in attack library.
Red Teaming using Attack Library
In this type, AI Red Teaming uses a proprietary attack library which is constantly
updated to simulate attacks on any AI system.
Key aspects of an attack library scan include:
- Attack categories
- Attack severities
- Risk Score
Attack Categories
Each Attack Category contains a range of techniques. A prompt can incorporate
techniques from multiple categories to enhance its Attack Success Rate (ASR) and
is classified into categories based on the techniques it uses.
The attack library currently has three categories of attacks which also undergo
regular updates; Security, Safety, and Compliance:
| Attack Category | Attack Scope |
|---|---|
|
Security
|
Represents security vulnerabilities and potential
exploits. This category includes the following:
|
|
Safety
|
These prompts involve direct questions that are unethical
or harmful by nature. Their aim is to test the model's
basic safety alignment by asking clear, straightforward
questions that violate ethical guidelines. This category
contains the following:
|
|
Compliance
| Supports testing AI systems against established AI security frameworks such as OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF, and DASF V2.0. It evaluates models against the specific risks identified in each standard, allowing users to assess compliance levels, identify potential vulnerabilities, and gain valuable insights into how effectively their systems adhere to recognized security frameworks. |
Attack Severities
Each attack in the attack library has an associated severity. Severity of
an attack is assessed subjectively by our in-house experts and is based on the
sophistication of technique used and the impact it can have if successful.
Attack severities in AI Red Teaming are:
- Critical
- High
- Medium
- Low
Risk Score
This is the overall risk score assigned to the AI system based on the findings of
the attack library scan. It points to the safety and security risk
susceptibility of the system. A higher risk score indicates that the AI system
is more vulnerable to safety and security attacks.
Risk Score ranges from 0-100, 0 being practically no risk and 100 being very high
risk.
Red Teaming using Agent
In this type, a LLM agent interrogates an AI system and then simulates attacks in a
conversational fashion. Key modes of an agent scan include:
- Completely automated
- Human augmented
Between these two modes, you can perform a full spectrum testing of any AI
system:
- Black box testing—Using completely automated agent scan.
- Grey box testing—Using human augmented agent scan and sharing some details other than system prompt.
- White box testing—Using human augmented agent scan and sharing all details including system prompt.
Completely Automated Agent Scans
In this mode, the agent requires no inputs from the user. The agent first
enquires about the nature or use case of the AI system and then crafts attack
goals based on that. To achieve the attack goals, the agent creates prompt
attacks on the fly and then keeps adapting those based on the response of the AI
system.
Human Augmented Agent Scans
In this mode, the user can share details about the AI system for the agent to
craft more pertinent goals. This details can include any or all of the
following:
- Base model—underlying base model powering the AI system.
- Use case—what is the model or application about. For example, a customer support chatbot, HR system chatbot, and a general purpose GPT.
- Attack goals —these could be specific attack types which the user wants the agent to test for, such as, leak customer data, and share employee salary.Attack objectives don't need to be crafted like proper attacks, they can be general English language statements and the agent will use those to craft proper attacks.
- System Prompt—this is the system prompt used to train the AI system and if shared with AI Red Teaming, it can help carry out very advanced attacks on the system and can be treated as full white box testing of the system
Agent Scan Reports
Similar to Attack Library Scans, these reports also have an overall Risk Score
pointing to the safety and security risk susceptibility of the AI system. The
Risk Score is calculated based on the number of attack goals crafted by the
agent which were successful and the number of techniques which had to be used to
achieve them.
Agent Scans are conversational in nature, and do not have specific attack
categories but they output the entire conversation between the agent and the AI
system for a specific attack goal which was achieved.
Red Teaming using Custom Prompt Sets
Custom attacks functionality within AI Red Teaming allows you to upload and run your
own prompt sets against target LLM endpoints alongside AI Red Teaming's built-in
attack library.
The Custom Attacks feature allows you to:
- Upload and maintain your own prompt sets.
- Use one or more custom prompt sets alongside AI Red Teaming's standard attack library while scanning a target.
- View dedicated reports for custom prompt attack results.
Prompt Sets
Prompt Sets
A prompt set is a collection of related prompts grouped together for
organizational purposes and efficient scanning.
Prompts
Prompts are individual text inputs designed to test system vulnerabilities.
All prompts in the system require validation before they can be used in security
scans. This process ensures that prompts have clear attack goals and are ready
for effective testing. All prompts in the system require validation before they
can be used in security scans. This process ensures that prompts have clear
attack goals and are ready for effective testing.
You can validate prompts automatically and manually.
Automatic Validation
All prompts undergo automatic validation. This can take up to 5-10
minutes. The process of validating a prompt involves interpreting and
generating an attack goal for the prompt. This is done by our proprietary
LLMs.
Manual Validation
If automatic validation fails, you'll be prompted to manually validate the prompt
by adding a goal for the prompt; you can also choose to skip the prompt.
Managing Prompt Sets
You can perform the following actions on your prompt sets:
- Edit prompt set name
- Edit description
- Add new prompts
- Delete individual prompts
- Validate unvalidated prompts
Prompts can have the following validation status:
- Validated. Indicates that the prompt is ready to use.
- Validating. Indicates that auto-validation is in progress.
- Not validated. Indicates that the prompt requires manual validation.
Prompt Set Usage
A prompt set provides the following usage patterns:
- A prompt set is enabled and ready to use when at least one prompt in the set is validated.
- Only **_VALIDATED_** prompts within the prompt set will be used in a scan.
- Not-validated prompts will be ignored during scans.
| Validation Status | Description |
|---|---|
| Validation In-progress |
This prompt indicates that the prompts are being validated.
This process runs in the background and may take up to 10
minutes to complete, based on the length of the prompt. You
can continue working until validation is complete. This view
displays the prompt and its corresponding validation status.
|
| Validated Prompts |
This displays validated prompts. It provides an overview
which includes details about the prompt, including the
status, when the prompt was added and a brief
description.
|
| Not-Validated Prompts |
This displays prompts that were not validated. Some prompts
require manual validation and cannot be used until
validated.
|
| Manual Validation |
Prompts that require manual validation require you to review
the prompt's details and specify the goal to complete
validation. Enter the prompt and specify the goal, then
validate the prompt.
|