The Unified Incident Framework gives you fine-grained control over the detection,
raising, and clearing of network incidents across your Prisma Access environments. This
feature allows you to define custom thresholds and monitoring parameters for critical
network events, ensuring that your monitoring strategy aligns with your operational
requirements and service level objectives.
You can now define specific conditions that must be met before an incident is
raised or cleared, including customizable time windows, event frequency thresholds, and
state persistence requirements. This approach reduces incident volume by
preventing transient network anomalies from triggering unnecessary incidents, while
still ensuring that sustained critical issues receive immediate, high-priority
attention.
For tunnel and BGP connection monitoring, you can specify how long a resource
must remain in a down state before an alert is raised and, similarly, how long it must
be operational before an existing alert is cleared. This helps you prevent incident
fatigue by filtering out brief connectivity interruptions that may resolve automatically
without intervention.
For site long duration monitoring, you can customize the
parameters for both raising and clearing incidents related to prolonged site capacity
overutilization. For example, an incident is raised if the capacity utilization exceeds
a set threshold for the specified Minimum Breach Hours Per Day over the specified number of
Minimum Breach Days. To clear an incident, utilization must remain below the threshold
for the required Minimum Breach Hours Per Day across the designated number of
Minimum Days to Clear. Both conditions give you granular control over alert sensitivity
by adjusting the duration and frequency required to transition between states.
This framework uses the longest-match algorithm to determine which incident
settings apply to a particular resource, enabling you to create a hierarchy of
monitoring policies that range from global defaults to site-specific or tunnel-specific
configurations. This hierarchical approach gives you the flexibility to implement
stricter monitoring for critical infrastructure while maintaining more relaxed thresholds for
less sensitive components. See Incident Setting Resolution for more
information.
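To illustrate the longest-match idea, here is a minimal Python sketch. It assumes, purely for illustration, that each setting is scoped by a tuple of path-like components (for example, site and then tunnel) and that the setting whose scope matches the most leading components of a resource wins. The actual matching rules are the ones described in Incident Setting Resolution; the names Setting and resolve_setting are hypothetical and not part of any product API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Setting:
    """Hypothetical incident setting scoped to a resource-path prefix."""
    name: str
    scope: Tuple[str, ...]  # an empty tuple stands for a global default


def resolve_setting(resource: Sequence[str],
                    settings: Sequence[Setting]) -> Optional[Setting]:
    """Return the setting whose scope is the longest prefix of the resource path."""
    best: Optional[Setting] = None
    for s in settings:
        n = len(s.scope)
        # The scope matches when it equals the first n components of the resource.
        if tuple(resource[:n]) == s.scope:
            if best is None or n > len(best.scope):
                best = s
    return best


# Assumed example hierarchy: global default, one site, one tunnel in that site.
settings = [
    Setting("global-default", ()),
    Setting("site-nyc", ("site-nyc",)),
    Setting("nyc-tunnel-1", ("site-nyc", "tunnel-1")),
]
```

With this hierarchy, a resource at ("site-nyc", "tunnel-1") resolves to the tunnel-specific setting, another tunnel in the same site falls back to the site setting, and anything else falls back to the global default.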
Here are the Prisma Access incident codes that support customization:
Select New Custom to create a new incident setting.
Enter the Setting Name and Description.
Select the Product, Incident Category, Incident Subcategory, and
Incident Code.
If you change the product after selecting these fields, the
other fields are reset.
Select the Object Type and the condition associated with it.
Select the action that Strata Cloud Manager takes when the
above conditions are met. Select Raise or Suppress and set the priority.
The severity of the incident is derived from the incident code.
Configure the raise and clear conditions of an incident.
If an incident supports customization, you can configure the
specific conditions that must be met before the incident is raised or
cleared, including customizable time windows, event frequency thresholds,
and state persistence requirements.
The following example shows the INC_RN_PRIMARY_WAN_BGP_DOWN incident, which
supports customization of raise and clear conditions.
In this example, the incident is configured to be raised when the
BGP status remains 'down' for a minimum of 1 minute. Conversely, the
incident is cleared once the BGP status is consistently 'up' for a period of
at least 5 minutes.
Clicking Revert to default values restores the raise and
clear conditions to their original settings. By default, this incident
is raised when BGP remains down for at least 10 minutes and cleared when
BGP is up for at least 8 minutes.
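The raise and clear hold times in the BGP example can be modeled as a small state machine. The following Python sketch is illustrative only: it assumes status samples arrive in time order with timestamps in seconds, and the class name HoldTimerIncident and its fields are hypothetical. The defaults mirror the customized example (raise after 1 minute down, clear after 5 minutes up).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HoldTimerIncident:
    """Hypothetical model of raise/clear hold timers for a BGP-down incident."""
    raise_after_s: int = 60    # BGP must stay 'down' this long before raising
    clear_after_s: int = 300   # BGP must stay 'up' this long before clearing
    active: bool = False
    _state: Optional[str] = None
    _since: float = 0.0

    def observe(self, ts: float, status: str) -> bool:
        """Feed one sample ('up' or 'down'); return whether the incident is active."""
        if status != self._state:
            # Status changed, so restart the persistence timer.
            self._state, self._since = status, ts
        held = ts - self._since
        if not self.active and status == "down" and held >= self.raise_after_s:
            self.active = True
        elif self.active and status == "up" and held >= self.clear_after_s:
            self.active = False
        return self.active
```

In this model, a 30-second outage never raises the incident because the down timer restarts whenever the status changes, and a brief recovery does not clear an active incident until the status has stayed up for the full clear interval.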
Another example is the
INC_RN_SITE_LONG_DURATION_CAPACITY_EXCEEDED_THRESHOLD incident. In this example, the
incident is raised when the capacity utilization exceeds 80% for at
least 1 hour per day on at least 1 day within a 2-day
evaluation window. You can customize all of these parameters. The number of breach days
must be less than or equal to the evaluation window.
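The raise condition for the long-duration capacity incident can be sketched as follows. This is a simplified illustration, not the product's evaluation logic: it assumes one utilization sample per hour, counts any hour above the threshold as a breach hour (whether breach hours must be consecutive is not stated here), and uses the hypothetical function name should_raise. The defaults mirror the example values.

```python
from typing import Sequence


def should_raise(daily_hourly_util: Sequence[Sequence[float]],
                 threshold_pct: float = 80.0,
                 min_breach_hours_per_day: int = 1,
                 min_breach_days: int = 1,
                 evaluation_window_days: int = 2) -> bool:
    """Decide whether the capacity incident would be raised.

    daily_hourly_util holds one list of hourly utilization percentages per day,
    oldest day first (an assumed input shape).
    """
    # Breach days must be less than or equal to the evaluation window.
    if not 1 <= min_breach_days <= evaluation_window_days:
        raise ValueError("breach days must be between 1 and the evaluation window")
    # Only the most recent days inside the evaluation window are considered.
    window = daily_hourly_util[-evaluation_window_days:]
    breach_days = sum(
        1 for day in window
        if sum(1 for u in day if u > threshold_pct) >= min_breach_hours_per_day
    )
    return breach_days >= min_breach_days
```

For instance, a day with a single hour at 85% counts as a breach day under the defaults, so the incident would be raised if that day falls within the last two days.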
Consider the INC_RN_PRIMARY_WAN_TUNNEL_FLAP incident.
This incident triggers when a primary WAN
tunnel transitions from an UP to a DOWN state a specific number of
times within a defined window. In this example, Strata Cloud Manager
raises this incident if the tunnel flaps three times within ten minutes,
counting only from when the tunnel is initially UP. You can adjust these
thresholds; for instance, a 15-minute duration allows a flap count between 3
and 9. Note that the clear time interval must always be greater than or
equal to the raise time interval. For example, if the incident raise window is increased
to 18 minutes, the clear interval must be adjusted to at least 18 minutes as
well.
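The flap-counting rule and the interval constraint can be sketched in Python as shown below. This is a hedged illustration under stated assumptions: samples are (minute, state) pairs in time order, a flap is an UP to DOWN transition, and a window boundary of exactly ten minutes is treated as inside the window. The function names flap_incident_raised and validate_intervals are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Optional, Tuple


def validate_intervals(raise_window_min: float, clear_window_min: float) -> None:
    """Enforce that the clear interval is at least as long as the raise interval."""
    if clear_window_min < raise_window_min:
        raise ValueError("clear interval must be >= raise interval")


def flap_incident_raised(samples: Iterable[Tuple[float, str]],
                         flap_count: int = 3,
                         window_min: float = 10.0) -> bool:
    """Return True if flap_count UP-to-DOWN transitions occur within window_min."""
    flaps: Deque[float] = deque()
    prev: Optional[str] = None  # no flap is counted until the tunnel was seen UP
    for minute, state in samples:
        if prev == "UP" and state == "DOWN":
            flaps.append(minute)
            # Drop flaps that fell out of the sliding window.
            while flaps and minute - flaps[0] > window_min:
                flaps.popleft()
            if len(flaps) >= flap_count:
                return True
        prev = state
    return False
```

Because the previous state starts out unknown, a tunnel that is first observed DOWN does not contribute a flap, which matches counting only from when the tunnel is initially UP.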