inspect performance-policy incidents
Prisma SD-WAN

Use the inspect performance-policy incidents command to display the performance policy incidents detected at the site, including rule violations and their severity, and to investigate active or historical performance policy alarms. When a performance policy detects that a monitored metric, such as packet loss on a circuit, CPU utilization, or bandwidth consumption, has crossed a configured threshold, it raises an incident. The command shows the policy rule and policy set, the current alarm state, the monitored paths, the threshold values, and recent sample data for the incident. Use this information to confirm whether an alarm is valid, identify the root cause, or verify that conditions have returned below the clear threshold.

Command

inspect performance-policy incidents type ( link-quality circuit=Circuit_ID <summary | details> | system-health health-type=<cpu | memory | disk> <summary | details> | circuit-health circuit=Circuit_ID )

Options

  • summary: Display a high-level incident summary, including alarm state and monitored paths. Applies to both link-quality circuit incidents and system-health incidents.
  • details: Display full incident details, including threshold values, monitoring approach, and current and previous bucket statistics. Applies to both link-quality circuit incidents and system-health incidents.
  • circuit-health: Display incident information for a circuit, including bandwidth utilization, monitoring approach, and the raise and clear threshold values. Enter the circuit ID to identify the circuit.
  • health-type: Display incident information for a system health metric. Enter cpu, memory, or disk to specify the metric type.
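The syntax and options above can be sketched as a small Python helper that assembles a valid command string before it is sent to a device. This is only an illustration: the function name build_incident_command and its parameters are hypothetical, and the sketch assumes that system-health accepts summary or details, as the CPU example on this page suggests.

```python
from typing import Optional

HEALTH_TYPES = {"cpu", "memory", "disk"}
VIEWS = {"summary", "details"}
BASE = "inspect performance-policy incidents type"


def build_incident_command(incident_type: str,
                           circuit_id: Optional[str] = None,
                           health_type: Optional[str] = None,
                           view: Optional[str] = None) -> str:
    """Assemble the CLI string. Hypothetical helper, not a vendor API."""
    if view is not None and view not in VIEWS:
        raise ValueError("view must be summary or details")

    if incident_type == "link-quality":
        if not circuit_id or view is None:
            raise ValueError("link-quality needs a circuit ID and a view")
        return f"{BASE} link-quality circuit={circuit_id} {view}"

    if incident_type == "system-health":
        # Assumption: a view is required here, as in the CPU example.
        if health_type not in HEALTH_TYPES or view is None:
            raise ValueError("system-health needs cpu, memory, or disk and a view")
        return f"{BASE} system-health health-type={health_type} {view}"

    if incident_type == "circuit-health":
        if not circuit_id:
            raise ValueError("circuit-health needs a circuit ID")
        if view is not None:
            raise ValueError("circuit-health does not take summary or details")
        return f"{BASE} circuit-health circuit={circuit_id}"

    raise ValueError(f"unknown incident type: {incident_type}")


if __name__ == "__main__":
    print(build_incident_command("system-health", health_type="cpu", view="details"))
```

Rejecting invalid combinations up front keeps malformed commands from reaching the device, where the error message may be less specific.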

When to Use

  • When an alarm fires, to see the EMA trend, threshold context, and sample data behind it, not just that the alarm exists.
  • After conditions appear to have recovered, to confirm Alarm Standing is false before closing an investigation.
  • When a performance policy action has just triggered, to attribute it to a specific circuit and threshold rather than a general policy condition.

Command Notes

  • Role: Super, Read Only, Monitor
  • Related Commands: None
  • Introduced in: Release 6.3.1

Example

The following example displays full incident details for a link quality circuit:
inspect performance-policy incidents type link-quality circuit=1697698664341010637 details
========================================
Circuit ID : 1697698664341010637
========================================
Policy Rule : Filters (1701923892838000737)
Policy Set : Hello (1701923835766003137)
Idle Since : 0 Minutes
EMA (Bad samples percent) : 75.234567
Alarm Standing : true
Monitored Paths :
    Path ID : 1702032497450001538
Raise Alarm Above : 70
Clear Alarm Below : 50
Monitoring Approach : MODERATE
Current Bucket Stats :
    Monitored Paths :
        Path ID : 1702032497450001538
    Bad Paths :
        Path ID : 1702032497450001538
    First Sample At : 11 Dec 2023 09:27:45
    Last Sample At : 11 Dec 2023 09:30:06
Previous Bucket Stats : NA
    Monitored Paths :
        Path ID : 1702032497450001538
    Bad Paths :
        Path ID : 1702032497450001538
    First Sample At : 11 Dec 2023 09:30:10
    Last Sample At : 11 Dec 2023 09:44:55

The following example displays a summary for the same circuit:
inspect performance-policy incidents type link-quality circuit=1697698664341010637 summary
========================================
Circuit ID : 1697698664341010637
========================================
Policy Rule : Filters (1701923892838000737)
Policy Set : Hello (1701923835766003137)
EMA (Bad samples percent) : 0.000000
Alarm Standing : false
Monitored Paths :
    Path ID : 1702032497450001537

The following example displays incident details for a CPU system health incident:
inspect performance-policy incidents type system-health health-type=cpu details
System Type : CPU
========================================
Policy Rule : CPUUtilization123 (1712646980313000337)
Policy Set : TestPolicySet (1711686637498019637)
Idle Since : 0 Minutes
EMA (Bad samples percent) : 0.000000
Alarm Standing : false
Raise Alarm Above : 70
Clear Alarm Below : 50
Monitoring Approach : MODERATE
Current Bucket Stats :
    Bad Samples :
        Sample : 33.757069 %
        Sample : 33.284089 %
    First Sample At : 25 Apr 2024 13:59:35
    Last Sample At : 25 Apr 2024 14:00:35
Previous Bucket Stats : NA
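
When the output is captured for scripting, for example to confirm that Alarm Standing is false before closing an investigation, it can be turned into key/value pairs. The following Python sketch assumes the device prints each field on its own line in "Name : value" form; the function names are hypothetical, and the sample text is the summary example from this page.

```python
from collections import defaultdict
from typing import Dict, List

SAMPLE = """\
========================================
Circuit ID : 1697698664341010637
========================================
Policy Rule : Filters (1701923892838000737)
Policy Set : Hello (1701923835766003137)
EMA (Bad samples percent) : 0.000000
Alarm Standing : false
Monitored Paths :
    Path ID : 1702032497450001537
"""


def parse_incident_output(text: str) -> Dict[str, List[str]]:
    """Collect every 'Name : value' line; repeated names keep all values."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"="}:
            continue  # skip blank lines and separator rules
        # Split at the first colon only, so timestamps stay intact.
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skip the echoed command and bare section headers
        fields[key.strip()].append(value.strip())
    return dict(fields)


def alarm_is_standing(fields: Dict[str, List[str]]) -> bool:
    """True only when the output reports 'Alarm Standing : true'."""
    return fields.get("Alarm Standing", ["false"])[0].lower() == "true"


if __name__ == "__main__":
    parsed = parse_incident_output(SAMPLE)
    print(parsed["EMA (Bad samples percent)"], alarm_is_standing(parsed))
```

The parser deliberately flattens nested sections such as Current Bucket Stats; a script that needs to tell current and previous bucket values apart would have to track the enclosing section as well.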

Output Fields

  • Circuit ID / System Type: The identifier of the circuit or system resource being monitored.
  • Policy Rule: The performance policy rule name and ID associated with the incident.
  • Policy Set: The policy set the rule belongs to.
  • Idle Since: The number of minutes since the last threshold event.
  • EMA (Bad samples percent): The exponential moving average of the percentage of bad samples, used to smooth threshold evaluations.
  • Alarm Standing: Whether an alarm is currently active for this incident.
  • Monitored Paths: The path IDs being evaluated for this circuit.
  • Raise Alarm Above: The threshold percentage above which the alarm is raised.
  • Clear Alarm Below: The threshold percentage below which the alarm is cleared.
  • Monitoring Approach: The sensitivity setting used for threshold evaluation.
  • Current / Previous Bucket Stats: Sample data collected in the current and previous monitoring windows, including first and last sample timestamps.
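
The relationship between the EMA, Raise Alarm Above, Clear Alarm Below, and Alarm Standing fields can be illustrated with a short Python sketch. It is a generic model of an exponential moving average with separate raise and clear thresholds, not the device's actual implementation: the smoothing factor alpha, the starting EMA of 0, and all function names are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def update_ema(previous: float, sample: float, alpha: float) -> float:
    """Standard EMA step; alpha is an assumed smoothing factor."""
    return alpha * sample + (1.0 - alpha) * previous


def next_alarm_state(standing: bool, ema: float,
                     raise_above: float, clear_below: float) -> bool:
    """Raise above one threshold, clear below another, otherwise hold."""
    if not standing and ema > raise_above:
        return True
    if standing and ema < clear_below:
        return False
    return standing


def replay(samples: Iterable[float], raise_above: float = 70.0,
           clear_below: float = 50.0, alpha: float = 0.5
           ) -> List[Tuple[float, bool]]:
    """Feed bad-sample percentages in order; return (EMA, standing) pairs."""
    ema, standing = 0.0, False
    history: List[Tuple[float, bool]] = []
    for sample in samples:
        ema = update_ema(ema, sample, alpha)
        standing = next_alarm_state(standing, ema, raise_above, clear_below)
        history.append((ema, standing))
    return history


if __name__ == "__main__":
    for ema, standing in replay([100, 100, 60, 60, 0, 0]):
        print(f"EMA={ema:.3f} standing={standing}")
```

In this model the alarm stays standing while the EMA sits between 50 and 70, which is why a recovering circuit can still report Alarm Standing as true for a while.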

Troubleshooting

Condition: Alarm Standing is true but there is no visible traffic impact.
Possible Cause: The EMA crossed the raise threshold based on recent samples, but conditions may already be improving.
Action: Monitor the EMA value across successive runs; check Current Bucket Stats to see whether recent samples are trending below the clear threshold.

Condition: Idle Since shows a large number of minutes for an alarm that is still standing.
Possible Cause: No new threshold events occurred in that window; the device raised the alarm earlier and it has not cleared.
Action: Verify that conditions have genuinely improved below the clear threshold; re-run with details to check recent bucket samples.

Condition: Previous Bucket Stats shows NA for an active rule.
Possible Cause: The device created the incident within the current monitoring window and no previous window data exists yet.
Action: This is expected for recently configured rules or newly observed circuits; wait for the monitoring window to complete before interpreting trend data.
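
Monitoring the EMA value across successive runs can be done by hand or with a small script. The Python sketch below labels a series of EMA readings collected from repeated runs of the command. The function name classify_trend and the label strings are hypothetical, and the default clear threshold of 50 is taken from the examples on this page rather than from any fixed device setting.

```python
from typing import Sequence


def classify_trend(ema_readings: Sequence[float],
                   clear_below: float = 50.0) -> str:
    """Label EMA readings from successive runs, oldest first."""
    if len(ema_readings) < 2:
        return "insufficient data"
    if ema_readings[-1] < clear_below:
        return "below clear threshold"
    # Differences between consecutive readings.
    deltas = [later - earlier
              for earlier, later in zip(ema_readings, ema_readings[1:])]
    if all(delta < 0 for delta in deltas):
        return "improving"
    if all(delta > 0 for delta in deltas):
        return "worsening"
    return "mixed"


if __name__ == "__main__":
    print(classify_trend([75.2, 68.0, 61.5]))
```

A result of "improving" with the alarm still standing matches the first condition above: the EMA is falling but has not yet dropped below the clear threshold.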