Incidents and Alerts
Learn about the incidents and alerts managed in Prisma SD-WAN.
Prisma SD-WAN generates alerts and incidents when the system
reaches system-defined or customer-defined thresholds or there is a fault in the system.
The Overview tab lists the Category-wise events that are Critical, Warning, or
Informational in nature. It also displays the Incidents by Priority, Your Top Incidents,
and Your Top Alerts.
Use incidents and alerts to troubleshoot the system.
An alert may or may not be an indication of a fault in the network. An alert is raised
when the system reaches system-defined or customer-defined thresholds.
An incident is an indication of a fault in the system. Incidents are raised and cleared
and vary in severity:
Critical—Whole or part of a network is down and requires immediate action.
Warning—Impacts the network and needs immediate attention.
Informational—Network is degraded and needs attention soon.
Use the Settings tab to set up Incident Policies, which manage event code suppression
based on the classifications and action attributes that you configure. You can use
incident policy rules to suppress or escalate incidents that arise during a scheduled
time period. You can also change the default priority of system-generated incidents to a
priority level that is better aligned with your business requirements.
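As a rough illustration of how such a rule might behave, the following Python sketch
models a rule that suppresses or escalates matching incidents raised during a scheduled
period and optionally overrides their priority. Every name here (Incident, PolicyRule,
apply_rule, the field names, and the EXAMPLE_CODE event code) is hypothetical; this is
not the Prisma SD-WAN API or data model, and the way escalation is represented (a simple
flag) is an assumption.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Incident:
    code: str                 # event code (placeholder values only)
    priority: str             # "P1" through "P5"
    raised_at: datetime
    suppressed: bool = False
    escalated: bool = False   # assumption: escalation modeled as a flag


@dataclass(frozen=True)
class PolicyRule:
    code: str                 # event code the rule applies to
    start: datetime           # start of the scheduled period (inclusive)
    end: datetime             # end of the scheduled period (exclusive)
    action: str               # "suppress" or "escalate"
    priority_override: Optional[str] = None


def apply_rule(incident: Incident, rule: PolicyRule) -> Incident:
    """Return a copy of the incident with the rule applied, if it matches."""
    in_window = rule.start <= incident.raised_at < rule.end
    if incident.code != rule.code or not in_window:
        return incident  # rule does not apply; incident is unchanged
    updated = incident
    if rule.action == "suppress":
        updated = replace(updated, suppressed=True)
    elif rule.action == "escalate":
        updated = replace(updated, escalated=True)
    if rule.priority_override is not None:
        updated = replace(updated, priority=rule.priority_override)
    return updated


# Hypothetical maintenance window from 00:00 to 06:00.
maintenance = PolicyRule("EXAMPLE_CODE", datetime(2024, 1, 1, 0, 0),
                         datetime(2024, 1, 1, 6, 0), "suppress")
during = Incident("EXAMPLE_CODE", "P3", datetime(2024, 1, 1, 2, 30))
after = Incident("EXAMPLE_CODE", "P3", datetime(2024, 1, 1, 7, 0))
```

In this sketch, applying maintenance to the incident raised at 02:30 marks it as
suppressed, while the incident raised at 07:00 falls outside the window and is returned
unchanged.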
Filter Alerts and Incidents
Filter and sort alerts and incidents by various parameters so that you
can take appropriate action on the events that require attention. Select the
Filter widget on the Troubleshooting page, and then filter and sort alerts and
incidents based on the following criteria:
Acknowledge indicates that you are aware of the incident but
may not be taking any action at this time. You can Acknowledge
only unresolved incidents. Acknowledging an incident lets you display and
focus on the incidents that still require attention. You can
select one or more incidents to Acknowledge in bulk.
Unacknowledge removes the acknowledgment from an incident and returns
it to the unacknowledged state. You can Unacknowledge only acknowledged
incidents. You can select one or more incidents to Unacknowledge.
Filter By—Filter alerts and incidents by their status:
Show Resolved—Displays only resolved incidents, that is, incidents for which
the fault that caused the incident has been removed.
Include Acknowledged—Displays acknowledged and unacknowledged
incidents.
Show Only Acknowledged—Displays only acknowledged incidents.
Show Only Suppressed—Displays only suppressed incidents.
Include Suppressed—Displays suppressed and unsuppressed
incidents.
The acknowledged and suppressed filters apply only to incidents, not to alerts. You can
unacknowledge only incidents that are already acknowledged.
Sort By—Sort alerts and incidents by time or severity to display the
latest alerts and incidents first.
Sites—Sort alerts and incidents by site, based on:
Site—Name or address search.
Viewing—Traffic volume, initiation failure, transaction failure.
Site type—Branch or data center.
Admin state of the site—Active, monitor, or disabled.
Severity—Sort alerts and incidents based on the following severity
categories:
Critical—Whole or part of a network is down and requires immediate
action.
Warning—Impacts the network and needs immediate attention.
Informational—Degrades the network and needs attention soon.
Priority—Sort alerts and incidents based on the priority level:
Priority 1 (P1)
Priority 2 (P2)
Priority 3 (P3)
Priority 4 (P4)
Priority 5 (P5)
Category—Sort alerts and incidents based on the following
options:
Network—Indicates network faults.
Device—Indicates device hardware, software, interface, or
registration issues.
Cellular—Indicates cellular issues.
Application—Indicates application issues.
Policy—Indicates policy issues.
Branch HA—Indicates spoke HA issues.
Authentication—Indicates authentication failures.
User ID—Indicates User ID issues.
Code—Sort alerts and incidents based on the alert and incident event
codes.
Time—Sort alerts and incidents by time to display the latest alerts and
incidents first.
Correlation ID—A system-generated ID for a raised incident. An incident
has raise and clear states, and there can be multiple incidents with the
same event code in either a raised or cleared state at any given time. Use
the correlation ID to distinguish between incidents with the same event
code. When an incident is cleared, the correlation ID identifies which
specific incident was cleared. The ID remains associated with the incident
even after the incident is cleared or resolved.
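The filter options and the role of the correlation ID can be sketched as follows. This
is a minimal Python illustration over a simplified in-memory record, not the product's
data model: the field names, the filter_incidents and clear_incident helpers, and the
sample codes and IDs are all hypothetical. The sketch also assumes that acknowledged,
suppressed, and resolved incidents are hidden unless the corresponding option is
selected.

```python
from dataclasses import dataclass

# Lower rank sorts first, so Critical incidents appear at the top.
SEVERITY_RANK = {"critical": 0, "warning": 1, "informational": 2}


@dataclass
class Incident:
    code: str                 # event code (placeholder values only)
    correlation_id: str       # unique per raised incident
    severity: str
    acknowledged: bool = False
    suppressed: bool = False
    cleared: bool = False


def _keep(flag, include, only):
    """Apply the Include / Show Only semantics to one boolean attribute."""
    if only:
        return flag
    return True if include else not flag


def filter_incidents(incidents, include_acknowledged=False, only_acknowledged=False,
                     include_suppressed=False, only_suppressed=False,
                     show_resolved=False):
    """Return matching incidents sorted by severity."""
    kept = [
        inc for inc in incidents
        if inc.cleared == show_resolved
        and _keep(inc.acknowledged, include_acknowledged, only_acknowledged)
        and _keep(inc.suppressed, include_suppressed, only_suppressed)
    ]
    return sorted(kept, key=lambda inc: SEVERITY_RANK[inc.severity])


def clear_incident(incidents, correlation_id):
    """Clear exactly one incident, identified by its correlation ID."""
    for inc in incidents:
        if inc.correlation_id == correlation_id:
            inc.cleared = True
            return inc
    return None


# Two incidents share an event code; only the correlation ID tells them apart.
incidents = [
    Incident("EXAMPLE_CODE", "corr-1", "warning"),
    Incident("EXAMPLE_CODE", "corr-2", "critical"),
    Incident("OTHER_CODE", "corr-3", "informational", acknowledged=True),
]
clear_incident(incidents, "corr-1")
open_ids = [inc.correlation_id for inc in filter_incidents(incidents)]
```

After corr-1 is cleared, the default view in this sketch contains only corr-2: corr-1 is
resolved and corr-3 is acknowledged. The second EXAMPLE_CODE incident is unaffected
because the clear operation is keyed on the correlation ID rather than the event code.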
Event Correlation of Incidents
The event engine performs multiple functions, such as incident
correlation, suppression, and escalation, depending on the network conditions and the
event policy rules that the administrator configures. Automatically correlating
incidents into a single event improves the operational efficiency of the app-fabric,
and event policies give you comprehensive control over the event framework.
The controller analyzes the incoming incidents from the ION devices to
determine whether they are related and, if so, aggregates them into a single
incident in real time. For example, if the controller receives multiple VPN down
incidents, it analyzes the incidents in real time, determines whether they are
related, and generates a single Secure Fabric Link incident for the event, while
suppressing the original incidents.
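The aggregation described above can be sketched in Python as follows. This is an
illustration only. The controller's actual correlation logic is not described here, so
the sketch assumes that VPN down incidents are related when they involve the same pair
of sites. The record fields, the correlate_vpn_down function, the min_related
threshold, and the VPN_DOWN and SECURE_FABRIC_LINK_DOWN strings are hypothetical
placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Incident:
    code: str
    site_pair: frozenset      # the two sites the incident relates to
    suppressed: bool = False
    children: list = field(default_factory=list)


def correlate_vpn_down(incidents, min_related=2):
    """Aggregate related VPN down incidents into one incident per site pair.

    Incidents that are aggregated are marked as suppressed and attached to
    the new incident as children.
    """
    groups = defaultdict(list)
    for inc in incidents:
        if inc.code == "VPN_DOWN":
            groups[inc.site_pair].append(inc)

    aggregated = []
    for site_pair, related in groups.items():
        if len(related) < min_related:
            continue  # a single incident is left as is
        for inc in related:
            inc.suppressed = True
        aggregated.append(
            Incident("SECURE_FABRIC_LINK_DOWN", site_pair, children=related)
        )
    return aggregated


a = Incident("VPN_DOWN", frozenset({"branch-1", "dc-1"}))
b = Incident("VPN_DOWN", frozenset({"dc-1", "branch-1"}))
c = Incident("VPN_DOWN", frozenset({"branch-2", "dc-1"}))
links = correlate_vpn_down([a, b, c])
```

In this example, a and b relate to the same site pair, so they are suppressed and
replaced by one aggregated incident, while c stands alone and remains visible.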