Troubleshoot Alarms

This section describes how to troubleshoot the alarms in Prisma SD-WAN.
Follow the troubleshooting steps for each alarm in the order listed below. Each step is intended to resolve the issue. Proceed to the next step only if the previous step did not resolve the problem.
For each alarm raised on the web interface, you can select Troubleshoot to follow a step-by-step troubleshooting procedure. If the issue persists, select Go to Support to create a support ticket. A Palo Alto Networks Support executive will contact you.
Alarm Code
Troubleshooting Steps
APPLICATION_CUSTOM_RULE_CONFLICT
  1. Log in to the Prisma SD-WAN web interface and select Policies > Stacked Policies > Apps.
  2. Locate the required application for troubleshooting.
  3. Click the ellipsis under Action and select View.
  4. If the app has an override, you may need to open the override rules to make the change, possibly before clicking the ellipsis.
  5. Make the required changes on the app configuration screen.
DEVICEHW_DISKENC_SYSTEM
This event code is raised when a disk partition failed to convert into an encrypted partition during the last device upgrade.
  1. Log in to the Prisma SD-WAN web interface and select Map > Claimed devices.
  2. Locate your device and click Upgrade.
  3. Select the required software version to upgrade.
    After the upgrade is complete, repeat the same steps and downgrade to the target version. If the target version is the latest and you still see the error, upgrade to the version before the latest and then upgrade to the latest again.
DEVICEHW_DISKUTIL_PARTITIONSPACE
This event code is raised due to high disk capacity utilization. To verify, follow these steps:
  1. SSH to the device and run the command:
    dump disk info
  2. Check the available space for the attached volumes and contact the Palo Alto Networks Support team to clear the utilized volume.
DEVICEHW_INTERFACE_ERRORS
This event code is raised due to a faulty cable, SFP, port, or patch panel connection.
  1. Inspect and replace the cable. Ensure that the correct cable is used.
  2. If the port requires a transceiver, replace the SFP. Ensure that the SFP is correct.
  3. Try a different patch panel port if you are using a patch panel.
  4. Try a different port on the ION device.
  5. Inspect the device and port that the ION device connects to. Sometimes the issue is caused by the other hardware device.
DEVICEHW_INTERFACE_HALFDUPLEX
This event code is raised due to issues with the port configuration or the cable. First, verify the cable connection and swap the cable. Second, check whether the port and the remote end are set to auto or hard-coded. To change the port configuration:
  1. Log in to the Prisma SD-WAN web interface and select Map > Claimed devices.
  2. Locate your device and click it.
  3. Go to Interfaces and select the interface that is running at half duplex.
  4. Select Advanced Options - PHYSICAL and change the configuration.
  5. Verify that the connection is UP and correctly configured for full duplex.
DEVICEHW_INTERFACE_DOWN
An interface down alarm requires an assessment to determine whether the interface is down intentionally or because of a real fault.
  1. SSH to the device and run the commands:
    dump interface status <port number>
    dump interface config <port number>
  2. Check whether the port is administratively up and whether it is connected.
DEVICEHW_MEMUTIL_SWAPSPACE
To verify whether high memory utilization is occurring in real time:
  1. Log in to the Prisma SD-WAN web interface and select Activity > System.
  2. Select the ION device and confirm the free memory.
  3. SSH to the device and run the inspect memory summary command to verify the memory.
DEVICEHW_POWER_LOST
This event code is raised by an unplugged or a loose power cable.
  1. Try a new cable or re-seat the existing cable. If this does not help, replace the power supply unit (PSU). For devices with dual PSUs, note which PSU failed. Order a replacement PSU from Palo Alto Networks for the particular ION device.
  2. When the new PSU is available on hand at the device's site, pull out and replace the affected PSU.
DEVICEIF_ADDRESS_DUPLICATE
If static IP address configuration is used, confirm that the IP address used is not explicitly assigned to another device or within a range already allocated by a DHCP server.
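The check described above can be sketched offline with Python's standard ipaddress module. This is only an illustration: the function name, the inventory of assigned addresses, and the DHCP range are hypothetical and would come from your own records, not from the ION device.

```python
import ipaddress

def find_static_ip_conflicts(static_ip, assigned_ips, dhcp_ranges):
    """Return the reasons why static_ip may be a duplicate (empty if none)."""
    addr = ipaddress.ip_address(static_ip)
    conflicts = []
    # Case 1: the address is explicitly assigned to another device.
    for device, ip in assigned_ips.items():
        if ipaddress.ip_address(ip) == addr:
            conflicts.append(f"assigned to {device}")
    # Case 2: the address falls inside a range handed out by a DHCP server.
    for start, end in dhcp_ranges:
        if ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end):
            conflicts.append(f"inside DHCP range {start}-{end}")
    return conflicts

# Hypothetical inventory (documentation address space).
assigned = {"printer-1": "192.0.2.20", "switch-1": "192.0.2.2"}
dhcp = [("192.0.2.100", "192.0.2.200")]
print(find_static_ip_conflicts("192.0.2.150", assigned, dhcp))
```

An empty list means the candidate address conflicts with neither the recorded assignments nor the DHCP ranges. The sketch assumes all addresses are of the same IP version.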
DEVICESW_CONCURRENT_FLOWLIMIT_EXCEEDED
To verify the concurrent flows:
  1. Log in to the Prisma SD-WAN web interface and select Activity > Network.
  2. Select the ION device and review the concurrent flows generated by UDP/TCP traffic over time.
  3. In Flow Browser, identify the source/destination IP addresses that initiate multiple sessions for the same flow. Then, based on those addresses, check whether network scanning is occurring in the environment.
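If flow records are copied out of Flow Browser for offline review, the search for a source that opens many sessions can be sketched as below. The record fields (src_ip, dst_ip, dst_port), the function name, and the threshold are assumptions made for illustration; they are not a Prisma SD-WAN export format.

```python
from collections import defaultdict

def suspected_scanners(flows, min_targets):
    """Map each source IP to its count of distinct targets, if >= min_targets."""
    targets = defaultdict(set)
    for flow in flows:
        # A target is a unique (destination IP, destination port) pair.
        targets[flow["src_ip"]].add((flow["dst_ip"], flow["dst_port"]))
    return {src: len(seen) for src, seen in targets.items() if len(seen) >= min_targets}

# Hypothetical flow records.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.1", "dst_port": 22},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.2", "dst_port": 22},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.3", "dst_port": 22},
    {"src_ip": "10.0.0.9", "dst_ip": "10.0.1.1", "dst_port": 443},
    {"src_ip": "10.0.0.9", "dst_ip": "10.0.1.1", "dst_port": 443},
]
print(suspected_scanners(flows, min_targets=3))
```

Counting distinct targets rather than raw sessions keeps a client that repeatedly talks to one server from being flagged; a source that touches many hosts or ports is the more typical scanning pattern.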
DEVICESW_DHCPRELAY_RESTART
A stopped process requires further investigation. Contact Palo Alto Networks Support.
DEVICESW_DHCPSERVER_ERRORS
  1. Check the interface configuration and state.
  2. Verify that at least one device interface is active and configured with a static IP address.
  3. Check the DHCP server configuration.
  4. Verify that the subnet address does not overlap across the site.
  5. If custom options are configured, verify that the custom option definition and option value are compatible with each other.
  6. If the problem still persists, contact Palo Alto Networks Support.
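The subnet overlap check in step 4 can be done by hand, or sketched with Python's standard ipaddress module as below. The function name and the subnet list are hypothetical; the subnets would be the DHCP scopes you have configured across the site.

```python
import ipaddress
from itertools import combinations

def overlapping_subnets(subnets):
    """Return every pair of subnets that share at least one address."""
    nets = [ipaddress.ip_network(s) for s in subnets]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]

# Hypothetical DHCP scopes for one site.
site_subnets = ["10.1.0.0/24", "10.1.0.128/25", "10.2.0.0/24"]
print(overlapping_subnets(site_subnets))
```

An empty result means no two scopes overlap. Note that ip_network raises ValueError if a prefix has host bits set (for example 10.1.0.5/24), which is itself a useful signal of a mistyped subnet.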
DEVICESW_DHCPSERVER_RESTART
A stopped process requires further investigation. Contact Palo Alto Networks Support.
DEVICESW_DISCONNECTED_FROM_CONTROLLER
  1. Check if there is any network connectivity problem at the site. Look for invalid interface configurations, interface alarms or network alarms. If present, clear those faults.
  2. Check if there are any process alarms which indicate that processes are stopped. If present, take action on those faults.
  3. Check if any firewall rules, either on the ION device (if used) or external to it, prevent communication between the ION device and the controller. If present, fix those rules.
  4. Ensure that the controller is not undergoing maintenance. If notification indicates maintenance activity, wait until the activity is completed.
  5. If none of the choices apply, please open a case with Palo Alto Networks Support.
DEVICESW_FPS_LIMIT_EXCEEDED
  1. Check the flow browser to identify the rogue client, then isolate it.
  2. If the problem still persists, contact Palo Alto Networks Support.
DEVICESW_GENERAL_PROCESSRESTART
Process restart is an alert and does not require immediate action. If several process restart alerts repeat in a given hour or day, contact Palo Alto Networks Support.
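The source does not define how many alerts count as "several", so the following is only a sketch of one way to test alert timestamps against a threshold you choose. The function name, the one-hour window, and the limit of three are all assumptions.

```python
from datetime import datetime, timedelta

def restarts_exceed(timestamps, window, max_restarts):
    """Return True if more than max_restarts alerts fall within any window."""
    times = sorted(timestamps)
    start = 0
    for end, current in enumerate(times):
        # Slide the window start forward until it is within `window` of current.
        while current - times[start] > window:
            start += 1
        if end - start + 1 > max_restarts:
            return True
    return False

# Hypothetical alert times: four restarts within 50 minutes.
alerts = [datetime(2024, 1, 1, 10, m) for m in (0, 10, 20, 50)]
print(restarts_exceed(alerts, timedelta(hours=1), max_restarts=3))  # True
```

Passing timedelta(days=1) instead checks the same condition over a day.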
DEVICESW_GENERAL_PROCESSSTOP
A stopped process requires further investigation. Contact Palo Alto Networks Support.
DEVICESW_IMAGE_INCOMPATIBLE
  1. Check the software version of the device on the Device List screen.
  2. Click Upgrade and check if the device's software version is present in the available software list.
  3. If the software version on the device is not on the available software list, upgrade or downgrade the device to an available software version. After the software change succeeds, issue the Recheck SW Version command from the device list for that device.
  4. If the software version is not on the available software list but the software version on the device is the desired software version for your network, contact Palo Alto Networks Support for further instructions.
DEVICESW_LICENSE_VERIFICATION_FAILED
  1. Obtain additional licenses or free up unused licenses and then bring up the virtual ION device.
  2. If the problem still persists, contact Palo Alto Networks Support.
DEVICESW_MONITOR_DISABLED
A system monitoring disabled alarm requires further investigation.
  1. Attempt a device reboot to clear the alarm.
  2. If the system monitoring disabled alarm is raised again after a reboot, contact Palo Alto Networks Support.
DEVICESW_NTP_NO_SYNC
The device could not reach the configured NTP server. Contact Palo Alto Networks Support.
DEVICESW_SNMP_AGENT_RESTART
A stopped process requires further investigation. Contact Palo Alto Networks Support.
DEVICESW_SYSTEM_BOOT
Device reboot is an alert and may need further investigation.
  1. If the device rebooted because of an administrator operation, such as a forced reboot or a software upgrade, the alert is normal and for informational purposes only.
  2. If the device rebooted on its own without any administrator operation, contact Palo Alto Networks Support.
DEVICESW_TOKEN_VERIFICATION_FAILED
  1. Generate a new token and use that token in the creation of virtual ION device metadata.
  2. If the problem still persists, contact Palo Alto Networks Support.
DEVICESW_CONNTRACK_FLOWLIMIT_EXCEEDED
  1. Use the device toolkit to dump and inspect the entries in the connection tracking table.
  2. Contact Palo Alto Networks Support.
NAT_POLICY_LEGACY_ALG_CONFIG_OVERRIDE
Contact Palo Alto Networks Support to remove the legacy configuration from the device.
NAT_POLICY_STATIC_NATPOOL_OVERRUN
Make sure that the traffic selector has a 1:1 mapping for the NATPOOL range converted to CIDR.
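One reading of the 1:1 requirement is that the traffic selector prefix must contain exactly as many addresses as the NAT pool range. Under that assumption, the following sketch converts a range to CIDR blocks with Python's standard ipaddress module and compares sizes. The function names and addresses are hypothetical, and the check does not reflect how the device itself validates the configuration.

```python
import ipaddress

def natpool_to_cidrs(first, last):
    """Convert an inclusive address range into the CIDR blocks that cover it."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))

def is_one_to_one(selector_prefix, first, last):
    """True if the selector prefix and the NAT pool range have the same size."""
    selector = ipaddress.ip_network(selector_prefix)
    pool_size = int(ipaddress.ip_address(last)) - int(ipaddress.ip_address(first)) + 1
    return selector.num_addresses == pool_size

# Hypothetical pool of 8 addresses and a /29 selector (also 8 addresses).
print(natpool_to_cidrs("203.0.113.0", "203.0.113.7"))
print(is_one_to_one("10.0.0.0/29", "203.0.113.0", "203.0.113.7"))
```

A range that does not align to a single CIDR block, such as 203.0.113.0 to 203.0.113.9, yields more than one block, which is worth reviewing against the traffic selector.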
NETWORK_DIRECTINTERNET_DOWN (Branch sites only)
  1. Check if there are any interface down alarms on the interfaces connecting to the internet circuit.
  2. Log in to the device via SSH/Remote access.
  3. Verify internet interface status and reachability on the interface by pinging public IP addresses.
  4. Check the ARP entry of the gateway IP address on the internet interface by running the inspect system arp command.
  5. Capture packets on the internet interface and verify the packet flow.
  6. Check the internet modem, if present, to ensure that it is powered up. Then, as a possible recovery step, power cycle the modem.
  7. If the problem persists, contact Palo Alto Networks Support.
NETWORK_DIRECTPRIVATE_DOWN (Branch sites only)
  1. Check if there are any interface down alarms on the interfaces connecting to private WAN routing devices. Then, follow interface troubleshooting and alarm clearance procedures for that interface.
  2. Log in to the device via SSH/Remote access.
  3. Verify interface status and reachability by pinging the gateway IP address.
  4. Check if connectivity between the remote office and the data center exists by pinging the IP addresses of the data center's private WAN interface(s) from the affected site.
  5. Verify BFD connectivity between the remote office and the data center private WAN interface(s) IP addresses.
  6. Capture packets on the private WAN interface and verify the packet flow. If the problem still persists, contact Palo Alto Networks Support.
NETWORK_POLICY_RULE_CONFLICT
Update the two conflicting policy rules identified or remove one of the rules to ensure that there is no conflict.
NETWORK_POLICY_RULE_DROPPED
Update the identified policy rule to remove some applications or remove some source and destination prefixes in the rule.
NETWORK_PRIVATEWAN_DEGRADED (DC Sites only)
  1. Verify that the prefixes configured on the remote site are correct.
  2. Verify that the BGP configuration on the WAN edge router is such that routes sent to the Palo Alto Networks data center device are received from the provider without any summarization.
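To compare the prefixes configured on a remote site with the routes actually received, a sketch such as the following can help separate exact matches from prefixes that only arrive inside a summary route. The function name and both lists are hypothetical; you would fill them in from the site configuration and from the route table you observe.

```python
import ipaddress

def classify_prefixes(site_prefixes, received_routes):
    """Label each site prefix as 'exact', 'summarized', or 'missing'."""
    routes = [ipaddress.ip_network(r) for r in received_routes]
    result = {}
    for prefix in site_prefixes:
        net = ipaddress.ip_network(prefix)
        if net in routes:
            result[prefix] = "exact"
        # subnet_of raises TypeError on mixed IP versions, so compare versions first.
        elif any(net.version == r.version and net.subnet_of(r) for r in routes):
            result[prefix] = "summarized"
        else:
            result[prefix] = "missing"
    return result

# Hypothetical site prefixes and received routes.
site = ["10.1.1.0/24", "10.1.2.0/24", "10.9.0.0/24"]
received = ["10.1.1.0/24", "10.1.0.0/16"]
print(classify_prefixes(site, received))
```

Any prefix labeled "summarized" or "missing" points to a route that was aggregated or dropped before it reached the data center device.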
NETWORK_PRIVATEWAN_UNREACHABLE (DC Sites only)
  1. Check if there are any interface down alarms on the interfaces connecting to private WAN routing devices. Follow interface troubleshooting and alarm clearance procedures for that interface.
  2. Check if local network endpoints connected to the affected ION device are reachable by pinging the interface through which the private WAN traffic is supposed to traverse.
  3. For a data center site, check for PEERING_EDGE_DOWN alarms. Follow PEERING_EDGE_DOWN troubleshooting and alarm clearance steps.
  4. Check if connectivity between the remote office and the data center exists by pinging the private WAN interface(s) from the affected site. From a data center site, choose one or more remote office sites to ping.
  5. If the problem still persists, contact Palo Alto Networks Support.
PEERING_BGP_DOWN
  1. Check if there are any interface down faults on the interfaces connecting to peer routing devices. Follow interface troubleshooting and fault clearance procedures for that interface.
  2. Check if local network endpoints connected to the affected ION device are reachable by pinging through the interface that traffic to the peer routing device is supposed to traverse.
  3. Check and validate configuration on the peer routing device and check for interface and routing faults.
  4. If none of the choices apply, please open a case with Palo Alto Networks Support.
PRIORITY_POLICY_RULE_CONFLICT
Update the two conflicting policy rules identified or remove one of the rules to ensure that there is no conflict.
PRIORITY_POLICY_RULE_DROPPED
Update the identified policy rule to remove some applications or remove some source and destination prefixes in the rule.
SITE_CIRCUIT_ABSENT_FOR_POLICY
Assign the labels reported as missing in the alarm to the WAN interface at the site.
SPOKEHA_CLUSTER_DEGRADED
Check the spoke cluster switchover event history to identify the device whose effective priority has become zero. If you find one, check:
  • If any of the tracked interfaces of the device are down.
  • If any of the system services for the device are down.
SPOKEHA_CLUSTER_DOWN
Check the spoke cluster switchover event history to identify the device whose effective priority has become zero. If you find one, check:
  • If any of the tracked interfaces of the device are down.
  • If any of the system services for the device are down.
SPOKEHA_MULTIPLE_ACTIVE_DEVICES
  1. Check the operational state of the interfaces that are specified as the source interface for cluster operation to find out if they are up.
  2. If the interfaces on both devices are up, check the switch configurations to confirm the interfaces are in the same VLAN.
  3. Ping the IP address on the interface on one of the devices from the other device to confirm the connectivity between the devices.
SPOKEHA_STATE_UPDATE
If the device has become a backup device, check the device configuration and any alarms or alerts to find out:
  • If a failure condition caused the device to become a backup.
  • If another device with a higher priority became active in the cluster.
  • If the device configuration was updated to disable the device.
