Azure Health Monitoring

Use the Azure Health Monitoring daemon to run health checks on a VM-Series firewall deployed on Azure and to notify you when a backend event affects the firewall.
VM-Series firewalls on Azure are susceptible to unrecoverable outcomes caused by backend maintenance events outside the firewall's control. These silent events, such as hotplug events, live migrations, and updates to NIC drivers, hypervisors, or networking, can leave the firewall in an unrecoverable state. The Azure health monitoring daemon performs checks, notifies you about these occurrences, and offers remediation through graceful failover.
The Azure health monitoring daemon checks the following Azure-specific conditions:
  • On Azure:
    • Accelerated networking enabled or not enabled (a disabled state generates a syslog entry)
    • Hot plug events
    • Azure scheduled event service (events such as Freeze posted for this VM)
  • Panorama connection and registration
The Azure health monitoring daemon also checks the following firewall conditions:
  • Interface links, to confirm that all configured links are up
  • System runlevel, to confirm the running state (must report true)
  • Interface DHCP, if applicable (all dataplane interfaces received a DHCP-served IP address)
  • Management link (is up)
  • License status (is licensed)
  • Thermite certificate (is present)
The Azure health monitoring daemon does not provide auto-remediation for this second group of health checks; it only logs their status.

How Does the Azure Health Monitoring Function?

The Azure health monitoring daemon periodically runs checkers to determine the status of the VM-Series firewall and records each checker's status (pass, fail, or unknown) in the log. Critical failures generate syslog entries. When an Azure scheduled event marks the instance as unhealthy, remediation takes one of the following forms:
  • Auto-remediate for all: the simpler option. The daemon takes a remediation action at runtime, but only for scheduled events of type Freeze.
  • DIY using syslog: you decide on remediation yourself by consuming the syslog events and attaching your own workflows to them.

Remediation

Prerequisite:
To enable auto-remediation, configure custom data at the time of deployment with the following key-value pair:
health-auto-remedy=true
The firewall's action depends on the bootstrap value and on the type of event detected:

Bootstrap value (custom data) | Freeze event detected | Hotplug detected | Failure due to device state detected
health-auto-remedy=true | HA pairs: Active → Passive failover. Load balancer fronted firewalls: health probe is failed. | Firewall logs an event in syslog | Firewall logs an event to vm_health.log (new log file)
health-auto-remedy=false, or bootstrap parameter absent | Firewall logs an event in syslog | Firewall logs an event in syslog | Firewall logs an event to vm_health.log (new log file)
Remediation is currently available for Azure scheduled events in high availability and load balancer based deployments:
HA Pair Architectures:
In an active-passive pair, when the active firewall detects that a scheduled event has been posted for it, the health monitor proactively fails over from the active to the passive firewall. On detecting a Freeze event, the daemon issues a suspend followed by a functional command on the firewall where the scheduled event was detected; this shifts that firewall from active to passive, and the peer takes over.
Palo Alto Networks recommends that you set up the HA pair across different availability zones or regions so that both HA peers are not impacted by the same event.
Load Balancer Based Architectures:
On a Freeze event, the firewall deliberately fails the load balancer health probe. Azure health probes arrive from a fixed source IP address (168.63.129.16), so the firewall can drop the health check packets from that source. The load balancer then stops sending traffic to this instance and rebalances it across the remaining healthy backends. After the event clears, the health probe is re-enabled and the backend instance shows as healthy again.
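The following is an added example, not Palo Alto Networks code and not the daemon's implementation. It shows how the behaviour described in "How Does the Azure Health Monitoring Function?" and in the two architecture sections above fits together: the Azure Instance Metadata Service scheduled-events endpoint is polled, the health-auto-remedy bootstrap value is read, and each event that targets this VM is mapped to the action in the remediation table (failover for an active HA member, failing the probe for a load balancer fronted firewall, syslog only otherwise). The IMDS endpoint, its `Metadata: true` header and the `StartRequests` acknowledgement body are documented Azure behaviour; the custom-data parsing (semicolon or newline separated key=value pairs), the VM names, and `SAMPLE_PAYLOAD` are assumptions constructed for the example.

```python
"""Added example (not Palo Alto Networks code): poll Azure Scheduled Events
and apply the page's remediation table. Assumptions not from the page:
IMDS API version 2020-07-01, custom data as ';'/newline separated key=value
pairs, and the constructed SAMPLE_PAYLOAD."""
import json
import urllib.request
from dataclasses import dataclass

# Documented Azure Instance Metadata Service endpoint for scheduled events.
IMDS_SCHEDULED_EVENTS = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


@dataclass
class ScheduledEvent:
    event_id: str
    event_type: str   # Freeze, Reboot, Redeploy, Preempt or Terminate
    status: str       # Scheduled or Started
    resources: list   # VM names the event applies to
    not_before: str   # earliest start time (UTC); "" once it has started


def fetch_scheduled_events(url=IMDS_SCHEDULED_EVENTS, timeout=5):
    """Live query. IMDS requires the 'Metadata: true' header and must be
    reached directly (no proxy) over the link-local address."""
    req = urllib.request.Request(url, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def approve_event(event_id, url=IMDS_SCHEDULED_EVENTS, timeout=5):
    """Tell Azure the VM is ready so the event may start early (documented
    StartRequests body). Call only after the failover has completed."""
    body = json.dumps({"StartRequests": [{"EventId": event_id}]}).encode()
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Metadata": "true", "Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def parse_scheduled_events(payload):
    """Convert the IMDS JSON document into ScheduledEvent records."""
    return [ScheduledEvent(ev["EventId"], ev["EventType"],
                           ev.get("EventStatus", "Scheduled"),
                           list(ev.get("Resources", [])),
                           ev.get("NotBefore", ""))
            for ev in payload.get("Events", [])]


def auto_remedy_from_custom_data(custom_data):
    """Read health-auto-remedy from bootstrap custom data (assumed format:
    key=value pairs separated by ';' or newlines). Absent means False."""
    pairs = {}
    for item in custom_data.replace("\n", ";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs.get("health-auto-remedy", "false").lower() == "true"


def decide_actions(events, my_vm_name, auto_remedy, deployment):
    """Map events targeting this VM to the page's table. deployment is
    'ha-active', 'ha-passive' or 'lb'. Returns [(action, event_id)] with
    action in 'failover', 'fail-probe', 'syslog'."""
    actions = []
    for ev in events:
        if my_vm_name not in ev.resources:
            continue  # event is for another VM; nothing to do here
        if ev.event_type != "Freeze" or not auto_remedy:
            actions.append(("syslog", ev.event_id))      # log only
        elif deployment == "ha-active":
            actions.append(("failover", ev.event_id))    # suspend, then functional
        elif deployment == "lb":
            actions.append(("fail-probe", ev.event_id))  # drop LB health probes
        else:
            actions.append(("syslog", ev.event_id))      # passive peer: just log
    return actions


# Constructed sample response in the documented IMDS shape (not real data).
SAMPLE_PAYLOAD = {
    "DocumentIncarnation": 7,
    "Events": [
        {"EventId": "evt-freeze-1", "EventType": "Freeze",
         "ResourceType": "VirtualMachine", "Resources": ["fw-a"],
         "EventStatus": "Scheduled", "NotBefore": "Mon, 01 Jan 2024 10:00:00 GMT",
         "Description": "", "EventSource": "Platform", "DurationInSeconds": 30},
        {"EventId": "evt-reboot-2", "EventType": "Reboot",
         "ResourceType": "VirtualMachine", "Resources": ["fw-a"],
         "EventStatus": "Scheduled", "NotBefore": "Mon, 01 Jan 2024 11:00:00 GMT",
         "Description": "", "EventSource": "Platform", "DurationInSeconds": -1},
        {"EventId": "evt-freeze-3", "EventType": "Freeze",
         "ResourceType": "VirtualMachine", "Resources": ["fw-b"],
         "EventStatus": "Scheduled", "NotBefore": "",
         "Description": "", "EventSource": "Platform", "DurationInSeconds": 30},
    ],
}


def run_once(payload, my_vm_name, custom_data, deployment):
    """One poll cycle without live calls: parse, decide, return actions."""
    events = parse_scheduled_events(payload)
    return decide_actions(events, my_vm_name,
                          auto_remedy_from_custom_data(custom_data), deployment)


if __name__ == "__main__":
    print(run_once(SAMPLE_PAYLOAD, "fw-a",
                   "hostname=fw-a;health-auto-remedy=true", "ha-active"))
```

Note that only Freeze events reach the remediation branches, matching the table: a Reboot for the same VM still produces a syslog entry, and a Freeze for another VM in the same scheduled-events document is ignored. In a live deployment, replace `SAMPLE_PAYLOAD` with `fetch_scheduled_events()` and call `approve_event()` only once the failover or probe change has taken effect.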