Azure Health Monitoring

Use the Azure Health Monitoring daemon to perform health checks on the VM-Series firewall and to notify you about the events it detects.
PA-VMs on Azure are susceptible to unrecoverable outcomes caused by backend maintenance events outside of the firewalls' control. These silent maintenance events, such as hotplug of network interfaces ("hotplug events"), live host migrations, and updates to NIC drivers, hypervisors, or networking, can leave the firewall in an unrecoverable state. The Azure health monitoring daemon performs checks, notifies you about these occurrences, and offers remediation through graceful failover.
The Azure health monitoring daemon detects the following events and states and performs a default remediation or logging action according to your setting of the auto-remediation bootstrap value.
The Azure health monitoring daemon checks for the following:
  • Events:
    • Events of type Freeze on the Azure scheduled event service.
    • Hotplug events, characterized by rapid de-association and re-association of the NIC on any of the data plane interfaces on the Azure host.
  • States:
    • Any configured dataplane interface is down.
    • System runlevel running state is false.
    • Any dataplane interface did not receive a DHCP-served IP address.
    • Management link is down.
    • License status is unlicensed.
    • Thermite certificate is absent.
    • No Panorama connection, or the firewall failed to register with Panorama.
    • Accelerated networking is not enabled.

How Does the Azure Health Monitoring Function?

The Azure health monitoring daemon periodically runs checkers to determine the status of the VM-Series firewall and records each checker's status (pass, fail, or unknown) in the log. Critical failures generate syslog entries. When an instance is unhealthy, an Azure scheduled event triggers one of the following remediation methods:
  • Auto-remediate for all: The simpler option. It performs a remediation action only for scheduled events of type Freeze that are detected at runtime.
  • DIY using syslog: You decide on remediation yourself by consuming the syslog events and adding your own workflows to remediate the event.

Remediation

Prerequisite:
To enable auto-remediation, configure custom data at the time of deployment with the following key-value pair:
health-auto-remedy=true
Actions summary for events and states under the different auto-remediation settings (the original table lists the bootstrap value against the action taken when a Freeze event is detected):
  • Bootstrap value (custom data): health-auto-remedy=true
    Action when a Freeze event is detected: Active → Passive failover (for HA pairs); health probe failure for load balancer fronted firewalls.
  • Bootstrap value (custom data): health-auto-remedy=false, or the bootstrap parameter is absent
    Action when a Freeze event is detected: The firewall logs an event in syslog.
Remediation for Azure scheduled events is currently available for high availability (HA) pair and load balancer based architectures:
HA Pair Architectures:
If you have an active-passive system and the active firewall detects that a scheduled event has been posted, the health monitor proactively fails over from the active to the passive firewall. When a Freeze event is detected, a suspend followed by a functional command is triggered on the firewall where the scheduled event was detected. This shifts that firewall from active to passive.
Palo Alto Networks recommends that you set up your HA pair across different availability zones or regions so that both firewalls in the pair are not impacted by the same event.
Load Balancer Based Architectures:
When a Freeze event is detected, the health probe is deliberately failed. The load balancer's health check packets come from a fixed source IP, so the firewall can drop those packets to fail the probe. Traffic managed by the load balancer stops reaching this instance, and the load balancer re-balances the traffic to the remaining backends. After the event is cleared, the health probe is re-enabled and the backend instance shows up as healthy again.
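The following is an added example, not Palo Alto Networks code: it models the decision logic from the actions table above (bootstrap value, Freeze event, HA versus load balancer architecture) against a constructed document in the shape returned by the Azure Instance Metadata Service scheduled-events endpoint. The sample document, the VM name, the topology labels and the action strings are all assumptions made for the example.

```python
"""Added example (not the daemon's own code): decide the remediation action
for an Azure scheduled-events document using the bootstrap value from the
table above. The JSON shape mirrors the Azure IMDS scheduled-events response;
the sample data, VM name and action labels are constructed for this example."""
import json

# Constructed sample: one Freeze event targeting this VM, one unrelated Reboot.
SAMPLE_EVENTS = json.dumps({
    "DocumentIncarnation": 7,
    "Events": [
        {"EventId": "EXAMPLE-0001", "EventType": "Freeze",
         "ResourceType": "VirtualMachine", "Resources": ["_fw-vm-0"],
         "EventStatus": "Scheduled", "NotBefore": "Mon, 01 Jan 2024 00:00:00 GMT",
         "DurationInSeconds": 5},
        {"EventId": "EXAMPLE-0002", "EventType": "Reboot",
         "ResourceType": "VirtualMachine", "Resources": ["_other-vm"],
         "EventStatus": "Scheduled", "NotBefore": "", "DurationInSeconds": -1},
    ],
})


def parse_bootstrap(custom_data: str) -> bool:
    """Return True only when the custom data carries health-auto-remedy=true.
    An absent or malformed parameter behaves like false (log only)."""
    for line in custom_data.splitlines():
        key, _, value = line.strip().partition("=")
        if key == "health-auto-remedy":
            return value.strip().lower() == "true"
    return False


def freeze_events(doc: str, vm_name: str) -> list:
    """Freeze events that target this VM; other event types are ignored."""
    events = json.loads(doc).get("Events", [])
    return [e for e in events
            if e.get("EventType") == "Freeze" and vm_name in e.get("Resources", [])]


def decide(doc: str, vm_name: str, custom_data: str, topology: str) -> str:
    """Map a scheduled-events document to the action in the table.
    topology is 'ha' (active/passive pair) or 'lb' (load balancer fronted)."""
    if not freeze_events(doc, vm_name):
        return "no-action"
    if not parse_bootstrap(custom_data):
        return "syslog"  # health-auto-remedy false or absent: log only
    return {"ha": "failover-to-passive", "lb": "fail-health-probe"}.get(topology, "syslog")


if __name__ == "__main__":
    print(decide(SAMPLE_EVENTS, "_fw-vm-0", "health-auto-remedy=true", "ha"))
```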
Logging summary for events and states:
Hotplug events always produce a syslog event. Detection of any of the states listed above only produces an entry in the firewall's vmhost_health.log, with one exception: if accelerated networking is not enabled, a syslog event is generated as well.