Use the Azure Health Monitoring daemon to perform health checks and notify you about
these events.
PA-VMs on Azure are susceptible to unrecoverable outcomes due to backend
maintenance events outside of the firewalls' control. These silent maintenance events,
such as hotplug of network interfaces (“hotplug events”), live host migrations, and
updates to NIC drivers, hypervisors, or networking, can lead to unrecoverable firewall
states. The Azure health monitoring daemon focuses on performing checks, notifying
customers about these occurrences, and offering remediation through graceful
failover.
Azure health monitoring daemon detects the following events and states and
performs a default remediation or logging action in accordance to your setting of the
auto-remediation bootstrap value.
Azure health monitoring daemon checks for the following:
Events:
Events of type Freeze on Azure scheduled event
service.
Hotplug events characterized by rapid
de-association and re-association of the NIC on any of the data
plane interfaces on the Azure host.
States:
Any configured dataplane interfaces are down
System runlevel running state is false
Any dataplane interface did not receive DHCP served IP
Management link is down
License status is unlicensed
Thermite certificate is absent
No panorama connection or failed to register
Accelerated networking is not enabled
How Does the Azure Health Monitoring Function?
The Azure health monitoring daemon periodically runs checkers to determine the status
of the VM-Series firewall and marks the health checker status (pass, fail, or
unknown) in the log. Critical failures generate syslog entries. In the event of an
unhealthy instance, the Azure scheduled event triggers any of the following
remediation methods:
Auto-remediate for all: Easier solution, results in a
remediation action for only scheduled events with event type
freeze, detected at runtime.
DIY using syslog: You can decide on remediation by using syslog
events and adding your own workflows to remediate the event.
Remediation
Prerequisite:
To enable auto-remediation, configure custom data at the time of deployment
with the following key-value pair:
health-auto-remedy=true
Actions summary for events and states for different auto-remediation settings:
Bootstrap value (custom data)
Freeze event is detected
health-auto-remedy=true
Active → Passive failover (how HA pairs)
Healthprobe
failure for load balancer fronted FWs
health-auto-remedy=false
Or bootstrap parameter is absent
FW logs an event in syslogs
Remediation is currently available on high availability and load balancer based Azure
scheduled events:
HA Pair Architectures:
If you have an active-passive system and if your active system detects a
scheduled event is posted, then the health monitor proactively fails over the active
to the passive. A freeze event is detected and a suspend followed by a functional
command is triggered on the firewall where the scheduled event is detected. This
shifts the firewall from active to passive.
Palo Alto Network suggests that you have your HA pair set up
across different availability zones or regions to avoid the HA paired firewalls
being impacted by the same event.
Load Balancer Based Architectures:
In case of a freeze event, the health probe is failed. The health check packets are
based on a fixed source IP which can be used to drop health check packets. The
traffic managed by the load balancer will be stopped and the load balancer will then
have to re-balance the traffic. After the event is cleared, the heath probe is
re-enabled and the backend instance will show up as healthy again.
Logging summary for events and states:
Hotplug events always result in an event in syslogs. Detection of any of the states
listed above will only result in an event firewall logs to
vmhost_health.log. If accelerated networking is not enabled, it
will result in a syslog event.