Use the Azure Health Monitoring daemon to perform health checks and notify you about
these events.
PA-VMs on Azure are susceptible to unrecoverable outcomes due to backend
maintenance events outside of the firewalls' control. These silent events, such as
hotplug events, live migrations, and updates to NIC drivers, hypervisors, or networking,
can lead to unrecoverable firewall states. The Azure health monitoring daemon focuses on
performing checks, notifying customers about these occurrences, and offering remediation
through graceful failover.
Azure health monitoring daemon checks for the following:
On Azure:
Accelerated networking is enabled or not enabled (not
enabled generates syslog)
Hot plug
Azure scheduled event service
Panorama connection and registration
The Azure health monitoring daemon also checks for:
Interface links to confirm if all the configured links are up
System runlevel to check the running state (must be set to
true)
Interface DHCP if applicable (if all dataplane interfaces received
DHCP served IP)
Management link (is up)
License status (is licensed)
Thermite certificate (is present)
The Azure health monitoring daemon does not provide auto remediation
for these health checks.
How Does the Azure Health Monitoring Function?
The Azure health monitoring daemon periodically runs checkers to determine the status
of the VM-Series firewall and marks the health checker status (pass, fail, or
unknown) in the log. Critical failures generate syslog entries. In the event of an
unhealthy instance, the Azure scheduled event triggers any of the following
remediation methods:
Auto-remediate for all: easier solution, results in a
remediation action for only scheduled events with event type
freeze, detected at runtime.
DIY using syslog: You can decide on remediation by using syslog
events and adding your own workflows to remediate the event.
Remediation
Prerequisite:
To enable auto-remediation, configure custom data at the time of deployment
with the following key-value pair:
health-auto-remedy=true
Bootstrap value (custom data)
Firewall action if Freeze event is detected
Firewall action if hotplug is detected
Firewall action if failure due device state is detected
health-auto-remedy=true
Active → Passive failover (how HA pairs)
Healthprobe
failure for load balancer fronted FWs
Firewall logs an event in syslogs and new
FW logs an event to vm_health.log - new log
file
health-auto-remedy=false
Or bootstrap parameter is absent
FW logs an event in syslogs
FW logs an event in syslogs
FW logs an event to vm_health.log - new log
file
Remediation is currently available on high availability and load balancer based Azure
scheduled events:
HA Pair Architectures:
If you have an active-passive system and if your active system detects a
scheduled event is posted, then the health monitor proactively fails over the active
to the passive. A freeze event is detected and a suspend followed by a functional
command is triggered on the firewall where the scheduled event is detected. This
shifts the firewall from active to passive.
Palo Alto Network suggests that you have your HA pair set up
across different availability zones or regions to avoid the HA paired firewalls
being impacted by the same event.
Load Balancer Based Architectures:
In case of a freeze event, the health probe is failed. The health check packets are
based on a fixed source IP which can be used to drop health check packets. The
traffic managed by the load balancer will be stopped and the load balancer will then
have to re-balance the traffic. After the event is cleared, the heath probe is
re-enabled and the backend instance will show up as healthy again.