Failover

Failover from one HA peer to another occurs for a number of reasons; you can use link or path monitoring to trigger a failover.

When a failure occurs on one firewall and the peer in the HA pair (or a peer in the HA cluster) takes over the task of securing traffic, the event is called a failover. A failover is triggered, for example, when a monitored metric on a firewall in the HA pair fails. The metrics that the firewall monitors for detecting a firewall failure are:

Heartbeat Polling and Hello Messages
The firewalls use hello messages and heartbeats to verify that the peer firewall is responsive and operational. Hello messages are sent from one peer to the other at the configured Hello Interval to verify the state of the firewall. The heartbeat is an ICMP ping to the HA peer over the control link, and the peer responds to the ping to establish that the firewalls are connected and responsive. By default, the heartbeat interval is 1000 milliseconds: a ping is sent every 1000 milliseconds, and if three consecutive heartbeats are lost, a failover occurs. For details on the HA timers that trigger a failover, see HA Timers.
Link Monitoring
You can specify a group of physical interfaces for the firewall to monitor (a link group); the firewall monitors the state of each link in the group (link up or link down). You determine the failure condition for the link group: either Any link down or All links down in the group constitutes a link group failure (but not necessarily a failover).
You can create multiple link groups. Therefore, you also determine the failure condition of the set of link groups: Any link group fails or All link groups fail, which determines when a failover is triggered. The default behavior is that failure of Any one link in Any link group causes the firewall to change the HA state to non-functional (or to tentative state in active/active mode) to indicate a failure of a monitored object.
Path Monitoring
You can specify a destination IP group of IP addresses that the firewall will monitor. The firewall monitors the full path through the network to mission-critical IP addresses, using ICMP pings to verify that each IP address is reachable. The default ping interval is 200 ms. An IP address is considered unreachable when 10 consecutive pings (the default value) fail. You specify the failure condition for the IP addresses in a destination IP group: Any IP address unreachable or All IP addresses unreachable in the group. You can specify multiple destination IP groups in a path group for a virtual wire, VLAN, or virtual router, and you specify the failure condition of the destination IP groups in a path group: Any or All, which constitutes a path group failure. You can configure multiple virtual wire path groups, VLAN path groups, and virtual router path groups.
You also determine the global failure condition: Any path group fails or All path groups fail, which determines when a failover is triggered. The default behavior is that Any one of the IP addresses becoming unreachable in Any destination IP group in Any virtual wire, VLAN, or virtual router path group causes the firewall to change the HA state to non-functional (or to tentative state in active/active mode) to indicate a failure of a monitored object.
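
The consecutive-loss thresholds and the nested Any/All failure conditions described above can be sketched in Python. This is a minimal illustration under stated assumptions, not firewall code: the names ConsecutiveLossCounter, MonitorGroup, condition_met, and monitoring_failed are hypothetical, and only the default values (1000 ms and three losses for heartbeats, 200 ms and 10 losses for path pings) come from the text.

```python
from dataclasses import dataclass
from typing import List

# Defaults quoted in the text; all names in this sketch are hypothetical.
HEARTBEAT_INTERVAL_MS = 1000
HEARTBEAT_LOSS_THRESHOLD = 3
PATH_PING_INTERVAL_MS = 200
PATH_PING_LOSS_THRESHOLD = 10


class ConsecutiveLossCounter:
    """Counts consecutive missed replies; any received reply resets the count."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.losses = 0

    def record(self, reply_received: bool) -> bool:
        """Record one probe result; return True once the threshold is reached."""
        self.losses = 0 if reply_received else self.losses + 1
        return self.losses >= self.threshold


def condition_met(failed: List[bool], condition: str) -> bool:
    """Apply an 'any' or 'all' failure condition to a list of failure flags."""
    if condition == "any":
        return any(failed)
    if condition == "all":
        # Assumption: an empty group is never treated as failed.
        return bool(failed) and all(failed)
    raise ValueError(f"unknown condition: {condition!r}")


@dataclass
class MonitorGroup:
    """A link group (link down?) or destination IP group (address unreachable?)."""
    condition: str             # "any" or "all"
    member_failed: List[bool]  # one flag per monitored link or IP address

    def failed(self) -> bool:
        return condition_met(self.member_failed, self.condition)


def monitoring_failed(groups: List[MonitorGroup], global_condition: str = "any") -> bool:
    """Apply the global condition across all groups (default: Any group fails)."""
    return condition_met([g.failed() for g in groups], global_condition)


if __name__ == "__main__":
    heartbeat = ConsecutiveLossCounter(HEARTBEAT_LOSS_THRESHOLD)
    print([heartbeat.record(False) for _ in range(3)])  # [False, False, True]

    link_groups = [MonitorGroup("any", [False, True]), MonitorGroup("any", [False, False])]
    print(monitoring_failed(link_groups, "any"))  # True
    print(monitoring_failed(link_groups, "all"))  # False
```

Path monitoring nests one level deeper than link monitoring. In this sketch, a path group can be modeled by passing the failed() results of its destination IP groups as the members of another MonitorGroup. With the defaults, three lost heartbeats span roughly 3 × 1000 ms and ten lost pings roughly 10 × 200 ms. The sketch ignores the other HA timers that affect when a failover actually happens.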

In addition to the failover triggers listed above, a failover also occurs when the administrator suspends the firewall or when preemption occurs.

On PA-3200 Series, PA-5200 Series, and PA-7000 Series firewalls, a failover can occur when an internal health check fails. This health check is not configurable; it is enabled to monitor critical components, such as the FPGA and CPUs. Additionally, general health checks run on all platforms and can also cause a failover.

The following describes what occurs in the event of a failure of a Network Processing Card (NPC) on a PA-7000 Series firewall that is a member of an HA cluster:

If the NPC that is being used to hold the HA clustering session cache (a copy of the other members’ sessions) goes down, the firewall goes non-functional. When this occurs, the session distribution device (such as a load balancer) must detect that the firewall is down and distribute session load to the other members of the cluster.
If the NPC of a cluster member goes down and no link monitoring or path monitoring was enabled on that NPC, the PA-7000 Series firewall member will stay up, but with a lower capacity because one NPC is down.
If the NPC of a cluster member goes down and link monitoring or path monitoring was enabled on that NPC, the PA-7000 Series firewall will go non-functional and the session distribution device (such as a load balancer) must detect that the firewall is down and distribute session load to the other members of the cluster.
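
The three NPC failure cases above can be summarized as a small decision function. This is a sketch only: NpcOutcome and npc_failure_outcome are hypothetical names, and treating the session-cache case as taking precedence over the monitoring cases is an assumption, since the text does not say how the cases combine.

```python
from enum import Enum


class NpcOutcome(Enum):
    NON_FUNCTIONAL = "non-functional"            # session distribution device must redirect load
    REDUCED_CAPACITY = "up, reduced capacity"    # member keeps running with one NPC fewer


def npc_failure_outcome(holds_session_cache: bool, monitoring_enabled: bool) -> NpcOutcome:
    """Map a failed NPC on a PA-7000 Series cluster member to the described outcome.

    holds_session_cache: the NPC held the HA clustering session cache.
    monitoring_enabled: link or path monitoring was enabled on that NPC.
    """
    # Assumption: either condition alone is enough to make the firewall non-functional.
    if holds_session_cache or monitoring_enabled:
        return NpcOutcome.NON_FUNCTIONAL
    return NpcOutcome.REDUCED_CAPACITY


if __name__ == "__main__":
    print(npc_failure_outcome(holds_session_cache=False, monitoring_enabled=False).value)
    print(npc_failure_outcome(holds_session_cache=False, monitoring_enabled=True).value)
```

In both non-functional cases, the firewall itself does not redistribute sessions. The session distribution device (such as a load balancer) must detect the failure and move the load to the other cluster members.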
