Failover

Failover from one HA peer to another occurs for a number of reasons; you can use link or path monitoring to trigger a failover.
When a failure occurs on one firewall and the peer in the HA pair (or a peer in the HA cluster) takes over the task of securing traffic, the event is called a failover. A failover is triggered, for example, when a monitored metric on a firewall in the HA pair fails. The metrics that the firewall monitors for detecting a firewall failure are:
  • Heartbeat Polling and Hello messages
    The firewalls use hello messages and heartbeats to verify that the peer firewall is responsive and operational. Hello messages are sent from one peer to the other at the configured Hello Interval to verify the state of the firewall. The heartbeat is an ICMP ping to the HA peer over the control link, and the peer responds to the ping to establish that the firewalls are connected and responsive. By default, the heartbeat interval is 1000 milliseconds: a ping is sent every 1000 milliseconds, and if three consecutive heartbeats are lost, a failover occurs. For details on the HA timers that trigger a failover, see HA Timers.
  • Link Monitoring
    You can specify a group of physical interfaces that the firewall will monitor (a link group) and the firewall monitors the state of each link in the group (link up or link down). You determine the failure condition for the link group: Any link down or All links down in the group constitutes a link group failure (but not necessarily a failover).
    You can create multiple link groups. Therefore, you also determine the failure condition of the set of link groups: Any link group fails or All link groups fail, which determines when a failover is triggered. The default behavior is that failure of Any one link in Any link group causes the firewall to change the HA state to non-functional (or to tentative state in active/active mode) to indicate a failure of a monitored object.
  • Path Monitoring
    You can specify a destination IP group of IP addresses that the firewall will monitor. The firewall monitors the full path through the network to mission-critical IP addresses using ICMP pings to verify reachability of each IP address. The default interval for pings is 200 ms. An IP address is considered unreachable when 10 consecutive pings (the default value) fail. You specify the failure condition for the IP addresses in a destination IP group: Any IP address unreachable or All IP addresses unreachable in the group. You can specify multiple destination IP groups for a path group for a virtual wire, VLAN, or virtual router; you specify the failure condition of destination IP groups in a path group: Any or All, which constitutes a path group failure. You can configure multiple virtual wire path groups, VLAN path groups, and virtual router path groups.
    You also determine the global failure condition: Any path group fails or All path groups fail, which determines when a failover is triggered. The default behavior is that Any one of the IP addresses becoming unreachable in Any destination IP group in Any virtual wire, VLAN, or virtual router path group causes the firewall to change the HA state to non-functional (or to tentative state in active/active mode) to indicate a failure of a monitored object.
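The nested Any/All failure conditions described above can be pictured as a small tree evaluation. The following Python sketch is illustrative only and is not PAN-OS code: the `failed` and `detection_time_ms` names and the tuple layout are assumptions made for this example. It treats each monitored link or IP address as a boolean (True means down or unreachable) and each group as a pair of a failure condition and its members. The timing helper simply multiplies the interval by the number of consecutive losses, which gives a rough detection time that ignores ping timeouts and the other HA timers.

```python
from typing import Union

# Hypothetical model for illustration; none of these names are PAN-OS APIs.
# A leaf is a bool: True means the monitored link is down or the IP address
# is unreachable. A group is a tuple (condition, members), where condition
# is "any" or "all" and members are leaves or nested groups.
Node = Union[bool, tuple]


def failed(node: Node) -> bool:
    """Evaluate a nested Any/All failure condition."""
    if isinstance(node, bool):
        return node
    condition, members = node
    results = [failed(member) for member in members]
    if condition == "any":
        return any(results)
    if condition == "all":
        # Assumption: an empty group never counts as failed.
        return bool(results) and all(results)
    raise ValueError(f"unknown failure condition: {condition!r}")


def detection_time_ms(interval_ms: int, consecutive_losses: int) -> int:
    """Rough time to declare a failure, ignoring timeouts and hold timers."""
    return interval_ms * consecutive_losses


# Link monitoring with the default Any/Any behavior: one link down in one
# link group is enough to flag a failure of a monitored object.
link_default = ("any", [("any", [False, True]), ("any", [False, False])])
print(failed(link_default))  # True

# Same link states, but the set of link groups uses "All": the second group
# is healthy, so no failure is flagged.
link_all_groups = ("all", [("any", [False, True]), ("any", [False, False])])
print(failed(link_all_groups))  # False

# Path monitoring has three levels: global -> path group -> destination IP group.
path_default = ("any", [("any", [("any", [False, False, True])])])
print(failed(path_default))  # True

print(detection_time_ms(1000, 3))  # heartbeat defaults: 3000
print(detection_time_ms(200, 10))  # path monitoring defaults: 2000
```

With the default values quoted above, the arithmetic suggests a lost heartbeat is detected after about 3 seconds and an unreachable monitored IP address after about 2 seconds.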
In addition to the failover triggers listed above, a failover also occurs when the administrator suspends the firewall or when preemption occurs.
On PA-3200 Series, PA-5200 Series, and PA-7000 Series firewalls, a failover can occur when an internal health check fails. This health check is not configurable and is enabled to monitor critical components, such as the FPGA and CPUs. Additionally, general health checks run on all platforms, and a failed check can also cause a failover.
The following describes what occurs in the event of a failure of a Network Processing Card (NPC) on a PA-7000 Series firewall that is a member of an HA cluster:
  • If the NPC that is being used to hold the HA clustering session cache (a copy of the other members’ sessions) goes down, the firewall goes non-functional. When this occurs, the session distribution device (such as a load balancer) must detect that the firewall is down and distribute session load to the other members of the cluster.
  • If the NPC of a cluster member goes down and no link monitoring or path monitoring was enabled on that NPC, the PA-7000 Series firewall member will stay up, but with a lower capacity because one NPC is down.
  • If the NPC of a cluster member goes down and link monitoring or path monitoring was enabled on that NPC, the PA-7000 Series firewall will go non-functional and the session distribution device (such as a load balancer) must detect that the firewall is down and distribute session load to the other members of the cluster.
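
The three NPC failure cases above reduce to a short decision rule. The sketch below restates them in Python purely as a reading aid. The function name, parameter names, and return strings are hypothetical and do not correspond to any PAN-OS API or CLI output.

```python
def npc_failure_outcome(holds_session_cache: bool, monitoring_enabled: bool) -> str:
    """Summarize the state of a PA-7000 Series cluster member after one NPC fails.

    holds_session_cache: the failed NPC held the HA clustering session cache.
    monitoring_enabled: link or path monitoring was enabled on the failed NPC.
    """
    if holds_session_cache or monitoring_enabled:
        # The session distribution device (such as a load balancer) must
        # detect the failure and move session load to the other members.
        return "non-functional"
    # The member keeps forwarding traffic with one NPC fewer.
    return "up, reduced capacity"


print(npc_failure_outcome(holds_session_cache=False, monitoring_enabled=False))
```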
