Failover
Failover from one HA peer to another occurs for a number
of reasons; you can use link or path monitoring to trigger a failover.
When a failure occurs on one firewall and the peer in
the HA pair (or a peer in the HA cluster) takes over the task of
securing traffic, the event is called a failover. A failover
is triggered, for example, when a monitored metric on a
firewall in the HA pair fails. The metrics that the firewall monitors
for detecting a firewall failure are:
Heartbeat Polling and Hello Messages
The firewalls use hello messages
and heartbeats to verify that the peer firewall is responsive and
operational. Hello messages are sent from one peer to the other
at the configured
Hello Interval to verify the state
of the firewall. The heartbeat is an ICMP ping to the HA peer over
the control link, and the peer responds to the ping to establish
that the firewalls are connected and responsive. By default, the
interval for the heartbeat is 1000 milliseconds. A ping is sent
every 1000 milliseconds and if there are three consecutive heartbeat
losses, a failover occurs. For details on the HA timers that trigger
a failover, see HA Timers.
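The heartbeat rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not firewall code: the function names are hypothetical, the defaults (1000 ms interval, three consecutive losses) come from the text, and the sketch ignores the other HA timers that also influence failover timing.

```python
HEARTBEAT_INTERVAL_MS = 1000  # default heartbeat interval from the text
MAX_CONSECUTIVE_LOSSES = 3    # consecutive losses that trigger a failover


def heartbeat_failed(results, max_losses=MAX_CONSECUTIVE_LOSSES):
    """Return True once max_losses heartbeats in a row went unanswered.

    results is a sequence of booleans, True meaning the peer replied.
    """
    consecutive = 0
    for replied in results:
        # Any reply resets the counter; only an unbroken run of losses counts.
        consecutive = 0 if replied else consecutive + 1
        if consecutive >= max_losses:
            return True
    return False


def detection_time_ms(interval_ms=HEARTBEAT_INTERVAL_MS,
                      max_losses=MAX_CONSECUTIVE_LOSSES):
    """Approximate time spanned by the losses needed to declare a failure."""
    return interval_ms * max_losses
```

With the defaults, three missed heartbeats span roughly 3000 ms. A single reply in between resets the count, so isolated losses do not trigger a failover.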
Link Monitoring
You can specify a group of physical interfaces that the firewall will
monitor (a link group) and the firewall monitors the state of each link
in the group (link up or link down). You determine the failure condition
for the link group: Any link down or All links
down in the group constitutes a link group failure (but not necessarily
a failover).
You can create multiple link groups. Therefore,
you also determine the failure condition of the set of link groups: Any link
group fails or All link groups fail, which
determines when a failover is triggered. The default behavior is
that failure of Any one link in Any link
group causes the firewall to change the HA state to non-functional
(or to tentative state in active/active mode) to indicate a failure
of a monitored object.
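The two levels of failure conditions described above (links within a group, then groups within the set) can be modeled as nested any/all checks. The following is a minimal sketch with hypothetical function names and data shapes. It does not reflect how the firewall implements the check, and treating an empty group as "not failed" is an assumption.

```python
def combine(flags, condition):
    """Apply an 'any' or 'all' failure condition to a list of booleans."""
    flags = list(flags)
    if not flags:
        return False  # assumption: nothing monitored means no failure
    return any(flags) if condition == "any" else all(flags)


def link_group_failed(link_states, condition="any"):
    """link_states maps an interface name to True (link up) or False (link down)."""
    return combine((not up for up in link_states.values()), condition)


def link_monitoring_failed(groups, global_condition="any"):
    """groups is a list of (link_states, group_condition) tuples."""
    failed = [link_group_failed(states, cond) for states, cond in groups]
    return combine(failed, global_condition)


# Example: one link down in the first group, second group healthy.
groups = [
    ({"ethernet1/1": True, "ethernet1/2": False}, "any"),
    ({"ethernet1/3": True}, "any"),
]
```

With the default Any/Any settings, `link_monitoring_failed(groups)` reports a failure for this example. Switching the global condition to `"all"` does not, because the second group is still healthy.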
Path Monitoring
You can specify a destination IP group of IP addresses that the firewall
will monitor. The firewall monitors the full path through the network
to mission-critical IP addresses using ICMP pings to verify reachability
of the IP address. The default interval for pings is 200 ms. An IP
address is considered unreachable when 10 consecutive pings (the
default value) fail. You specify the failure condition for the IP addresses
in a destination IP group: Any IP address
unreachable or All IP addresses unreachable
in the group. You can specify multiple destination IP groups for
a path group for a virtual wire, VLAN, or virtual router; you specify the
failure condition of destination IP groups in a path group: Any or All,
which constitutes a path group failure. You can configure multiple
virtual wire path groups, VLAN path groups, and virtual router path
groups.
You also determine the global failure condition: Any path
group fails or All path groups fail, which
determines when a failover is triggered. The default behavior is
that Any one of the IP addresses becoming unreachable
in Any destination IP group in Any virtual
wire, VLAN, or virtual router path group causes the firewall to
change the HA state to non-functional (or to tentative state in
active/active mode) to indicate a failure of a monitored object.
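Path monitoring adds a third level to the evaluation: IP addresses within a destination IP group, destination IP groups within a path group, and path groups globally. The sketch below models that nesting and the default ping arithmetic. All names and the dictionary layout are hypothetical, and the treatment of empty groups is an assumption.

```python
PING_INTERVAL_MS = 200  # default ping interval from the text
PING_COUNT = 10         # default consecutive failed pings


def combine(flags, condition):
    """Apply an 'any' or 'all' failure condition to a list of booleans."""
    flags = list(flags)
    if not flags:
        return False  # assumption: nothing monitored means no failure
    return any(flags) if condition == "any" else all(flags)


def ip_unreachable(ping_results, ping_count=PING_COUNT):
    """True when the most recent ping_count pings all failed."""
    recent = ping_results[-ping_count:]
    return len(recent) == ping_count and not any(recent)


def path_monitoring_failed(path_groups, global_condition="any"):
    """Evaluate IP group, path group, and global failure conditions in turn."""
    group_failures = []
    for path_group in path_groups:
        ip_group_failures = [
            combine(ip_group["unreachable"], ip_group["condition"])
            for ip_group in path_group["ip_groups"]
        ]
        group_failures.append(
            combine(ip_group_failures, path_group["condition"])
        )
    return combine(group_failures, global_condition)


# Example: each path group has one destination IP group with one of its
# two addresses unreachable; only the IP group condition differs.
virtual_router_group = {
    "condition": "any",
    "ip_groups": [{"condition": "any", "unreachable": [False, True]}],
}
virtual_wire_group = {
    "condition": "any",
    "ip_groups": [{"condition": "all", "unreachable": [False, True]}],
}
```

With the defaults, an address is declared unreachable after about `PING_INTERVAL_MS * PING_COUNT` = 2000 ms of failed pings. In the example, only the first path group fails, so a global condition of Any reports a failure and a global condition of All does not.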
In addition to the failover triggers listed above, a failover
also occurs when the administrator suspends the firewall or when
preemption occurs.
On PA-3200 Series, PA-5200 Series, and PA-7000 Series firewalls,
a failover can occur when an internal health check fails. This health check
is not configurable and is enabled to monitor the critical components,
such as the FPGA and CPUs. In addition, general health checks run
on every platform and can also cause a failover.
The following describes what occurs in the event of a failure
of a Network Processing Card (NPC) on a PA-7000 Series firewall
that is a member of an HA cluster:
- If the NPC that is being used to hold the HA clustering
session cache (a copy of the other members’ sessions) goes down,
the firewall goes non-functional. When this occurs, the session
distribution device (such as a load balancer) must detect that the
firewall is down and distribute session load to the other members
of the cluster.
- If the NPC of a cluster member goes down and no link monitoring
or path monitoring was enabled on that NPC, the PA-7000 Series firewall
member will stay up, but with a lower capacity because one NPC is
down.
- If the NPC of a cluster member goes down and link monitoring
or path monitoring was enabled on that NPC, the PA-7000 Series firewall
will go non-functional and the session distribution device (such
as a load balancer) must detect that the firewall is down and distribute
session load to the other members of the cluster.
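The three NPC cases above reduce to a small decision function. This is an illustrative summary only: the function name, parameter names, and return strings are hypothetical, and the sketch assumes the second and third cases refer to an NPC that does not hold the session cache.

```python
def npc_failure_outcome(holds_session_cache, monitoring_enabled):
    """Summarize the cluster member's state after one NPC goes down.

    holds_session_cache: the failed NPC held the HA clustering session cache.
    monitoring_enabled: link or path monitoring was enabled on the failed NPC.
    """
    if holds_session_cache or monitoring_enabled:
        # The session distribution device must detect this and shift load
        # to the other cluster members.
        return "non-functional"
    # The member keeps passing traffic, minus the failed NPC's capacity.
    return "up with reduced capacity"
```

In either non-functional case, redistributing sessions is the job of the external session distribution device, not of the cluster member itself.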