NGFW Clusters

NGFW clustering concepts for PA-7500 Series firewalls.
Learn about NGFW clustering before creating a cluster:

Benefits and Structure of an NGFW Cluster

Before describing an NGFW cluster, let's review the two legacy HA modes:
  • Active/Passive (A/P): One firewall actively manages traffic while the other is synchronized to the first firewall (in configuration and states) and is ready to transition to the active state if a failure occurs.
  • Active/Active (A/A): Both firewalls in the pair are active, process traffic, and work synchronously to handle session setup and session ownership. Both firewalls individually maintain session tables and synchronize to each other. The two firewalls support two separate routing domains.
For PA-7500 Series firewalls in an NGFW cluster, the control planes (management planes) of the two firewalls are in active/passive mode: the passive firewall is fully synchronized to the active firewall in configuration and states. Each PA-7500 Series firewall has a maximum of seven data processing cards (DPCs); each DPC has six data planes. The data planes of the two firewalls are in active/active mode, and the firewall with the active management plane incorporates all of the interfaces of the passive chassis alongside its own. All of the interfaces of both chassis are represented and controlled on a single, centralized chassis, which is considered the leader node. One node is elected the leader; the other firewall is a non-leader node.
In an NGFW cluster, the firewalls are merged into one logical firewall from the control plane and management perspective, with a single routing domain. The NGFW cluster replaces legacy HA pairs and legacy HA clustering, which aren't available on PA-7500 Series firewalls (whether in an NGFW cluster or not).
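As a quick check on the capacity figures above, the following sketch multiplies them out. The function and constant names are illustrative only and are not part of PAN-OS; the two-node count reflects the firewall pair described in this section.

```python
# Figures from the text: up to 7 DPCs per chassis, 6 data planes per DPC.
MAX_DPCS_PER_CHASSIS = 7
DATA_PLANES_PER_DPC = 6


def max_data_planes(dpcs: int = MAX_DPCS_PER_CHASSIS, nodes: int = 1) -> int:
    """Return the number of data planes for `nodes` chassis with `dpcs` DPCs each."""
    if not 0 <= dpcs <= MAX_DPCS_PER_CHASSIS:
        raise ValueError(f"a chassis holds 0..{MAX_DPCS_PER_CHASSIS} DPCs, got {dpcs}")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return dpcs * DATA_PLANES_PER_DPC * nodes


print(max_data_planes())         # one fully populated chassis
print(max_data_planes(nodes=2))  # both nodes of a cluster, fully populated
```

A fully populated chassis therefore has 42 data planes, and a fully populated two-node cluster has 84.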
You must use Panorama to configure the PA-7500 Series firewalls for an NGFW cluster. The first node you assign to the cluster automatically becomes Node 1. After you assign a node to a cluster, you can't configure the node locally, but must use Panorama. All PA-7500 Series firewalls in a cluster should be placed in the same template stack.
NGFW cluster nodes must have the Advanced Routing Engine enabled; the legacy routing engine isn't supported.
Unlike interfaces outside of an NGFW cluster, interfaces on firewalls in a cluster include the Node ID at the beginning of the interface name. For example, node1:ethernet2/1 or node2:ethernet1/19.
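If you script against cluster interface names, a small parser can separate the Node ID from the rest of the name. This is a minimal sketch, not a PAN-OS API: the function name is hypothetical, and reading the two numbers in ethernetX/Y as slot and port is an assumption.

```python
import re
from typing import NamedTuple, Optional


class InterfaceName(NamedTuple):
    node_id: Optional[int]  # None when the name has no node prefix (non-cluster form)
    slot: int               # assumption: first number is the slot
    port: int               # assumption: second number is the port


# Optional "node<N>:" prefix followed by "ethernet<slot>/<port>".
_PATTERN = re.compile(r"^(?:node(\d+):)?ethernet(\d+)/(\d+)$")


def parse_interface(name: str) -> InterfaceName:
    """Split an interface name such as 'node1:ethernet2/1' into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized interface name: {name!r}")
    node, slot, port = match.groups()
    return InterfaceName(int(node) if node else None, int(slot), int(port))


print(parse_interface("node1:ethernet2/1"))
print(parse_interface("node2:ethernet1/19"))
```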
NGFW clustering is independent of legacy HA pairs and legacy HA clusters. NGFW clustering is the only HA or clustering solution available on PA-7500 Series firewalls. However, the firewalls in the NGFW cluster use the Group ID that is used in legacy HA. The Group ID helps differentiate MAC addresses when two HA pairs (or an HA pair and an NGFW cluster) in the same Layer 2 network share MAC addresses. The Group ID is in the same location within the virtual MAC address for an NGFW firewall node as it is for a legacy HA firewall node.
Firewalls in an NGFW cluster do not support multiple virtual systems (multi-vsys). Even if you have a multi-vsys license installed on the PA-7500 Series firewalls in an NGFW cluster, those firewalls ignore the license. (The multi-vsys license is usable if the firewall becomes standalone.)

Inter-Firewall Links (IFLs)

PA-7500 Series firewalls in an NGFW cluster have a chassis interconnection through their HSCI interfaces, which function at Layer 2. The firewalls are back-to-back; HSCI-A ports on each firewall connect to each other, and HSCI-B ports on each firewall connect to each other. The connections between the HSCI interfaces are high-bandwidth, inter-firewall links (IFLs) that handle asymmetric traffic, along with cluster synchronization at a control and dataplane level. The two IFLs function in active/backup mode. HSCI-A is the default active link; the active and backup roles are not configurable. If HSCI-A goes down, HSCI-B becomes the active link.
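The active/backup behavior of the two IFLs can be sketched as a simple selection function. The function name is hypothetical. The sketch also makes two assumptions that the text does not state: that HSCI-A resumes the active role as soon as it is back up, and that no link is active when both are down.

```python
from typing import Optional


def active_ifl(hsci_a_up: bool, hsci_b_up: bool) -> Optional[str]:
    """Return the name of the IFL that carries traffic, or None if both are down."""
    if hsci_a_up:
        # HSCI-A is the default active link (assumed to preempt when it recovers).
        return "HSCI-A"
    if hsci_b_up:
        # HSCI-B takes over only when HSCI-A is down.
        return "HSCI-B"
    return None  # assumption: no IFL available


print(active_ifl(True, True))
print(active_ifl(False, True))
```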

MACsec

The IFL connection uses the HSCI ports on the firewall exclusively. HSCI-A and HSCI-B provide redundancy in an active/backup model.
Beginning with PAN-OS 11.1.5, NGFW clustering allows you to configure Media Access Control Security (MACsec) on the HSCI interface. MACsec is a standard (802.1AE) that operates at Layer 2. MACsec provides data confidentiality and integrity between endpoints by adding a security tag and an integrity check value to each Ethernet frame. It provides confidentiality through encryption of the data so that only endpoints with the correct encryption key can access the data. MACsec provides integrity through a cryptographic mechanism, ensuring that data has not been tampered with in transit. MACsec also provides authentication by ensuring that only known endpoints are allowed to communicate on the Ethernet segment.
You can configure MACsec for the HSCI-A and HSCI-B ports (the active and backup ports) to protect the Layer 2 connections between the cluster peers. MACsec is disabled by default; it requires configuration to enable it. (Once enabled, you can remove the configuration items to disable MACsec.) MACsec runs over each HSCI port and has an associated pre-shared key (PSK) that must match on the two HSCI-A ports, and a PSK that must match on the two HSCI-B ports. Each session confirms that the PSKs match.
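The PSK matching rule can be checked before a push with a short script. The following is a sketch under stated assumptions: each node's keys are represented as a plain dictionary from port name to key bytes, which is a hypothetical structure and not how PAN-OS or Panorama stores Pre-Shared Key Profiles.

```python
import hmac
from typing import Dict, List

HSCI_PORTS = ("HSCI-A", "HSCI-B")


def mismatched_ports(node1: Dict[str, bytes], node2: Dict[str, bytes]) -> List[str]:
    """Return the HSCI ports whose PSKs differ between the two cluster nodes."""
    bad = []
    for port in HSCI_PORTS:
        key1, key2 = node1.get(port), node2.get(port)
        if key1 is None and key2 is None:
            continue  # MACsec not configured on this port pair (disabled by default)
        # A key on only one side, or two different keys, is a mismatch.
        if key1 is None or key2 is None or not hmac.compare_digest(key1, key2):
            bad.append(port)
    return bad


node1 = {"HSCI-A": b"key-for-a", "HSCI-B": b"key-for-b"}
node2 = {"HSCI-A": b"key-for-a", "HSCI-B": b"different"}
print(mismatched_ports(node1, node2))
```

The HSCI-A pair and the HSCI-B pair are checked independently because each pair has its own PSK.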
On the Communications tab, configuring MACsec involves a field and two profiles for each node: Key Server Priority (a ranged integer), Crypto Profile (a dropdown for profiles), and Pre-Shared Key Profile (a dropdown for profiles).

MC-LAGs, Orphan Ports and Orphan LAGs

The NGFW cluster supports an MC-LAG, which is a type of LAG. Recall that a LAG (also known as an aggregated Ethernet (AE) link or aggregate interface group) is a group of links that appear as one link to provide link redundancy. Links in a LAG that connect to endpoints on multiple chassis are configured as part of an MC-LAG. The multiple chassis are seen as a single firewall, providing node redundancy for Layer 3 and virtual wire interfaces only. (Layer 2 does not support MC-LAGs.)
An MC-LAG is an AE interface group that has members spread across both firewalls (also referred to as nodes or chassis). The illustration below represents different MC-LAG scenarios. MC-LAGs are controlled by a single control plane and seen as a single system with a backup management plane. An MC-LAG provides redundancy in the event of link failure, card failure, or chassis failure. Each MC-LAG supports a maximum of eight members. A pair of firewalls in an NGFW cluster supports a maximum of 64 AE interface groups; the AE interface groups support Layer 3 within the cluster. (You can have an AE interface group using Layer 2 on a single device, but not an AE interface group using Layer 2 on an MC-LAG in the cluster.)
Even a single device scenario can use an MC-LAG, which protects the firewall against failure, but doesn't protect against failure of the connected third-party device.
An orphan port is a single Layer 2, Layer 3, or virtual wire link that connects from one node. Instead of using a floating IP address, an orphan port has its own IP address. The first packet of a session determines the data plane that owns that session flow. Return traffic takes the reverse path over the same hops back to the source.
An orphan LAG is an AE interface group that has all members originating or terminating on a single firewall (similar to a standalone AE interface group). The single device in the graphic illustrates an example of an orphan LAG. Firewalls supported orphan ports and orphan LAGs prior to the introduction of NGFW clustering.
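The distinction between an MC-LAG and an orphan LAG, together with the MC-LAG limits described above, can be sketched as a classification function. The function name and the mode strings ("layer2", "layer3", "virtual-wire") are hypothetical labels chosen for this example, not PAN-OS configuration values.

```python
from typing import Iterable

MAX_MCLAG_MEMBERS = 8  # from the text: each MC-LAG supports at most eight members


def classify_ae_group(members: Iterable[str], mode: str) -> str:
    """Classify an AE interface group from member names such as 'node1:ethernet2/1'."""
    member_list = list(members)
    if not member_list:
        raise ValueError("an AE interface group needs at least one member")
    # The Node ID prefix tells us which chassis each member belongs to.
    nodes = {name.split(":", 1)[0] for name in member_list}
    if len(nodes) == 1:
        return "orphan LAG"  # all members originate or terminate on one firewall
    if mode == "layer2":
        raise ValueError("Layer 2 does not support MC-LAGs")
    if len(member_list) > MAX_MCLAG_MEMBERS:
        raise ValueError(f"an MC-LAG supports at most {MAX_MCLAG_MEMBERS} members")
    return "MC-LAG"


print(classify_ae_group(["node1:ethernet1/1", "node2:ethernet1/1"], "layer3"))
print(classify_ae_group(["node1:ethernet1/1", "node1:ethernet1/2"], "layer2"))
```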
Aggregate Ethernet local bias is a behavior in which traffic forwarding prefers local member ports over remote member ports. Because an MC-LAG has members on both nodes, local bias makes traffic egress from the local node instead of being forwarded over an HSCI link to the remote node. (The typical behavior of a LAG is to hash traffic across all of its members.)
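To illustrate local bias, the sketch below picks an egress member for a flow. It is not the firewall's implementation: the CRC32 hash is a stand-in for whatever hash the dataplane uses, the function name is hypothetical, and falling back to remote members when no local member exists is an assumption.

```python
import zlib
from typing import Sequence


def pick_egress_member(members: Sequence[str], local_node: str,
                       flow_key: bytes, local_bias: bool = True) -> str:
    """Choose an egress member for a flow, preferring members on the local node."""
    if not members:
        raise ValueError("no members available")
    candidates = list(members)
    if local_bias:
        local = [m for m in members if m.startswith(local_node + ":")]
        if local:
            candidates = local  # avoid crossing the HSCI link when possible
    # Stand-in hash: map the flow onto one of the candidate members.
    return candidates[zlib.crc32(flow_key) % len(candidates)]


mclag = ["node1:ethernet1/1", "node2:ethernet1/1"]
print(pick_egress_member(mclag, "node1", b"10.0.0.1->10.0.0.2"))
```

With local bias enabled, a flow processed on node1 always leaves through a node1 member of the MC-LAG, whatever the flow key hashes to.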

Node States Determine the Cluster State

The combined node states of the nodes in an NGFW cluster determine the cluster state. First, let's consider the state of a cluster node, which can be one of these states:
  • UNKNOWN—Clustering is not enabled. Node remains in this state until a cluster configuration push from Panorama or a commit enables clustering.
  • INIT—Node transitions from UNKNOWN to INIT state after clustering is enabled. Node remains in INIT state until cluster initialization of node is complete. Node transitions to ONLINE state if INIT criteria are met. If INIT criteria fail to be met, node transitions to ONLINE state after a timeout.
  • ONLINE—Node is passing traffic and working as expected.
  • DEGRADED—Node transitions to DEGRADED state when a soft fault occurs. DEGRADED state allows L7 continuity for sessions that the DEGRADED state device owns. Traffic links are down in DEGRADED state. Node can transition from DEGRADED to INIT state if all the faults are resolved.
  • FAILED—Node transitions to FAILED state when a hard fault occurs. FAILED state has traffic ports down and doesn't allow L7 continuity. Node can transition from FAILED to INIT state if all the faults are resolved.
  • SUSPENDED—Triggered by administrator. Another cause of SUSPENDED state is if a node state flaps to DEGRADED or FAILED state repeatedly; the node is SUSPENDED after six flaps. An administrator can unsuspend the node. SUSPENDED state has traffic ports down and doesn't allow L7 continuity.
Because the collective states of the nodes in an NGFW cluster determine the cluster state, the cluster state will be:
  • OK—If all nodes are in ONLINE state.
  • IMPACTED—If at least one node is in ONLINE state and another node isn't in ONLINE state.
  • ERROR—If there isn't a single node in ONLINE state.
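The three rules above can be expressed as a short function. This is an illustrative sketch; the enum and function names are not PAN-OS identifiers.

```python
from enum import Enum
from typing import Iterable


class NodeState(Enum):
    UNKNOWN = "UNKNOWN"
    INIT = "INIT"
    ONLINE = "ONLINE"
    DEGRADED = "DEGRADED"
    FAILED = "FAILED"
    SUSPENDED = "SUSPENDED"


def cluster_state(node_states: Iterable[NodeState]) -> str:
    """Derive the cluster state from the states of its nodes."""
    states = list(node_states)
    online = sum(1 for state in states if state is NodeState.ONLINE)
    if online == 0:
        return "ERROR"     # not a single node is ONLINE
    if online == len(states):
        return "OK"        # every node is ONLINE
    return "IMPACTED"      # at least one ONLINE, at least one not


print(cluster_state([NodeState.ONLINE, NodeState.ONLINE]))
print(cluster_state([NodeState.ONLINE, NodeState.DEGRADED]))
print(cluster_state([NodeState.FAILED, NodeState.SUSPENDED]))
```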
The System Monitoring portion of the NGFW cluster configuration allows you to specify the minimum number of network cards and data processing cards that must be functional. If the node drops below that minimum, the node state transitions to DEGRADED or FAILED (whichever you configured).
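The minimum-capacity check can be sketched as follows. The parameter names are hypothetical and do not correspond to fields in the System Monitoring configuration; the sketch only captures the rule that dropping below either minimum moves the node to the configured state.

```python
from typing import Optional


def capacity_state(functional_network_cards: int, functional_dpcs: int,
                   min_network_cards: int, min_dpcs: int,
                   below_minimum_state: str = "DEGRADED") -> Optional[str]:
    """Return the configured state if the node is below minimum capacity, else None."""
    if below_minimum_state not in ("DEGRADED", "FAILED"):
        raise ValueError("below_minimum_state must be 'DEGRADED' or 'FAILED'")
    below = (functional_network_cards < min_network_cards
             or functional_dpcs < min_dpcs)
    return below_minimum_state if below else None


# Two DPCs are functional but three are required, and FAILED is configured.
print(capacity_state(2, 2, min_network_cards=1, min_dpcs=3,
                     below_minimum_state="FAILED"))
```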

System Faults

Soft faults and hard faults affect whether the node state is DEGRADED or FAILED. Soft faults result in a node state of DEGRADED; hard faults result in a node state of FAILED. Causes of a soft fault are:
  • ID Map synchronization has failed.
  • IPSec VPN Security Association (SA) synchronization has failed.
  • System reported out-of-memory (OOM) fault.
  • Chassis doesn't have minimum capacity and degraded state is configured.
  • Cluster node is in degraded state waiting to be suspended.
Causes of a hard fault are:
  • Cluster infrastructure service has failed.
  • System reported disk fault.
  • Chassis has no DPC slots up.
  • FIB synchronization has failed.
  • Chassis does not have minimum capacity and failed state is configured.
  • Cluster node configuration is incompatible with other nodes.
  • Cluster messaging service has failed.
  • Cluster node is avoiding split brain.
  • Cluster node is recovering from split brain.
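The two fault lists above map onto node states as sketched below. The short fault identifiers are hypothetical names invented for this example, not PAN-OS log or API values. The sketch also assumes that a hard fault takes precedence when soft and hard faults are present together, which the text does not state.

```python
from typing import Iterable, Optional

# Hypothetical identifiers for the soft faults listed in the text.
SOFT_FAULTS = frozenset({
    "id-map-sync-failed", "ipsec-sa-sync-failed", "out-of-memory",
    "below-min-capacity-degraded", "degraded-awaiting-suspend",
})
# Hypothetical identifiers for the hard faults listed in the text.
HARD_FAULTS = frozenset({
    "cluster-infra-service-failed", "disk-fault", "no-dpc-slots-up",
    "fib-sync-failed", "below-min-capacity-failed", "config-incompatible",
    "cluster-messaging-failed", "avoiding-split-brain",
    "recovering-from-split-brain",
})


def state_for_faults(active_faults: Iterable[str]) -> Optional[str]:
    """Return 'FAILED', 'DEGRADED', or None when no fault is active."""
    faults = set(active_faults)
    unknown = faults - SOFT_FAULTS - HARD_FAULTS
    if unknown:
        raise ValueError(f"unknown fault(s): {sorted(unknown)}")
    if faults & HARD_FAULTS:
        return "FAILED"    # assumption: hard faults win over soft faults
    if faults & SOFT_FAULTS:
        return "DEGRADED"
    return None


print(state_for_faults(["out-of-memory"]))
print(state_for_faults(["out-of-memory", "disk-fault"]))
```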

Role of the Leader Node

As mentioned, one node of the NGFW cluster is elected the leader node, and the other node is a non-leader node. The leader and non-leader nodes synchronize the following system runtime information:
  • Management Plane:
    • User to Group Mappings
    • User to IP Address Mappings
    • DHCP Lease (as server)
    • Forwarding Information Base (FIB)
    • PAN-DB URL Cache
    • Content (manual sync)
    • PPPoE and PPPoE Lease
    • DHCP Client Settings and Lease
    • SSL VPN Logged in User List
  • Dataplane:
    • ARP Table
    • Neighbor Discovery (ND) Table
    • MAC Table
    • IPSec SAs [Security Associations] (phase 2)
    • IPSec Sequence Number (anti-replay)
    • Virtual MAC
Upon a leader node failover, the following protocols and functions are renegotiated:
  • BGP
  • OSPF
  • OSPFv3
  • RIP
  • PIM
  • BFD
  • DHCP client
  • PPPoE client
  • Static route path monitor

Layer 7 Support and Graceful Failover

In general, NGFW clustering aims to provide parity with HA regarding Layer 7 support. The following table lists Layer 7 features and whether they are supported on a standalone PA-7500 Series firewall, whether they are supported on NGFW cluster nodes, and whether they support graceful failover.
Layer 7 Feature | Standalone PA-7500 Series Firewall | NGFW Cluster Node | Graceful Failover
Proxy
Decryption (Forward Proxy/Inbound Inspection) | Yes | Yes | No
Hardware Security Module (HSM) | Yes | Yes | N/A (HSM must be configured per node)
Certificate Revocation (CRL/OCSP) | Yes | Yes | Service Route
Decryption Port Mirror | Yes | Yes | N/A
Network Packet Broker | Yes | Yes, but no redundancy support | No
SSH Decryption | Yes | Yes | No
GlobalProtect—IPSec Tunnel | Yes | Yes | No
GlobalProtect—SSLVPN Tunnel | Yes | Yes | No
Clientless SSLVPN | Yes | No | No
LSVPN (Satellite) | Yes | Yes | Yes
IDmgr | Yes | Yes | Yes
Content Threat Detection (CTD)
Application-Level Gateway (ALG) | Yes | Yes | No
dnsproxy | Yes | Yes | No
varrcvr | Yes | Yes | No
threat detection | Yes | Yes | No
Advance Data Loss Prevention (DLP) | Yes | Yes | No
URL Filtering | Yes | Yes | No
WIF features | Yes | Yes | No
App-ID Cloud Engine (ACE) | Yes | Yes | No
UserID and Configuration
User Identification | Yes | Yes | Yes
Device Identification | Yes | Yes | Yes
CUID | Yes | Yes | Yes
Device Quarantine | Yes | Yes | Yes
Dynamic Address Group (DAG) | Yes | Yes | Yes
DUG | Yes | Yes | Yes
ID Map Sync—Management planes of NGFW cluster nodes individually generate IDs for various types of objects (such as IP address objects and security profile objects) in a firewall configuration. These are known as Management Global IDs. Each node maps its own set of IDs with the set of IDs of the peer cluster node. Some IDs are used in session flow data. Upon a chassis failover, Layer 4 sessions are marked as orphaned and sent to the peer chassis for continued processing. During this process, the IDs in the session flow data are updated to match the set of IDs that the peer's management plane provided. NGFW cluster nodes must complete their ID map synchronization before leaving INIT state.
Additionally, Panorama generates a subset of IDs (zone, logical interface, and virtual router IDs) and pushes them to each cluster node. These are Cluster Global IDs.
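The ID translation that happens when sessions move to the peer chassis can be pictured as a per-object-type lookup. The sketch below is purely conceptual: the data structures, field names, and ID values are hypothetical and do not reflect the firewall's internal formats.

```python
from typing import Dict


def remap_session_ids(session_ids: Dict[str, int],
                      id_map: Dict[str, Dict[int, int]]) -> Dict[str, int]:
    """Translate the IDs in a session's flow data into the peer node's IDs.

    session_ids maps an object type to the ID the original owner used.
    id_map maps an object type to {owner's ID: peer's ID}, as built during
    ID map synchronization.
    """
    remapped = {}
    for object_type, local_id in session_ids.items():
        try:
            remapped[object_type] = id_map[object_type][local_id]
        except KeyError:
            raise KeyError(f"no peer mapping for {object_type} ID {local_id}") from None
    return remapped


# Hypothetical IDs: the same objects carry different IDs on each node.
id_map = {"address-object": {17: 42}, "security-profile": {3: 9}}
orphaned_session = {"address-object": 17, "security-profile": 3}
print(remap_session_ids(orphaned_session, id_map))
```

A missing mapping raises an error in this sketch, which mirrors why nodes must complete ID map synchronization before leaving INIT state.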
If you need to migrate from a non-PA-7500 Series firewall, proceed to Migrate to NGFW Clustering.
If you are using a PA-7500 Series firewall, proceed to Configure an NGFW Cluster and then view NGFW Cluster Summary and Monitoring information.