Configuring High Availability

For all topologies, configuring high availability (HA) improves network reliability by replacing a single SteelConnect gateway with two gateways in the branch or three gateways in the data center. A backup gateway maintains uninterrupted service in the event of a power, hardware, software, or WAN uplink failure, which gives the network redundancy.

This table lists the common use cases for SteelConnect gateways.

Gateway model	Use case
SDI-130	Small branch or retail
SDI-330	Medium branch
SDI-1030	Medium to large branch
SDI-2030	Regional hubs, small data centers, and large branches
SDI-5030	Campus or data center

In the branch, SteelConnect Manager (SCM) connects the gateway pair, consisting of a master gateway and a backup gateway, over the links in the management network zone to monitor and route traffic.

The two gateways use active-passive mode. In active-passive mode, only the master gateway processes traffic while the backup gateway remains in standby mode, ready to take over if the master gateway fails.

SCM sends the master gateway configuration to both gateways. The first gateway to send Virtual Router Redundancy Protocol (VRRP) packets on the network becomes the master, and SCM applies the master configuration to that gateway. No additional configuration is required on the backup gateway.

Gateways in an HA pair establish an encrypted communication channel with each other. After the channel is established, it replicates all DHCP lease releases and renewals between the master and backup gateways in both directions, so that after a failover the new master gateway doesn’t assign duplicate leases.
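The following Python sketch illustrates why replicating leases in both directions prevents duplicate addresses after a failover. It is a simplified model, not SteelConnect code: the `Gateway` class, its methods, and the address pool are hypothetical, and the encrypted channel is reduced to a direct update of the partner's lease table.

```python
class Gateway:
    """Hypothetical model of one HA partner's DHCP lease table."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = list(pool)  # addresses this gateway may hand out
        self.leases = {}        # client MAC -> leased IP address
        self.peer = None        # HA partner; stands in for the encrypted channel

    def _replicate(self, mac, ip):
        # Mirror the change on the partner (ip=None means the lease was released).
        if self.peer is None:
            return
        if ip is None:
            self.peer.leases.pop(mac, None)
        else:
            self.peer.leases[mac] = ip

    def assign(self, mac):
        """Grant or renew a lease; pool exhaustion is not handled in this sketch."""
        if mac in self.leases:
            ip = self.leases[mac]  # renewal keeps the same address
        else:
            in_use = set(self.leases.values())
            ip = next(addr for addr in self.pool if addr not in in_use)
            self.leases[mac] = ip
        self._replicate(mac, ip)
        return ip

    def release(self, mac):
        self.leases.pop(mac, None)
        self._replicate(mac, None)


pool = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
master = Gateway("master", pool)
backup = Gateway("backup", pool)
master.peer, backup.peer = backup, master

master.assign("aa:aa")   # replicated to the backup
master.assign("bb:bb")
master.release("aa:aa")  # the release is replicated too

# After a failover, the backup acts as master and already knows which leases are live,
# so it reuses the released address instead of the one still held by bb:bb.
new_ip = backup.assign("cc:cc")
print(new_ip, backup.leases)
```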

The gateway pair also synchronizes firewall state and connection tracking information between the master and backup gateways, providing a stateful transition if a failover occurs.

A failover caused by a master gateway failure triggers within 3 to 4 seconds of the master gateway going offline. After the backup gateway assumes the master role, it can pass internet traffic in approximately 9 to 10 seconds. The AutoVPN tunnels are typically reestablished after an additional 4 to 5 seconds.
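These timings add up to an end-to-end outage window. The following Python sketch sums them, assuming the three phases run back to back; the phase labels and the `cumulative_ranges` helper are illustrative and not part of SCM.

```python
# Phase durations in seconds for a master gateway failure, taken from the text.
# Assumption: each phase starts when the previous one ends.
MASTER_FAILURE_PHASES = [
    ("failover triggers", (3, 4)),
    ("new master passes internet traffic", (9, 10)),
    ("AutoVPN tunnels reestablished", (4, 5)),
]


def cumulative_ranges(phases):
    """Return (label, earliest, latest) elapsed seconds at the end of each phase."""
    earliest = latest = 0
    timeline = []
    for label, (low, high) in phases:
        earliest += low
        latest += high
        timeline.append((label, earliest, latest))
    return timeline


for label, earliest, latest in cumulative_ranges(MASTER_FAILURE_PHASES):
    print(f"{label}: {earliest}-{latest} s after the master goes offline")
```

Under the same assumption, AutoVPN tunnels are back roughly 16 to 19 seconds after a master failure. Replacing the first phase with the 13 to 16 second WAN uplink failover window gives roughly 26 to 31 seconds.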

AutoVPN tunnels and site-to-site connectivity after failover can take more than five minutes to reestablish when MAC address cloning is not enabled. We recommend enabling MAC address cloning when using DHCP uplinks. For details, see What impact does a failover from a backup to a master gateway have on the uplinks?.

SCM supports box-to-box redundancy for these gateway models:

You can pair two gateways of the same model, two shadow appliances of the same model, or one hardware and one shadow appliance of the same model for high availability.

To ensure minimal service interruption during a firmware upgrade for an HA pair, SCM uses this smart updating process to gracefully install firmware updates:

2. The master appliance immediately starts downloading the image. The backup appliance downloads the image over a connection proxied through the master appliance.

At this point, the master gateway has received the new firmware file; however, it’s still handling client traffic for the HA pair and a failover has not yet occurred.

4. After SCM receives a notification from the backup gateway that it has rebooted and is running the new firmware, SCM instructs the master gateway to install the firmware and reboot.

6. After the previous master gateway comes back online, it remains in backup mode until the active gateway triggers a failover and relinquishes the active role.
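The ordering in these steps is what keeps one gateway forwarding traffic throughout the upgrade. The following Python sketch models only that ordering, under stated assumptions: the `Node` class and `smart_upgrade` function are hypothetical, downloads are reduced to log entries, and the role swap is assumed to happen while the master reboots, as the final step implies.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one gateway in an HA pair."""
    name: str
    version: str
    role: str  # "master" or "backup"


def smart_upgrade(master, backup, new_version):
    """Apply new_version to an HA pair in the order described above; return a log."""
    log = [
        f"{master.name} downloads {new_version}",
        f"{backup.name} downloads {new_version} through {master.name}",
    ]
    # The backup installs and reboots first while the master keeps forwarding traffic.
    backup.version = new_version
    log.append(f"{backup.name} reboots on {new_version}; {master.name} still active")
    # Only after the backup reports the new firmware is the master upgraded.
    master.version = new_version
    # While the master reboots, the backup takes over the active role, and the
    # former master stays in backup mode when it comes back (no preemption).
    master.role, backup.role = "backup", "master"
    log.append(f"{master.name} reboots on {new_version}; {backup.name} now active")
    return log


gw_a = Node("gw-a", "1.0", "master")
gw_b = Node("gw-b", "1.0", "backup")
for line in smart_upgrade(gw_a, gw_b, "2.0"):
    print(line)
```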

HA protects against local WAN uplink issues such as:

Failover triggers after an Internet Control Message Protocol (ICMP) ping detects that one or more uplinks are down. The gateway dynamically determines an appropriate upstream IP address to ping. The ICMP uplink monitoring disregards short uplink drop-outs so that transient outages aren’t reported as uplink failures.

A WAN uplink failover triggers within 13 to 16 seconds after a down uplink is detected. After the backup gateway assumes the master role, it can pass internet traffic in approximately 9 to 10 seconds. The AutoVPN tunnels are typically reestablished after an additional 4 to 5 seconds.

For network stability, a failover can’t occur within 60 seconds of a previous failover. WAN uplink failover uses a 60-second dampening factor to limit the advertisements of up and down link transition states. For 60 seconds after a failover, the system suppresses subsequent failovers until it has enough time to verify the uplink state and analyze the gateway heuristics.
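The monitoring behavior described above combines two ideas: ignoring short drop-outs and holding off repeat failovers for 60 seconds. The following Python sketch shows both under assumptions: the `UplinkFailoverMonitor` class is hypothetical, probe results are assumed to arrive one at a time with a timestamp, and the threshold of three consecutive lost pings is an illustrative value, not a documented SteelConnect setting.

```python
class UplinkFailoverMonitor:
    """Hypothetical sketch: debounced ping results plus a 60-second failover hold-down."""

    HOLD_DOWN_S = 60  # from the text: no failover within 60 s of the previous one

    def __init__(self, fail_threshold=3):
        # fail_threshold is an assumed value chosen for illustration.
        self.fail_threshold = fail_threshold
        self.consecutive_failures = 0
        self.last_failover = None

    def on_probe(self, now, reply_received):
        """Record one ping result at time `now` (seconds); return True if failover fires."""
        if reply_received:
            self.consecutive_failures = 0  # a short drop-out is forgotten
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.fail_threshold:
            return False  # not yet considered down
        if self.last_failover is not None and now - self.last_failover < self.HOLD_DOWN_S:
            return False  # suppressed by the hold-down window
        self.last_failover = now
        self.consecutive_failures = 0
        return True


monitor = UplinkFailoverMonitor()
print(monitor.on_probe(0, False))  # a single lost ping is ignored
print(monitor.on_probe(1, True))
print([monitor.on_probe(t, False) for t in (2, 3, 4)])     # fires on the third loss
print([monitor.on_probe(t, False) for t in (10, 11, 12)])  # suppressed: within 60 s
print(monitor.on_probe(70, False))                          # hold-down has expired
```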

Uplinks are shared between the master and backup gateways. For example, uplink 1 and (optionally) uplink 2 are physically connected to both the master and backup gateways, so if an upstream outage occurs, both gateways are affected. To provide continued connectivity after an upstream outage, you can create a traffic rule that selects a secondary path. For details, see To create a traffic rule.

For uplinks using both IPv4 and IPv6 addresses, both address families must be down before uplink failover triggers. For example, if the IPv4 address goes down but the IPv6 address is up, the uplink doesn’t fail over.
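The dual-stack rule can be expressed as a small predicate. In this Python sketch, the function name and the convention of passing `None` for an address family the uplink doesn't use are hypothetical.

```python
def uplink_is_down(ipv4_up, ipv6_up):
    """Return True only when every address family configured on the uplink is down.

    Each argument is True (reachable), False (down), or None (not configured).
    """
    configured = [state for state in (ipv4_up, ipv6_up) if state is not None]
    if not configured:
        raise ValueError("uplink has no IPv4 or IPv6 address configured")
    return not any(configured)


print(uplink_is_down(ipv4_up=False, ipv6_up=True))   # IPv6 still up: no failover
print(uplink_is_down(ipv4_up=False, ipv6_up=False))  # both down: uplink is down
```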

Before configuring high availability, check these requirements and recommendations.

Both gateways must be:

We recommend cabling both gateways identically for redundancy.

When HA is configured, never plug any device other than a switch directly into a gateway. For HA failover to work properly, place a switch between the devices and the HA pair so that the devices can reach whichever gateway is currently the master.

You can connect one or more switches directly to the HA pair; however, keep in mind that the HA pair will not forward Layer 2 traffic among the connected switches. To forward Layer 2 traffic, you must configure a core switch as a Layer 2 aggregation layer.

Make sure that the switches connected to the HA gateways are set to either a single-zone port or a multizone port, based on your requirements.

Mirrored uplinks configure identical ports on both members of the HA pair. Alternatively, you can assign an individual uplink to each gateway, in which case the upstream router provides a separate port for each member of the HA pair.

Individual uplinks don’t require a WAN-side switch, as each uplink has its own Layer 3 configuration.

We recommend using mirrored uplinks. In an active-passive HA configuration, the backup gateway is passive with all uplinks and LAN ports down. The uplinks on the backup gateway aren’t actively routing traffic. However, for deployments where WAN edge equipment can provide Layer 3 ports for greater flexibility, you can associate individual uplinks with a WAN. Mirrored and individual uplinks shows a deployment example using one mirrored and two individual uplinks.

In Mirrored and individual uplinks, the HA pair has connectivity to two WANs: an internet WAN and an MPLS WAN. Three uplinks are configured for the HA pair. The first is a single internet uplink in mirrored mode, because the ISP provides only a single port on its router. A WAN-side switch is necessary to connect the mirrored uplink ports of both appliances to the single port on the internet router. The MPLS provider supplies two Layer 3 ports on its PE router. For each port on the MPLS router, an individual, nonmirrored uplink is created with a Layer 3 configuration for that port, and each of these uplinks is assigned to a different partner in the HA pair. No switch is necessary on the MPLS WAN when using individual uplinks, because each uplink has its own unique Layer 3 configuration.

We recommend that you configure the LAN ports such that each gateway mirrors the other; however, you can configure individual, nonmirrored LAN ports per gateway if required. Take care to ensure that the physical cabling respects the port configuration on each gateway.

•Singlezone - Enables a single zone, which is equivalent to putting a port in access mode. The zone is assigned to only one VLAN. After selecting Singlezone, select the zone for the port to carry.

•Uplink - Enables an individual uplink on each partner in an HA pair. After selecting Uplink, select the uplink to use. See Individual uplink IP addresses.

•Mirrored Uplink - Configures identical ports for the HA pair. One of the nodes in the HA pair needs to have an uplink configured by selecting Uplink in the Port mode field and selecting the specific uplink that needs to be mirrored. On the HA partner, select the corresponding port by selecting Mirrored Uplink. For the mirrored uplink option to be available on the HA partner, the port number must match the partner node’s port number. Select the uplink used in the partner node. SCM configures the port identically to the corresponding port on its partner HA gateway.

When MAC cloning is enabled, mirrored uplinks inherit a virtual MAC address from one of the HA partners. SCM overrides and disables the virtual MAC address on all mirrored uplinks, and then sets the virtual MAC address on one of the HA nodes to the MAC address of the corresponding port on the other node. Which node SCM selects is indeterminate.
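The following Python sketch illustrates the result of MAC cloning for one mirrored uplink port: after existing overrides are cleared, one node takes the other node's port MAC as its virtual MAC, so both partners present the same address upstream. The function name, node names, and MAC values are hypothetical, and `random.choice` stands in for SCM's indeterminate selection.

```python
import random


def clone_mirrored_uplink_mac(port_macs, rng=random):
    """Return the MAC each HA node presents on one mirrored uplink port.

    port_macs maps each of the two node names to the hardware MAC of its
    corresponding mirrored port.
    """
    if len(port_macs) != 2:
        raise ValueError("an HA pair has exactly two nodes")
    first, second = sorted(port_macs)
    # Virtual MAC overrides are cleared, so each port starts from its own MAC.
    effective = dict(port_macs)
    # One node is chosen (which one is indeterminate) ...
    target = rng.choice([first, second])
    source = second if target == first else first
    # ... and its virtual MAC is set to the partner's port MAC.
    effective[target] = port_macs[source]
    return effective


macs = {"gw-a": "02:00:00:00:00:01", "gw-b": "02:00:00:00:00:02"}
print(clone_mirrored_uplink_mac(macs))
```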

The Spanning Tree Protocol (STP) prevents network malfunction by blocking ports that can cause loops in redundant network paths. SteelConnect gateways and switches implement the Multiple Spanning Tree (MST) protocol defined in the IEEE 802.1s specification. MST is not supported on branch gateways configured for high availability.

SCM 2.9 and later extend the port failover logic of a high-availability pair so that the failure of a specific LAN port can trigger failover for the pair.

We recommend one of two deployment options: connect either one multizone LAN port or one singlezone LAN port per gateway to the LAN switch, to avoid Layer 2 loops and MAC address flapping. On the LAN-side switch, we recommend disabling STP on all switch ports associated with the gateway.

For example, in a deployment with a SteelHead CX behind a gateway, you can track the physical link of the LAN port to detect lost connectivity. After you configure tracking for one or more ports, at least one of the tracked ports in the pair must fail to trigger failover. When two ports in an HA pair are tracked, only one of them needs to go down to trigger failover.

We do not recommend using tracked ports with a common switch, as a switch failure could potentially trigger undesired HA failover. We recommend using tracked ports to protect against switch failure when two redundant stacked switches are deployed on the LAN side.

Tracking the master port in the pair doesn’t automatically track the backup port. To track both ports, you need to repeat this procedure for the backup port as well as the master.
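The following Python sketch models the LAN port tracking rule: tracking is configured separately on each node, and failover is needed as soon as any tracked port loses link. The `(node, port)` keys and the function name are hypothetical; in this sketch, an empty tracking set means LAN port state never triggers failover.

```python
def port_failover_needed(tracked, link_up):
    """Return True if any tracked LAN port in the HA pair has lost its physical link.

    tracked: set of (node, port) pairs; each node's ports are tracked separately.
    link_up: dict mapping (node, port) to True when the link is up.
    """
    return any(not link_up[key] for key in tracked)


links = {("master", "port3"): True, ("backup", "port3"): False}

# Tracking the master's port3 does not cover the backup's port3.
tracked = {("master", "port3")}
print(port_failover_needed(tracked, links))

# Tracking must be configured on the backup as well.
tracked.add(("backup", "port3"))
print(port_failover_needed(tracked, links))
```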

An HA pair can use a dedicated port, which is a physical port that you select to carry HA control traffic. The dedicated physical port is cabled directly back-to-back between the appliances.

Dedicated port mode designates a single LAN port as the HA control port. The control port is used for VRRP and routing SCM traffic for backup gateways. VRRP runs directly on the physical interface.

There is no virtual IP on this LAN port, and Layer 3 addressing is statically configured. The LAN port is isolated from customer traffic.

Configuring a dedicated port is strongly recommended for all branch HA pairs. It is required when setting up high availability between a pair of SDI-1030 gateways because it prevents loops and spanning tree issues.

We strongly recommend that you cable the gateways to each other directly using dedicated ports, without a switch between the gateways. Running the HA dedicated port through switch equipment increases the risk of instability. The dedicated port is a single point of failure, and any disruption on it will cause a network outage. However, running the HA dedicated port through a switch will work as long as the ports have Layer 2 connectivity. For this configuration, turn off spanning tree on the switch ports that the dedicated ports connect to, or at least enable PortFast on those ports to limit the risk of instability.

A dedicated port isn’t limited to the SDI-VGW and SDI-1030 gateways; you can also configure a dedicated port on SDI-130 and SDI-330 gateways. For the SDI-1030 gateway, a dedicated port is the only option.

5. Select the control port from the drop-down list. This port must be cabled directly to the other gateway in the HA pair. When you dedicate a port to a gateway, it’s no longer available for use with other gateways.

High availability supports all uplink types. When an HA pair switches a backup gateway over to a master gateway, the uplinks are impacted differently, depending on the uplink type:

•DHCP client - An optional virtual media access control (MAC) cloning feature is available to support addressing on the WAN interfaces. This feature clones the MAC address on the WAN uplinks for both interfaces in the HA pair. The backup gateway then uses the cloned MAC from the master gateway. This feature is useful when using a cable modem/router as an uplink and an ISP expects a consistent MAC address. The ISP can block access if it receives traffic from an unknown MAC address. This feature is disabled by default.

The cloned MAC address is also used during failover to update the backup gateway with a new virtual MAC address. Without MAC cloning enabled, AutoVPN tunnels can take longer to reestablish after a failover. For details on failover performance, see Gateway failover performance.

MAC cloning only applies to mirrored uplinks. If you require specific MAC addresses on nonmirrored, individual uplinks, use the virtual MAC address feature directly. For details, see To override a port’s default MAC address.

SCM 2.9 and later extend the uplink failover logic of a high availability pair so that the failure of a specific uplink can trigger failover. For example, in a deployment with several internet uplinks and one MPLS uplink, you can track the MPLS uplink on the master node of the HA pair to detect lost connectivity.

After configuring tracking for one or more uplinks, at least one of the tracked uplinks in the pair must fail to trigger failover. When two uplinks in an HA pair are tracked, only one of the uplinks needs to go down to trigger failover.

When no tracked uplinks are configured, the gateway falls back to the default behavior, which triggers a failover only when all uplinks are detected to be down.

Tracking the master uplink in the pair doesn’t automatically track the backup uplink. To track both uplinks, you need to repeat this procedure for the backup uplink as well as the master.
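The tracked-uplink rules, including the fallback when nothing is tracked, fit in one decision function. In this Python sketch the function name and the uplink names are hypothetical, and each uplink's state is assumed to be already reduced to up or down.

```python
def uplink_failover_needed(uplinks_up, tracked=frozenset()):
    """Decide whether uplink state should trigger an HA failover.

    uplinks_up: dict mapping uplink name to True when the uplink is up.
    tracked: names of tracked uplinks (empty means default behavior).
    """
    if tracked:
        # Any single tracked uplink going down is enough.
        return any(not uplinks_up[name] for name in tracked)
    # Default: fail over only when every uplink is down.
    return not any(uplinks_up.values())


state = {"internet-1": False, "internet-2": True, "mpls": True}
print(uplink_failover_needed(state))            # one internet uplink down: no failover
print(uplink_failover_needed(state, {"mpls"}))  # tracked MPLS still up: no failover
state["mpls"] = False
print(uplink_failover_needed(state, {"mpls"}))  # tracked MPLS down: failover
```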

SCM displays all gateways belonging to a high availability pair with a blue HA icon in all views. After the gateway reports its HA state to SCM, the icon indicates whether it is the master or the backup.

SCM keeps the pair together in appliance lists to make it clear that each gateway is a partner in an HA pair. SCM manages both gateways in a pair as one.

When an HA pair is separated, the gateways continue running with the same port settings, AutoVPN settings, and other configuration used in the HA pair. SCM unmirrors the uplinks, so one gateway typically no longer has an uplink associated with it.

High availability (HA) maintains uninterrupted service for a data center gateway cluster in the event of a power, hardware, software, or link failure.

SteelConnect 2.11 provides active-active HA for SDI-2030 appliances located at the data center. For details, see Overview in the SteelHead SD User Guide.

SteelConnect Manager (SCM) connects three or more SDI-5030 gateways (nodes) to monitor and route traffic. Configuring high availability between SDI-5030 nodes provides network redundancy and reliability.

The SteelConnect data center solution uses the notation n + k to describe the engineered capacity (n) and resiliency (k) of nodes in a high-availability solution.

Because redundancy is critical, the minimum SDI-5030 high-availability cluster is deployed in an n + k arrangement of 2 + 1.

To increase throughput, you can scale out the deployment by adding more active and spare SDI-5030 nodes.
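The n + k notation reduces to simple arithmetic. This Python sketch, built around a hypothetical `cluster_status` helper, assumes the cluster meets its engineered capacity while at least n nodes remain, which matches the 2 + 1 example tolerating one node failure.

```python
def cluster_status(n, k, failed):
    """Summarize an n + k cluster after `failed` nodes go down (illustrative only)."""
    total = n + k
    remaining = total - failed
    return {
        "total_nodes": total,
        "tolerated_failures": k,
        "remaining_nodes": remaining,
        # Assumption: engineered capacity holds while at least n nodes remain.
        "capacity_met": remaining >= n,
    }


print(cluster_status(2, 1, failed=1))  # minimum cluster, one node lost
print(cluster_status(4, 2, failed=2))  # hypothetical scaled-out cluster
```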

A cluster is made up of multiple SDI-5030 nodes in a single data center. In an out-of-path deployment, the data center cluster is deployed on the server side. You can achieve resiliency by deploying at least three SDI-5030 nodes out-of-path at one data center site. In a cluster of three SDI-5030 nodes, all three nodes actively handle traffic. A 2 + 1 cluster is a three-appliance active quorum that tolerates one complete SDI-5030 gateway failure and remains operational.

High availability ensures that no single component failure can bring down an entire cluster. Failure handling relies on reliable detection of a failed cluster node or service node, and failure recovery starts when a failure notification is received.

A healthy cluster automatically enables data center high availability. There is no need to enable high availability after creating a cluster. For details, see Creating clusters.

Each SDI-5030 node in the cluster is individually connected to SCM. SCM sends the configuration to all three SDI-5030 nodes.

Removing or upgrading a cluster node causes connections on that node to fail over and tunnels from affected data centers to reconnect.

After a node failure, the cluster rebalances the active traffic load so that traffic resumes flowing through the remaining nodes in under a minute.

When an active node fails, traffic flow rebalancing occurs automatically. Branch gateways handled by the failed node reconnect to the newly assigned active node. The cluster health is degraded but remains operational.
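The following Python sketch illustrates the reassignment idea: branches handled by the failed node move to surviving nodes. The least-loaded placement policy, the `rebalance` function, and the node and branch names are all assumptions for illustration; the actual SteelConnect rebalancing algorithm isn't described here.

```python
def rebalance(assignments, failed_node):
    """Move the failed node's branches to surviving nodes (assumed least-loaded policy).

    assignments: dict mapping node name to the list of branch gateways it handles.
    Returns a new mapping for the surviving nodes only.
    """
    survivors = {node: list(branches)
                 for node, branches in assignments.items() if node != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes")
    for branch in assignments.get(failed_node, []):
        # Pick the node with the fewest branches; break ties by node name.
        target = min(survivors, key=lambda node: (len(survivors[node]), node))
        survivors[target].append(branch)
    return survivors


before = {"dc-1": ["b1", "b2"], "dc-2": ["b3"], "dc-3": ["b4", "b5"]}
print(rebalance(before, "dc-3"))
```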

The control virtual machine (CVM) manages appliance startup, licenses, initial configuration, and interface addressing. CVMs are interconnected through data center Layer 3 connectivity and represent an entire data center cluster as a single manageable entity.

A CVM failure triggers an SDI-5030 node reboot. The SDI-5030 node also reboots if the CVM encounters an error during the recovery process.

The external Border Gateway Protocol (eBGP) is used when a tunnel endpoint (TEP) moves from one SDI-5030 node to another during a node failover. When an SDI-5030 node owning a TEP fails, the cluster transfers the TEP from the previous active node to a spare node. The spare node becomes active and now owns the TEP. It advertises the TEP into eBGP so it can attract traffic to itself.

All data center SDI-5030 nodes must use a private autonomous system number (ASN) to determine the best path between two points and to prevent looping. During a failover, the SDI-5030 node uses AS path prepending as follows:

•In a steady, functioning state, an SDI-5030 node prepends the ASN two times in its TEP advertisement. This creates an AS path length of three. Because it’s the only path for the TEP, it becomes the best route.

•After a failover, the SDI-5030 node that becomes the new owner of a tunnel endpoint prepends the ASN only once to the TEP advertisement, which results in a route with the shortest AS path. Because this route has an AS path length of two, its advertisement wins over any preexisting, longer path advertisements. This route advertisement method improves network convergence time, speeding up the failover.
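The prepend scheme works because BGP prefers the shorter AS path. This Python sketch compares the two advertisements; the ASN value, node names, and helper functions are hypothetical, and it ignores BGP attributes that are compared before AS path length, such as local preference.

```python
PRIVATE_ASN = 65010  # hypothetical; 16-bit private ASNs fall in 64512-65534


def tep_as_path(asn, prepend_count):
    """AS_PATH an eBGP peer sees: the originating ASN plus prepended copies."""
    return [asn] * (1 + prepend_count)


def best_advertisement(advertisements):
    """Select the advertisement with the shortest AS_PATH (other tie-breakers ignored)."""
    return min(advertisements, key=lambda adv: len(adv["as_path"]))


steady = {"owner": "node-1", "as_path": tep_as_path(PRIVATE_ASN, 2)}    # steady state
takeover = {"owner": "node-3", "as_path": tep_as_path(PRIVATE_ASN, 1)}  # after failover

print(len(steady["as_path"]), len(takeover["as_path"]))
print(best_advertisement([steady, takeover])["owner"])
```

If the failed node's stale advertisement is still present, the new owner's shorter path wins, so traffic for the TEP shifts without waiting for the old route to be withdrawn.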