Introduction to Hot Standby Redundancy

This article explains how the Legacy Orchestrator Hot Standby Redundancy feature works and lists the requirements for deploying it. Redundancy keeps the Legacy Orchestrator system operating continuously through a combination of hardware and software mechanisms.

Two identical Legacy Orchestrator sites are set up: one site is active, one site is passive. Data from the active site is continuously replicated to the passive site. Failover from the active site to the passive site is triggered automatically. Hot Standby Redundancy is an optional feature that requires a license.

Note: This article uses default interface and partition names such as ethX (eth0, eth1, ...) and sdX (sdc, ...). The actual names can differ depending on how the Docker host was set up.

Network Communication

The network communication setup for redundancy consists of:

Management interface

Used for communication with the Legacy Orchestrator web user interface and the northbound API interface
Configured on socli at each site

Replication interface

Used to send real-time database updates from the active site to the passive site
Configured on socli at each site

Monitoring interface

Used to monitor:

Communication between the two sites
State of resources on the active site (database, web application server, northbound API server). This state is sent from the active site to the passive site.

Configured on socli at each site

Virtual IP address (optional)

A virtual interface for the Legacy Orchestrator system
Always assigned to the management interface of the active site

Initial Setup and Startup

This section describes the initial setup for redundancy and what happens when redundancy starts. For the procedure to configure redundancy, see Configuring Hot Standby.

The initial setup for Hot Standby Redundancy is as follows:

The Legacy Orchestrator site is set up and activated on Site-A

Each site has a redundancy state, which is either:

active – The active site executes business logic and has established sessions to Provider Connectivity Assurance devices.
passive – The passive site has no active sessions and does not execute business logic.

When redundancy is configured, a preferred site can be defined.

If a preferred site is configured, it will be the active site when redundancy starts.
If no preferred site is configured, Site-A will be the active site when redundancy starts.

For more information about the preferred site, see Preferred Site and Recovery after Failover.

When redundancy starts:

Data on the active site is continuously replicated to the passive site.
Connectivity and resources on the active site are monitored continuously.
The passive site is ready to be activated if the active site fails.

Automatic Failover

If the active site fails, failover to the passive site is triggered automatically.

Some points to note about automatic failover:

Automatic failover can be suspended when necessary, for example, during a maintenance window. See Controlling Redundancy.
Automatic failover can be disabled in the redundancy configuration, if your organization decides it will initiate failover manually. See Disabling Automatic Failover.

Preferred Site and Recovery After Failover

A preferred site can be configured. The preferred site determines which site is active in many circumstances, for example, after both sites reboot.

If a preferred site is configured:

Startup: When redundancy starts, the preferred site will be the active site.
Failover: If the preferred site fails, the system automatically fails over and the other site becomes the active site.
Recovery after Failover: When the preferred site is once again operational, it becomes the active site again.

If a preferred site is not configured (the default: preferred site is set to None):

Startup: When redundancy starts, Site-A will be the active site.
Failover: If the active site fails (initially Site-A), the system automatically fails over and the passive site (initially Site-B) becomes the active site.
Recovery after Failover: Even after the failed site is operational again, the site that became active during the failover remains active.

Note: The "config preferred <site-a/site-b>" setting must point to the most recently active site. Otherwise, the latest database can be lost when HA starts or restarts.
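
The startup, failover, and recovery rules in this section can be summarized as a small state model. The following is a minimal Python sketch for illustration only, not the product's implementation. The function names and the "site-a"/"site-b" strings are hypothetical, and None stands for a preferred site set to None.

```python
from typing import List, Optional

SITES = ("site-a", "site-b")  # hypothetical identifiers for the two sites


def other(site: str) -> str:
    """Return the peer of the given site."""
    return "site-b" if site == "site-a" else "site-a"


def active_at_startup(preferred: Optional[str]) -> str:
    """Startup: the preferred site if configured, otherwise Site-A."""
    return preferred if preferred in SITES else "site-a"


def active_after_failover(active: str) -> str:
    """Failover: when the active site fails, the other site takes over."""
    return other(active)


def active_after_recovery(active: str, preferred: Optional[str]) -> str:
    """Recovery: the preferred site reclaims the active role once it is
    operational again; with no preferred site, the current site stays active."""
    return preferred if preferred in SITES else active


def walk_through(preferred: Optional[str]) -> List[str]:
    """Active site at startup, after a failover, and after recovery."""
    start = active_at_startup(preferred)
    failed_over = active_after_failover(start)
    recovered = active_after_recovery(failed_over, preferred)
    return [start, failed_over, recovered]
```

Under these assumptions, walk_through("site-a") yields site-a, site-b, site-a, while walk_through(None) yields site-a, site-b, site-b, because nothing moves the active role back when no preferred site is set.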

Conditions That Trigger Automatic Failover

The following conditions trigger automatic failover from the active to the passive site:

Passive site detects loss of communication with active site
The passive site cannot communicate with the active site over the monitoring channel. See Network Communication above.
Active site fails resources check
The resources check determines whether all the resources required for proper operation of the Legacy Orchestrator system are available on the active site. This check includes ensuring that the database is up, that the web application server is up, that the northbound interface server is up and that at least one mediation server is running.
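
The two trigger conditions can be expressed as a simple decision function. This is a hedged sketch, assuming the monitoring state and resource states are already known as plain values. The ResourceStatus class, its field names, and failover_reason are hypothetical and do not correspond to any Legacy Orchestrator API. Suspending or disabling automatic failover is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceStatus:
    """Resources that the check on the active site covers (hypothetical model)."""
    database_up: bool
    web_server_up: bool
    northbound_server_up: bool
    mediation_servers_running: int

    def passes(self) -> bool:
        # All servers must be up and at least one mediation server running.
        return (
            self.database_up
            and self.web_server_up
            and self.northbound_server_up
            and self.mediation_servers_running >= 1
        )


def failover_reason(monitor_link_up: bool,
                    resources: ResourceStatus) -> Optional[str]:
    """Return why failover would be triggered, or None if no trigger applies."""
    if not monitor_link_up:
        return "passive site lost communication with active site"
    if not resources.passes():
        return "active site failed resources check"
    return None
```

For example, a healthy active site with a working monitoring channel gives None, while the same site with zero mediation servers running gives the resources-check reason.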

Split Brain Condition

Split brain is the condition in which both sites become active because communication is lost on both inter-site connections.
The redundancy feature detects and handles a split brain condition as follows.

Note: In the following scenario, Site-A is the preferred site and the active site before the split brain condition occurs. A split brain condition could also arise if Site-B were the active site before the split brain condition occurred.

Before split brain occurs - Redundancy feature is operating normally:

Both inter-site communication connections (replication and monitoring) are up and running. The two sites are communicating normally.
The active site (Site-A) is collecting data from Provider Connectivity Assurance devices.
The passive site (Site-B) is receiving replicated data from the active site and monitoring communication and resources on the active site.

Split brain condition occurs:

Communication is lost over both inter-site connections (replication and monitoring). The two sites cannot communicate.
Site-A remains active.
Site-B becomes active. It starts collecting data from the same Provider Connectivity Assurance devices (same replicated database).

Redundancy feature detects the split brain condition:

Via the Provider Connectivity Assurance Sensors, Site-B detects that Site-A is active.
Site-A detects that Site-B has connected to Provider Connectivity Assurance devices.
Site-A becomes inactive and disables all communications with the Provider Connectivity Assurance devices and stops data collection.
Site-B becomes the active site. It collects data from the Provider Connectivity Assurance devices and can update the configuration of the Provider Connectivity Assurance devices.
A replication failure alarm is raised. It is visible in the Appliance Monitor CLI on both sites and in the Legacy Orchestrator web user interface.

When Site-A is operational again and both inter-site connections are re-established:

Site-A becomes the active site again (because it is the preferred site). It starts collecting data from the Provider Connectivity Assurance devices.
Site-B becomes the passive site.
Communication over the replication and monitoring connections is re-established between the two sites.

Note: The Legacy Orchestrator data store will revert to the content and state that were present at the beginning of the split brain condition.

Requirements

Hot Standby Redundancy has the following requirements:

Network requirements between the two sites:

Minimum 100 Mbps link
Maximum 150 ms round-trip latency

Both Docker hosts for installing Legacy Orchestrator must be of the same type (one of the following):

- Virtual machines deployed on either a KVM host or an ESXi host
- Hardware machines

An empty partition dedicated to the Hot Standby Redundancy function on each site:

The minimum partition size is 30 GB.
The partition must be the same size on both sites.
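
The partition rules can be checked before configuring redundancy. The sketch below is illustrative only: it assumes the partition sizes have already been obtained in bytes, and it assumes that 30 GB means decimal gigabytes (30 x 10^9 bytes). The function and constant names are hypothetical.

```python
from typing import List

GB = 10 ** 9                    # assumption: decimal gigabytes
MIN_PARTITION_BYTES = 30 * GB   # minimum size stated in the requirements


def partition_problems(site_a_bytes: int, site_b_bytes: int) -> List[str]:
    """Return a list of requirement violations (empty when both rules hold)."""
    problems = []
    for name, size in (("site-a", site_a_bytes), ("site-b", site_b_bytes)):
        if size < MIN_PARTITION_BYTES:
            problems.append(f"{name}: partition is smaller than 30 GB")
    if site_a_bytes != site_b_bytes:
        problems.append("partitions differ in size between sites")
    return problems
```

Two 30 GB partitions produce an empty list. A 40 GB partition paired with a 50 GB partition produces only the size-mismatch message.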

Host names for all appliances must be unique.
IP addresses for all Docker hosts must be IPv4. Subnets must be specified by a CIDR value.
All Docker hosts must use NTP to set the date and time.
All Docker hosts must be running the same version of Legacy Orchestrator.
Optional: Virtual IP address for Legacy Orchestrator:

Your network administrator must set up a single virtual IP address and (optionally) a primary interface (for the virtual IP) that are available at both sites.
If your network cannot support a virtual IP, you must set up an equivalent technology. If this is necessary, contact Accedian Technical Support.

Three interfaces must be configured on the Docker hosts at both sites:

Interface eth0 is the Legacy Orchestrator MGMT interface.
One interface (typically, eth1) is required for the data replication connection.
One interface (typically, eth2) is required for the monitoring connection.
All interfaces must be on distinct subnets.
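
A planned interface layout can be checked against the IPv4, CIDR, and distinct-subnet rules with Python's standard ipaddress module. This is a sketch under stated assumptions: the addresses in the usage note are made-up examples, the function name is hypothetical, and "distinct subnets" is interpreted strictly as "no two subnets overlap".

```python
import ipaddress
from itertools import combinations
from typing import Dict, List


def interface_problems(interfaces: Dict[str, str]) -> List[str]:
    """Check a mapping of interface name -> "address/prefix" string.

    Returns a list of problems; an empty list means every entry is a valid
    IPv4 CIDR address and no two interfaces are on overlapping subnets.
    """
    problems = []
    parsed = {}
    for name, cidr in interfaces.items():
        if "/" not in cidr:
            problems.append(f"{name}: {cidr!r} has no CIDR prefix")
            continue
        try:
            # Raises a ValueError subclass for IPv6 or malformed input.
            parsed[name] = ipaddress.IPv4Interface(cidr)
        except ValueError:
            problems.append(f"{name}: {cidr!r} is not a valid IPv4 CIDR address")
    for (n1, i1), (n2, i2) in combinations(parsed.items(), 2):
        if i1.network.overlaps(i2.network):
            problems.append(f"{n1} and {n2} are on overlapping subnets")
    return problems
```

For example, {"eth0": "10.0.0.5/24", "eth1": "10.0.1.5/24", "eth2": "10.0.2.5/24"} passes, while placing eth0 and eth1 both in 10.0.0.0/24 reports one overlap.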

TCP and UDP ports must be opened in the firewall. These are the default ports:

TCP: 7788 and 7789 (for the data replication connection) on the replication link.
TCP: 6969 (for HA management) on the monitor link.
UDP: 5405 (for the monitoring connection) on the monitor link.
UDP: 5406 (for the monitoring connection) on the replication link.
You can use other ports if necessary. However, you should ensure there are no conflicts with the ports required by Legacy Orchestrator for other purposes.
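
The default ports above can be kept as a small table and compared against ports already in use on the host, which is one way to spot conflicts before changing any port. This is an illustrative sketch: the Port tuple, DEFAULT_PORTS, and both function names are hypothetical, and the set of ports in use must be supplied by the caller.

```python
from collections import namedtuple
from typing import List, Set, Tuple

Port = namedtuple("Port", "protocol number purpose link")

# Default redundancy ports as listed in this article.
DEFAULT_PORTS = [
    Port("tcp", 7788, "data replication", "replication"),
    Port("tcp", 7789, "data replication", "replication"),
    Port("tcp", 6969, "HA management", "monitor"),
    Port("udp", 5405, "monitoring", "monitor"),
    Port("udp", 5406, "monitoring", "replication"),
]


def ports_for_link(ports: List[Port], link: str) -> List[Port]:
    """Ports that must be opened in the firewall on the given link."""
    return [p for p in ports if p.link == link]


def port_conflicts(ports: List[Port],
                   ports_in_use: Set[Tuple[str, int]]) -> List[Port]:
    """Redundancy ports whose (protocol, number) pair is already used elsewhere."""
    return [p for p in ports if (p.protocol, p.number) in ports_in_use]
```

A conflict requires both the protocol and the number to match, so a service on UDP 7789 does not conflict with the TCP 7789 replication port in this model.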

© 2025 Cisco and/or its affiliates. All rights reserved.
