Hot Standby Redundancy
  • 05 Feb 2024

Introduction

This article explains how the Skylight orchestrator Hot Standby Redundancy feature works and covers the requirements for deploying the feature. Redundancy ensures continuous operation of the Skylight orchestrator system by various hardware and software means.

Skylight orchestrator supports two redundancy methods:

  • Hot Standby Redundancy (covered in this section). Two identical Skylight orchestrator sites are set up: one site is active, one site is passive. Data from the active site is continuously replicated to the passive site. Failover from the active site to the passive site is triggered automatically. Hot Standby Redundancy is an optional feature that requires a license.
  • Warm Standby Redundancy. Two identical Skylight orchestrator systems are set up. Data is mirrored from the primary Skylight orchestrator system to the secondary system at regular intervals and failover is initiated manually. For more information, see Warm Standby Redundancy.

Network Communication

The network communication setup for redundancy consists of:

  • Management interface
    • Used for communication with the Skylight orchestrator web user interface and the northbound API interface.
    • Configured on the appliance at each site.
  • Replication interface
    • Used to send real-time database updates from the active site to the passive site.
    • Configured on the appliance at each site.
  • Monitoring interface
    • Used to monitor:
      • Communication between the two sites.
      • State of resources on the active site (database, web application server, northbound API server). Sent from the active site to the passive site.
    • Configured on the appliance at each site.
  • Virtual IP address (optional)
    • A virtual interface for the Skylight orchestrator system.
    • Always assigned to the management interface of the active site.

[Figure 21.png: network communication setup for redundancy]

Initial Setup and Startup

This section describes the initial setup for redundancy and what happens when redundancy starts. For the procedure to configure redundancy, see Configuring Hot Standby Redundancy.

The initial setup for Hot Standby Redundancy is as follows:

  • The Skylight orchestrator site is set up and activated on Site-A
  • Each site has a redundancy state, which is either:
    • active – The active site executes business logic and has established sessions to Skylight devices.
    • passive – The passive site has no active sessions and does not execute business logic.
  • When redundancy is configured, a preferred site can be defined.
    • If a preferred site is configured, it will be the active site when redundancy starts.
    • If no preferred site is configured, Site-A will be the active site when redundancy starts.

For more information about the preferred site, see Preferred Site and Recovery after Failover.

  • When redundancy starts:
    • Data on the active site is continuously replicated to the passive site.
    • Connectivity and resources on the active site are monitored continuously.
    • The passive site is ready to be activated if the active site fails.

The following figure shows the initial setup for Hot Standby Redundancy.

[Figure 23.png: initial setup for Hot Standby Redundancy]

Automatic Failover

If the active site fails, failover to the passive site is triggered automatically. The figure below shows the automatic failover scenario.

[Figure 24.png: automatic failover scenario]

Some points to note about automatic failover:

  • Automatic failover can be suspended when necessary, for example during a maintenance window. See Controlling Redundancy.
  • Automatic failover can be disabled in the redundancy configuration, if your organization decides it will initiate failover manually. See Disabling Automatic Failover.

Preferred Site and Recovery After Failover

A preferred site can be configured. The preferred site determines which site is active in many circumstances, for example after both sites reboot.

If a preferred site is configured:

  • Startup: When redundancy starts, the preferred site will be the active site.
  • Failover: If the preferred site fails, the system automatically fails over and the other site becomes the active site.
  • Recovery after Failover: When the preferred site is once again operational, it becomes the active site again.

If a preferred site is not configured (the default: preferred site is set to None):

  • Startup: When redundancy starts, Site-A will be the active site.
  • Failover: If the active site fails (initially Site-A), the system automatically fails over and the passive site (initially Site-B) becomes the active site.
  • Recovery after Failover: Even after the failed site is operational again, the site that became active during the failover remains active.


Note: Set "config preferred <site-a/site-b>" to the most recently active site. Otherwise, the most recent database content can be lost when HA starts or restarts.
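
The startup, failover, and recovery rules above can be sketched as three small functions. This is only an illustrative model, not Skylight orchestrator code: the function names are hypothetical, and the site identifiers "site-a" and "site-b" are borrowed from the "config preferred" syntax shown in the note.

```python
from typing import Optional

SITES = ("site-a", "site-b")


def other(site: str) -> str:
    """Return the peer of the given site."""
    return "site-b" if site == "site-a" else "site-a"


def active_at_startup(preferred: Optional[str]) -> str:
    """Startup: the preferred site if one is set, otherwise Site-A."""
    return preferred if preferred in SITES else "site-a"


def active_after_failover(active: str) -> str:
    """Failover: the peer of the failed active site takes over."""
    return other(active)


def active_after_recovery(active: str, preferred: Optional[str]) -> str:
    """Recovery: a preferred site takes over again once it is operational;
    with no preferred site (None), the current active site stays active."""
    return preferred if preferred in SITES else active


# Walk through both configurations: startup, failover, then recovery.
for preferred in ("site-a", None):
    start = active_at_startup(preferred)
    failed_over = active_after_failover(start)
    recovered = active_after_recovery(failed_over, preferred)
    print(preferred, start, failed_over, recovered)
```

With Site-A preferred, the walk ends with Site-A active again. With no preferred site, Site-B stays active after Site-A recovers.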

Conditions That Trigger Automatic Failover

The following conditions trigger automatic failover from the active to the passive site:

  • Passive site detects loss of communication with active site
    The passive site cannot communicate with the active site over the monitoring channel. See Network Communication.

  • Active site fails resources check
    The resources check determines whether all the resources required for proper operation of the Skylight orchestrator system are available on the active site. This check includes ensuring that the database is up, that the web application server is up, that the northbound interface server is up and that at least one mediation server is running.
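
As a rough illustration, the two trigger conditions can be expressed as a single decision function. The class, field, and function names below are hypothetical and chosen for this sketch only. They model the checks listed above and do not reflect how Skylight orchestrator implements them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActiveSiteResources:
    """Resource states covered by the resources check on the active site."""
    database_up: bool
    web_server_up: bool
    northbound_server_up: bool
    mediation_servers_running: int

    def check_passes(self) -> bool:
        # All servers must be up and at least one mediation server running.
        return (
            self.database_up
            and self.web_server_up
            and self.northbound_server_up
            and self.mediation_servers_running >= 1
        )


def failover_reason(monitoring_link_ok: bool,
                    resources: ActiveSiteResources) -> Optional[str]:
    """Return the trigger condition that applies, or None if neither does."""
    if not monitoring_link_ok:
        return "passive site lost communication with active site"
    if not resources.check_passes():
        return "active site failed resources check"
    return None


healthy = ActiveSiteResources(True, True, True, 2)
no_mediation = ActiveSiteResources(True, True, True, 0)
print(failover_reason(True, healthy))
print(failover_reason(True, no_mediation))
print(failover_reason(False, healthy))
```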

Split Brain Condition

Split brain is the condition in which both sites are active at the same time because communication is lost on both inter-site connections.

The redundancy feature detects and handles a split brain condition as follows.


Note: In the following scenario, Site-A is the preferred site and the active site before the split brain condition occurs. A split brain condition could also arise if Site-B were the active site before the split brain condition occurred.

  1. Before split brain occurs - Redundancy feature is operating normally:
    • Both inter-site communication connections (replication and monitoring) are up and running. The two sites are communicating normally.
    • The active site (Site-A) is collecting data from Skylight devices.
    • The passive site (Site-B) is receiving replicated data from the active site and monitoring communication and resources on the active site.
  2. Split brain condition occurs:
    • Communication is lost over both inter-site connections (replication and monitoring). The two sites cannot communicate.
    • Site-A remains active.
    • Site-B becomes active. It starts collecting data from the same Skylight devices (same replicated database).
  3. Redundancy feature detects the split brain condition:
    • Via the Skylight elements, Site-B detects that Site-A is active.
    • Site-A detects that Site-B has connected to Skylight devices.
    • Site-A becomes inactive: it disables all communication with the Skylight devices and stops data collection.
    • Site-B becomes the active site. It collects data from the Skylight devices and can update the configuration of the Skylight devices.
    • A replication failure alarm is raised. It is visible in the Appliance Monitor CLI on both sites and in the Skylight orchestrator web user interface.
  4. When Site-A is operational again and both inter-site connections are re-established:
    • Site-A becomes the active site again (because it is the preferred site). It starts collecting data from the Skylight devices.
    • Site-B becomes the passive site.
    • Communication over the replication and monitoring connections is re-established between the two sites.


    Note: The Skylight orchestrator data store will revert to the content and state that were present at the beginning of the split brain condition.
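
The four steps above can be traced with a small state model. This sketch only reproduces the outcomes described in the scenario. It does not model how detection through the Skylight devices actually works, and every name in it (Cluster, lose_both_links, and so on) is hypothetical.

```python
from dataclasses import dataclass, replace

ACTIVE, PASSIVE, INACTIVE = "active", "passive", "inactive"


@dataclass(frozen=True)
class Cluster:
    site_a: str
    site_b: str
    links_up: bool  # True when replication and monitoring links are both up


def is_split_brain(c: Cluster) -> bool:
    return c.site_a == ACTIVE and c.site_b == ACTIVE


def lose_both_links(c: Cluster) -> Cluster:
    """Step 2: both links are lost, so the passive site also becomes active."""
    def promote(state: str) -> str:
        return ACTIVE if state == PASSIVE else state
    return Cluster(promote(c.site_a), promote(c.site_b), links_up=False)


def resolve_split_brain(c: Cluster, previously_active: str) -> Cluster:
    """Step 3: the previously active site stands down; the other stays active."""
    if not is_split_brain(c):
        return c
    if previously_active == "site-a":
        return replace(c, site_a=INACTIVE)
    return replace(c, site_b=INACTIVE)


def restore_links(c: Cluster, preferred: str) -> Cluster:
    """Step 4: with links restored, the preferred site becomes active again."""
    if preferred == "site-a":
        return Cluster(ACTIVE, PASSIVE, links_up=True)
    return Cluster(PASSIVE, ACTIVE, links_up=True)


step1 = Cluster(ACTIVE, PASSIVE, links_up=True)   # normal operation
step2 = lose_both_links(step1)                    # split brain
step3 = resolve_split_brain(step2, "site-a")      # Site-A stands down
step4 = restore_links(step3, preferred="site-a")  # Site-A active again
for step in (step1, step2, step3, step4):
    print(step, is_split_brain(step))
```

The model follows the scenario in which Site-A is both the preferred site and the active site before the split brain occurs. It does not cover the data store reverting, which the note above describes.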

Requirements

Hot Standby Redundancy has the following requirements:

  1. Network requirements between the two sites:
    • Minimum 100 Mbps link
    • Maximum 150 ms round-trip latency
  2. Appliances at both sites must be of the same type (one of the following):
    • KVM virtual appliances (with three network adapters)
      See Planning Network Interfaces.
    • VMware virtual appliances (with three network adapters)
      See Adding Network Adapters to Support Additional Ports/Interfaces.
  3. Host names for all appliances must be unique.
  4. IP addresses for all appliances must be IPv4. Subnets must be specified by a CIDR value.
  5. All appliances must use NTP to set the date and time.
  6. All appliances must be running the same version of Skylight orchestrator.
  7. Optional: Virtual IP address for Skylight orchestrator:
    • Your network administrator must set up a single virtual IP address and (optionally) a primary interface (for the virtual IP) that are available at both sites.
    • If your network cannot support a virtual IP, you must set up an equivalent technology. If this is necessary, contact Accedian Technical Support.
  8. Three interfaces must be configured on the appliances at both sites:
    • Interface eth0 is the Skylight orchestrator MGMT interface.
    • One interface (typically, eth1) is required for the data replication connection.
    • One interface (typically, eth2) is required for the monitoring connection.
    • All interfaces must be on distinct subnets.
  9. TCP and UDP ports must be opened in the firewall. These are the default ports:
    • TCP: 7788 and 7789 (for the data replication connection) on the replication link.
    • TCP: 6969 (for HA management) on the monitor link.
    • UDP: 5405 (for the monitoring connection) on the monitor link.
    • UDP: 5406 (for the monitoring connection) on the replication link.
    • You can use other ports if necessary. However, you should ensure there are no conflicts with the ports required by Skylight orchestrator for other purposes.
    • See Infrastructure Communications - Network Requirements.
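
Some of these requirements lend themselves to a pre-flight check before redundancy is configured. The following sketch uses only the Python standard library and is not part of Skylight orchestrator. The function names are hypothetical, the example addresses are made up, and the sketch assumes that the 150 ms latency figure is an upper bound.

```python
import ipaddress

MIN_LINK_MBPS = 100
MAX_RTT_MS = 150  # assumption: the 150 ms requirement is a maximum

# Default ports from the list above: (protocol, port, link)
DEFAULT_PORTS = [
    ("tcp", 7788, "replication"),
    ("tcp", 7789, "replication"),
    ("tcp", 6969, "monitor"),
    ("udp", 5405, "monitor"),
    ("udp", 5406, "replication"),
]


def check_link(mbps: float, rtt_ms: float) -> list:
    """Check the inter-site link against requirement 1."""
    problems = []
    if mbps < MIN_LINK_MBPS:
        problems.append(f"link is {mbps} Mbps, need at least {MIN_LINK_MBPS}")
    if rtt_ms > MAX_RTT_MS:
        problems.append(f"round trip is {rtt_ms} ms, need at most {MAX_RTT_MS}")
    return problems


def check_interfaces(cidrs: dict) -> list:
    """Check that each interface is IPv4 in CIDR form and on its own subnet.

    cidrs maps an interface name to an address such as '10.0.0.5/24'.
    """
    problems = []
    networks = {}
    for name, cidr in cidrs.items():
        if "/" not in cidr:
            problems.append(f"{name}: {cidr!r} has no CIDR prefix length")
            continue
        try:
            iface = ipaddress.ip_interface(cidr)
        except ValueError:
            problems.append(f"{name}: {cidr!r} is not a valid address")
            continue
        if iface.version != 4:
            problems.append(f"{name}: {cidr!r} is not IPv4")
            continue
        networks[name] = iface.network
    names = sorted(networks)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if networks[first].overlaps(networks[second]):
                problems.append(f"{first} and {second} are not on distinct subnets")
    return problems


site = {"eth0": "10.0.0.5/24", "eth1": "10.0.1.5/24", "eth2": "10.0.2.5/24"}
print(check_link(mbps=1000, rtt_ms=40) + check_interfaces(site))
for protocol, port, link in DEFAULT_PORTS:
    print(f"open {protocol}/{port} on the {link} link")
```

An empty list means no problem was found for the checks covered here. The remaining requirements, such as NTP, unique host names, and matching software versions, would need their own checks.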

    © 2024 Cisco and/or its affiliates. All rights reserved.
     


