Docker Redundancy
  • 21 Jun 2024

Introduction

This article explains how the Skylight orchestrator Hot Standby Redundancy feature works and covers the requirements for deploying the feature. Redundancy ensures continuous operation of the Skylight orchestrator system by various hardware and software means.

Two identical Skylight orchestrator sites are set up: one site is active, one site is passive. Data from the active site is continuously replicated to the passive site. Failover from the active site to the passive site is triggered automatically. Hot Standby Redundancy is an optional feature that requires a license.


Note: This article mentions several default interface and partition names, such as ethX (eth0, eth1, ...) and sdX (sdc, ...). The actual names can differ depending on how the Docker host was set up.

Network Communication

The network communication setup for redundancy consists of:

  • Management interface
    • Used for communication with the Skylight orchestrator web user interface and the northbound API interface.
    • Configured on socli at each site.
  • Replication interface
    • Used to send real-time database updates from the active site to the passive site.
    • Configured on socli at each site.
  • Monitoring interface
    • Used to monitor
      • Communication between the two sites.
      • State of resources on the active site (database, web application server, northbound API server). Sent from the active site to the passive site.
    • Configured on socli at each site.
  • Virtual IP address (optional)
    • A virtual interface for the Skylight orchestrator system.
    • Always assigned to the management interface of the active site.

Initial Setup and Startup

This section describes the initial setup for redundancy and what happens when redundancy starts. For the procedure to configure redundancy, see Configuring Hot Standby.

The initial setup for Hot Standby Redundancy is as follows:

  • The Skylight orchestrator site is set up and activated on Site-A
  • Each site has a redundancy state, which is either:
    • active – The active site executes business logic and has established sessions to Skylight devices.
    • passive – The passive site has no active sessions and does not execute business logic.
  • When redundancy is configured, a preferred site can be defined.
    • If a preferred site is configured, it will be the active site when redundancy starts.
    • If no preferred site is configured, Site-A will be the active site when redundancy starts.

For more information about the preferred site, see Preferred Site and Recovery after Failover.

  • When redundancy starts:
    • Data on the active site is continuously replicated to the passive site.
    • Connectivity and resources on the active site are monitored continuously.
    • The passive site is ready to be activated if the active site fails.
The following figure shows the initial setup for hot standby redundancy.

23.png

Automatic Failover

If the active site fails, failover to the passive site is triggered automatically. The figure below shows the automatic failover scenario.

24.png

Some points to note about automatic failover:

  • Automatic failover can be suspended when necessary, for example, during a maintenance window. See Controlling Redundancy.
  • Automatic failover can be disabled in the redundancy configuration, if your organization decides it will initiate failover manually. See Disabling Automatic Failover.

Preferred Site and Recovery After Failover

It is possible to configure a preferred site. The preferred site determines which site will be active in many circumstances, for example, after both sites reboot.
If a preferred site is configured:

  • Startup: When redundancy starts, the preferred site will be the active site.
  • Failover: If the preferred site fails, the system automatically fails over and the other site becomes the active site.
  • Recovery after Failover: When the preferred site is once again operational, it becomes the active site again.

If a preferred site is not configured (the default: preferred site is set to None):

  • Startup: When redundancy starts, Site-A will be the active site.
  • Failover: If the active site fails (initially Site-A), the system automatically fails over and the passive site (initially Site-B) becomes the active site.
  • Recovery after a Failover: Even after the site that failed is operational again, the site that became active during the failover remains active.
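
The startup and recovery rules above can be summarized in a short Python sketch. The function names are hypothetical and are not part of Skylight orchestrator. The sketch only restates the documented behavior, using the site values accepted by the preferred site setting (none, site-a, site-b).

```python
VALID_PREFERRED = ("none", "site-a", "site-b")


def active_site_at_startup(preferred="none"):
    """Return the site that is active when redundancy starts."""
    if preferred not in VALID_PREFERRED:
        raise ValueError(f"unknown preferred site: {preferred}")
    # With no preferred site, Site-A is active at startup.
    return "site-a" if preferred == "none" else preferred


def active_site_after_recovery(preferred, active_after_failover):
    """Return the active site once the failed site is operational again."""
    if preferred not in VALID_PREFERRED:
        raise ValueError(f"unknown preferred site: {preferred}")
    # With no preferred site, the site that took over stays active.
    # With a preferred site, that site becomes active again.
    return active_after_failover if preferred == "none" else preferred
```

For example, with the preferred site set to site-a, a failover to Site-B is followed by a switch back to Site-A after recovery. With the preferred site set to none, Site-B stays active.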


Note: Before HA starts or restarts, the preferred site ("redundancy config preferred <site-a/site-b>") must be set to the site that was most recently active. Otherwise, the latest database can be lost.

Conditions That Trigger Automatic Failover

The following conditions trigger automatic failover from the active to the passive site:

  • Passive site detects loss of communication with active site
    The passive site cannot communicate with the active site over the monitoring channel. See Network Communication.

  • Active site fails resources check
    The resources check determines whether all the resources required for proper operation of the Skylight orchestrator system are available on the active site. This check includes ensuring that the database is up, that the web application server is up, that the northbound interface server is up and that at least one mediation server is running.
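
As an illustration only, the two trigger conditions can be expressed as a small Python predicate. The class and function names are hypothetical. The actual checks are performed internally by the redundancy feature, and the sketch assumes automatic failover is enabled and not suspended unless stated otherwise.

```python
from dataclasses import dataclass


@dataclass
class ActiveSiteResources:
    """Resource states reported by the active site (hypothetical model)."""
    database_up: bool
    web_server_up: bool
    northbound_up: bool
    mediation_servers_running: int


def resources_ok(res):
    """Resources check: every service is up and at least one mediation server runs."""
    return (res.database_up and res.web_server_up and res.northbound_up
            and res.mediation_servers_running >= 1)


def failover_triggered(monitoring_link_up, res, auto_failover_active=True):
    """Return True when either documented condition for automatic failover holds."""
    if not auto_failover_active:
        # Automatic failover was disabled or redundancy is suspended.
        return False
    return (not monitoring_link_up) or (not resources_ok(res))
```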

Split Brain Condition

When both sites become active (due to loss of communication on both inter-site connections), this condition is known as split brain.
The redundancy feature detects and handles a split brain condition as follows.


Note: In the following scenario, Site-A is the preferred site and the active site before the split brain condition occurs. A split brain condition could also arise if Site-B were the active site before the split brain condition occurred.

  1. Before split brain occurs - Redundancy feature is operating normally:
    • Both inter-site communication connections (replication and monitoring) are up and running. The two sites are communicating normally.
    • The active site (Site-A) is collecting data from Skylight devices.
    • The passive site (Site-B) is receiving replicated data from the active site and monitoring communication and resources on the active site.
  2. Split brain condition occurs:
    • Communication is lost over both inter-site connections (replication and monitoring). The two sites cannot communicate.
    • Site-A remains active.
    • Site-B becomes active. It starts collecting data from the same Skylight devices (same replicated database).
  3. Redundancy feature detects the split brain condition:
    • Via the Skylight elements, Site-B detects that Site-A is active.
    • Site-A detects that Site-B has connected to Skylight devices.
    • Site-A becomes inactive and disables all communications with the Skylight devices and stops data collection.
    • Site-B becomes the active site. It collects data from the Skylight devices and can update the configuration of the Skylight devices.
    • A replication failure alarm is raised. It is visible in the Appliance Monitor CLI on both sites and in the Skylight orchestrator web user interface.
  4. When Site-A is operational again and both inter-site connections are re-established:
    • Site-A becomes the active site again (because it is the preferred site). It starts collecting data from the Skylight devices.
    • Site-B becomes the passive site.
    • Communication over the replication and monitoring connections is re-established between the two sites.


    Note: The Skylight orchestrator data store will revert to the content and state that were present at the beginning of the split brain condition.

Requirements

Hot Standby Redundancy has the following requirements:

  1. Network requirements between the two sites:
    • Minimum 100 Mbps link
    • Maximum 150 ms round-trip latency
  2. Both Docker hosts for installing Skylight orchestrator must be of the same type (one of the following):
    • Virtual machines deployed on either a KVM host or an ESXi host
    • Hardware machines
  3. An empty partition to dedicate for the Hot Standby Redundancy function on each site:
    • The minimum size of this partition is 30 GB.
    • This partition on each site must be the same size.
  4. Host names for all appliances must be unique.
  5. IP addresses for all Docker hosts must be IPv4. Subnets must be specified by a CIDR value.
  6. All Docker hosts must use NTP to set the date and time.
  7. All Docker hosts must be running the same version of Skylight orchestrator.
  8. Optional: Virtual IP address for Skylight orchestrator:
    • Your network administrator must set up a single virtual IP address and (optionally) a primary interface (for the virtual IP) that are available at both sites.
    • If your network cannot support a virtual IP, you must set up an equivalent technology. If this is necessary, contact Accedian Technical Support.
  9. Three interfaces must be configured on the Docker hosts at both sites:
    • Interface eth0 is the Skylight orchestrator MGMT interface.
    • One interface (typically, eth1) is required for the data replication connection.
    • One interface (typically, eth2) is required for the monitoring connection.
    • All interfaces must be on distinct subnets.
  10. TCP and UDP ports must be opened in the firewall. These are the default ports:
    • TCP: 7788 and 7789 (for the data replication connection) on the replication link.
    • TCP: 6969 (for HA management) on the monitor link.
    • UDP: 5405 (for the monitoring connection) on the monitor link.
    • UDP: 5406 (for the monitoring connection) on the replication link.
    • You can use other ports if necessary. However, you should ensure there are no conflicts with the ports required by Skylight orchestrator for other purposes.
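
Some of these requirements can be checked before starting the configuration. The following Python sketch is one possible pre-check, not an Accedian tool. It uses the standard ipaddress module to confirm that the interface addresses are IPv4, carry a CIDR prefix, and sit on distinct subnets, and it checks the partition sizes against the 30 GB minimum. The function names and the dictionary layout are assumptions made for this example.

```python
import ipaddress

MIN_PARTITION_GB = 30  # minimum replication partition size from the requirements


def check_site_interfaces(cidrs):
    """Return a list of problems for one Docker host.

    cidrs maps an interface role (for example "management") to "address/prefix".
    """
    problems = []
    networks = {}
    for role, cidr in cidrs.items():
        if "/" not in cidr:
            problems.append(f"{role}: '{cidr}' has no CIDR prefix")
            continue
        try:
            iface = ipaddress.ip_interface(cidr)
        except ValueError:
            problems.append(f"{role}: '{cidr}' is not a valid address/CIDR")
            continue
        if iface.version != 4:
            problems.append(f"{role}: '{cidr}' is not IPv4")
            continue
        networks[role] = iface.network
    # All interfaces must be on distinct subnets.
    roles = sorted(networks)
    for i, first in enumerate(roles):
        for second in roles[i + 1:]:
            if networks[first].overlaps(networks[second]):
                problems.append(f"{first} and {second} share a subnet")
    return problems


def check_partitions(size_a_gb, size_b_gb):
    """Return a list of problems with the replication partitions of both sites."""
    problems = []
    if size_a_gb != size_b_gb:
        problems.append("partitions on Site-A and Site-B differ in size")
    if min(size_a_gb, size_b_gb) < MIN_PARTITION_GB:
        problems.append(f"partition smaller than {MIN_PARTITION_GB} GB")
    return problems
```

An empty list from both functions means that none of the checked requirements is violated. Link speed, latency, NTP and firewall rules still need to be verified separately.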

Configuring Hot Standby Redundancy (Docker)

Configuring redundancy involves the following tasks:

A. Obtain all the information that you will need for the procedures.
See Information Needed to Configure Hot Standby Redundancy.

B. Ensure that all required appliances are installed.
See Ensuring All Required Appliances Are Installed.

C. Perform basic configuration of all appliances at both sites.
See Basic Appliance Configuration.

D. Copy the license file for redundancy to both sites.
See Copying the License File to Both Sites.

E. Configure the replication partition on both sites.
See Configuring Replication Partition on Both Sites.

F. Configure and start the redundancy.
See Configuring and Starting Redundancy.


CAUTION: For all changes to the redundancy configuration, a redundancy restart is required for the change to take effect.


Note: The redundancy feature must be stopped before reconfiguring the hostname of the Docker host.

Information Needed to Configure Hot Standby Redundancy

| Information | Site-A | Site-B | Notes |
|---|---|---|---|
| Docker host IP addresses and username/password | | | Only for Skylight orchestrator deployments on Docker hosts. One user must have sudo privileges or root access on the Docker host. |
| Host name of each Docker host | | | Must be unique for the entire deployment. Credentials of the root user or of a user with sudo privilege are required. |
| IP address/CIDR for management interface | | | Will be used for interface eth0. |
| IP address/CIDR for replication interface | | | Typically used for interface eth1. |
| IP address/CIDR for monitoring interface | | | Typically used for interface eth2. |
| Default gateway IP address | | | |
| Static routes | | | |
| Preferred site | | | Optional. Possible values: none (default), site-a, site-b. See Preferred Site and Recovery After Failover. |
| Virtual IP address | [single address for both sites] | [single address for both sites] | Optional. The same subnet should be present at both sites. |
| Primary interface for Virtual IP address | | | Optional. Name of the virtual IP primary interface (for example, eth4). Defaults to eth0 if not set. |
| IP addresses of DNS servers | | | Can set one or two. |
| IP addresses of NTP servers | [list of NTP servers used for all appliances] | [list of NTP servers used for all appliances] | Can set two or more. |
| Redundancy license file | | | Obtained from Accedian Technical Support. |
| Automatic failover | Enable (default) or Disable (must be set explicitly) | Enable (default) or Disable (must be set explicitly) | For more information, see Disabling Automatic Failover. |

Ensuring All Required Appliances Are Installed

You must ensure that all appliances required at Site-A and Site-B have been installed and are connected to the network.

If you are setting up redundancy for an existing Skylight orchestrator system, you will need to install the required appliance(s) at the additional site.

After the Docker host is configured to meet all requirements in the “Basic Docker Host Configuration for Hot Standby Redundancy” table, Skylight orchestrator Docker can be deployed on it.

For detailed information about installing Skylight orchestrator on Docker host, see:

If the Docker host runs RedHat 9.3, follow the steps below to ensure that partitions are correctly identified in the correct order after rebooting the Docker host.

  1. Log into your Docker host with the administrator login credentials.

  2. Edit the /etc/default/grub file.

sudo nano /etc/default/grub

  3. Add the sd_mod.probe=sync option to the GRUB_CMDLINE_LINUX line in the file. For example:

GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet sd_mod.probe=sync"

  4. Run the following command to update GRUB:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

  5. Reboot the system.
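
If you prefer to script the edit rather than use an editor, the following Python sketch shows one way to append the option to the GRUB_CMDLINE_LINUX line without adding it twice. It operates on the text of the file only. The function name is hypothetical, and reading and writing /etc/default/grub (with the necessary privileges and a backup) is left to the caller.

```python
import re

OPTION = "sd_mod.probe=sync"

# Matches exactly the GRUB_CMDLINE_LINUX="..." line (not GRUB_CMDLINE_LINUX_DEFAULT).
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")[ \t]*$', re.MULTILINE)


def add_grub_option(grub_text, option=OPTION):
    """Return grub_text with option appended to GRUB_CMDLINE_LINUX if missing."""
    def _append(match):
        options = match.group(2).split()
        if option not in options:
            options.append(option)
        return match.group(1) + " ".join(options) + match.group(3)

    return _CMDLINE.sub(_append, grub_text)
```

Running the function on text that already contains the option returns the same options unchanged, so it is safe to apply more than once. The grub2-mkconfig command and the reboot are still required afterwards.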

Accessing the climanager Service (socli)

All procedures must be performed in the climanager service (socli). You can connect to the climanager service (socli) in one of the following ways:

  • Open an SSH connection (port 22) to the Docker host and enter the “socli.sh” command.
  • Use an SSH client to connect directly to port 2200 on the Docker host.

The procedures must be executed as the skylight user. You must know the account credentials.
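
For scripted access, the direct connection on port 2200 can be opened with a standard OpenSSH client. The Python sketch below only builds the argument list. The helper name is hypothetical and the host address in the usage note is a placeholder.

```python
SOCLI_SSH_PORT = 2200   # port of the climanager service (socli)
SOCLI_USER = "skylight"  # procedures must be executed as this user


def socli_ssh_command(docker_host):
    """Return the ssh argument list that opens a socli session on docker_host."""
    return ["ssh", "-p", str(SOCLI_SSH_PORT), f"{SOCLI_USER}@{docker_host}"]
```

Passing the result to subprocess.run, for example subprocess.run(socli_ssh_command("198.51.100.10")), opens an interactive session and prompts for the skylight account password.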

Basic Docker Host Configuration

The following table summarizes the basic configuration that is required on all Docker hosts at both sites. Ensure that the Docker host matches these requirements.

Basic Docker Host Configuration for Hot Standby Redundancy

| Configuration task | Notes |
|---|---|
| Configure the management interface | The management interface is normally eth0. |
| Set host name | Host names for all appliances must be unique for the entire deployment (both sites). |
| Configure NTP client | The same list of NTP servers must be set on all appliances at both sites. |
| Configure DNS servers | The same list of DNS servers must be set on all appliances at both sites. |
| Add an interface for data replication | Typically assigned to interface eth1. Address must be in IPv4 format. |
| Add an interface for monitoring | Typically assigned to interface eth2. Address must be in IPv4 format. |
| Add routes (optional) | Although not required, we recommend routing the traffic of the monitoring and replication interfaces over a distinct gateway. Sending all traffic to the default gateway will work but will become a single point of failure that could result in a split brain condition. |
| Empty partition dedicated to the Hot Standby Redundancy function on each site | This partition must be the same size on each site. |
| Names of network interfaces on each Docker host | The network interface names must be the same on the Docker hosts at both sites (for example, eth0, eth1, eth2). |

The basic configuration must have been done on all Docker hosts. All procedures must be performed on newly installed Docker hosts. Certain procedures can be skipped on previously installed Docker hosts. The number of Docker hosts that must be configured depends on the installation scenario:

  • If you are setting up a second site for an existing Skylight orchestrator Docker system consisting of a single Skylight orchestrator Docker, you must configure the Docker host at the new site only.
  • If both sites are new installations and each site only includes a single Skylight orchestrator Docker, you must configure both Docker hosts.

Copying the License File to Both Sites

The redundancy feature requires a license. The license must be available on the Docker host of both sites so that you can import it during the procedure in the next section.

You will need an SCP client (such as WinSCP) on your computer.

  1. Obtain the license file from Accedian Technical Support and save to your computer.

  2. Copy the license file to Site-A:
    a. Use the SCP client and the skylight account to access the Docker host for Site-A.
    b. Copy the redundancy license file from your computer to the /home/skylight/ directory on the appliance for Site-A.

  3. Copy the license file to Site-B:
    a. Use the SCP client and the skylight account to access the Docker host for Site-B.
    b. Copy the redundancy license file from your computer to the /home/skylight directory on the appliance for Site-B.

  4. If you are not already logged in on the socli, open an SSH terminal session to the Skylight orchestrator CLI on port 2200 of Site-B and log in as the skylight user.
    The Skylight prompt is displayed.


    Note: Perform the procedure below (step 5) in the socli of the Skylight orchestrator for Site-B

  5. Import the license for the redundancy feature by entering:

redundancy license import filename fullPath/licenseFilename

Example of full path and filename: /data/drbd-proxy.license

  6. If you are not already logged in on the socli, open an SSH terminal session to the Skylight orchestrator CLI on port 2200 of Site-A and log in as the skylight user.


    Note: Perform the procedure below (step 7) in the socli of the Skylight orchestrator for Site-A.

  7. Import the license for the redundancy feature by entering:

redundancy license import filename fullPath/licenseFilename

Example of full path and filename: /home/skylight/drbd-proxy.license

Configuring Replication Partition on Both Sites

  1. If you are not already logged in on the socli, open an SSH terminal session to the Skylight orchestrator CLI on port 2200 of Site-A and log in as the skylight user.
    The Skylight prompt is displayed.


Note: Perform the procedure below (steps 2 and 3) in the socli of the Skylight orchestrator for Site-A.

  2. Configure the replication partition for the redundancy feature by entering:
redundancy config replication-partition <partition name> host-admin-user <a user with sudo privilege>


CAUTION: While this operation is running, the partition in the command above will be unmounted and formatted. To prevent data loss on this partition, take care to specify the correct partition name before running this command.

  3. If prompted, provide the password of the user that has sudo privileges.
    You will need to provide the password twice (once to log in as the user with sudo privilege and once for the sudo privilege itself).
    Example:

Skylight: redundancy config replication-partition /dev/sdc host-admin-user visionems

Password:
[sudo] password for visionems:

The partition '/dev/sdc' will be unmounted and formatted.
Proceed ? (y/N)
y

Skylight:

  4. If you are not already logged in on the socli, open an SSH terminal session to the Skylight orchestrator CLI on port 2200 of Site-B and log in as the skylight user.
    The Skylight prompt is displayed.


Note: Perform the procedure below (steps 5 and 6) in the socli of the Skylight orchestrator for Site-B.

  5. Configure the replication partition for the redundancy feature by entering:
redundancy config replication-partition <partition name> host-admin-user <a user with sudo privilege>


CAUTION: While this operation is running, the partition in the command above will be unmounted and formatted. To prevent data loss on this partition, take care to specify the correct partition name before running this command.

  6. If prompted, provide the password of the user that has sudo privileges.
    You will need to provide the password twice (once to log in as the user with sudo privilege and once for the sudo privilege itself).
    Example:

Skylight: redundancy config replication-partition /dev/sdc host-admin-user visionems

Password:
[sudo] password for visionems:

The partition '/dev/sdc' will be unmounted and formatted.
Proceed ? (y/N)
y

Skylight:

Configuring and Starting Redundancy

The procedure in this section covers all the tasks required to configure and start the redundancy feature. It needs to be executed on Site-A only.
You will need to:

  • Set the preferred site
  • Configure the virtual IP
  • Start the redundancy feature
  • Test that the redundancy feature is operating normally.


CAUTION: You must configure and activate redundancy on Site-A. The configuration will be automatically replicated to Site-B.


Note: The redundancy feature must be stopped before reconfiguring the hostname of the Docker host.

To configure redundancy

  1. Configure redundancy by entering these commands, and provide Site-A details:
redundancy config site-a hostname nameSiteA
redundancy config site-a replication-ip a.a.a.a
redundancy config site-a monitor-ip c.c.c.c

where:
nameSiteA is the hostname that was previously assigned to the Docker host of Site-A.
a.a.a.a is the address of the Site-A interface previously configured for data replication.
c.c.c.c is the address of the Site-A interface previously configured for monitoring.

  2. Configure redundancy by entering these commands, and provide Site-B details:
redundancy config site-b hostname nameSiteB
redundancy config site-b replication-ip b.b.b.b
redundancy config site-b monitor-ip d.d.d.d

where:
nameSiteB is the hostname that was previously configured for the Docker host at Site-B.
b.b.b.b is the address of the Site-B interface previously configured for data replication.
d.d.d.d is the address of the Site-B interface previously configured for monitoring.

  3. If you want to designate the preferred site (this will be the active site at startup and after recovery from a failover), enter:
redundancy config preferred siteOption

where:
siteOption is your choice of preferred site. Possible values: none (default), site-a, site-b

  4. Configure the virtual IP for the Skylight orchestrator system as follows:


Note: By default, the virtual IP state is enabled.

If the user needs to configure the virtual IP, they must follow the two steps below.

a. Set the virtual IP address by entering:

redundancy config virtual-ip vip-address e.e.e.e

where:
e.e.e.e is the virtual IP address (previously configured for the Skylight orchestrator system)

b. Configure the primary interface associated with virtual IP address:

redundancy config virtual-ip vip-primary-interface interfaceName

where:
interfaceName is the primary interface (previously configured for the virtual IP address)

Examples of interfaceName: eth0, eth1, ens160, ens224

If the user does not need to configure the virtual IP, enter:

redundancy config virtual-ip vip-state disable


CAUTION: The next step (disabling auto-failover) is NOT recommended. For more information, see Disabling Automatic Failover.

  5. If you want to disable automatic failover, enter:
redundancy config auto-failover disable

  6. Display the redundancy configuration by entering:

redundancy show configuration

The configuration should be similar to the following:

Docker Hot Standby Redundancy output.png

  7. Start the redundancy feature by entering:
redundancy control start

After a short delay, redundancy becomes operational and the Skylight prompt is displayed. If a preferred site has been set, it is the active site. If preferred site is set to none (default value), Site-A is the active site. Data is being replicated from the active site to the passive site. Connectivity between the two sites is being monitored.

  8. Check whether the redundancy feature is operating normally by entering:
redundancy test

The test checks that redundancy is configured properly and that data replication is taking place. The results are displayed.
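
When the same configuration is applied to several deployments, the command sequence of this procedure can be generated from the values collected in Information Needed to Configure Hot Standby Redundancy. The Python sketch below is an illustration, not an Accedian utility. It only assembles the socli commands shown in this procedure as text. The function name and dictionary keys are assumptions, and the commands still have to be entered in the socli of Site-A.

```python
def build_redundancy_commands(site_a, site_b, preferred="none",
                              vip_address=None, vip_interface=None):
    """Return the socli commands of this procedure, in order, as strings.

    site_a and site_b are dicts with the keys "hostname",
    "replication_ip" and "monitor_ip" (names chosen for this sketch).
    """
    commands = []
    for label, site in (("site-a", site_a), ("site-b", site_b)):
        commands.append(f"redundancy config {label} hostname {site['hostname']}")
        commands.append(f"redundancy config {label} replication-ip {site['replication_ip']}")
        commands.append(f"redundancy config {label} monitor-ip {site['monitor_ip']}")
    if preferred != "none":
        commands.append(f"redundancy config preferred {preferred}")
    if vip_address is None:
        # The virtual IP state is enabled by default, so disable it when unused.
        commands.append("redundancy config virtual-ip vip-state disable")
    else:
        commands.append(f"redundancy config virtual-ip vip-address {vip_address}")
        if vip_interface is not None:
            commands.append(
                f"redundancy config virtual-ip vip-primary-interface {vip_interface}")
    commands.append("redundancy show configuration")
    commands.append("redundancy control start")
    commands.append("redundancy test")
    return commands
```

The sketch deliberately leaves out the auto-failover step, because disabling automatic failover is not recommended.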

Disabling Automatic Failover

By default, redundancy is configured with automatic failover enabled. The system will determine when it is necessary to switch from the active to the passive site and will do so without human intervention.

If you prefer to decide when to fail over from the active site to the passive site, you can change the redundancy configuration to disable automatic failover. If you disable automatic failover, replication and monitoring will continue. It will be necessary to manually switch from the active site to the passive site in the event of a failure on the active site. See the redundancy control switch command in Controlling Redundancy.

If you decide to disable automatic failover, we recommend that you do so during the initial configuration of redundancy. See Configuring and Starting Redundancy.

To change the automatic failover configuration

If you decide to disable automatic failover after redundancy has been started, you can do so as explained in this procedure. You can do this on the appliance at Site-A.

  1. If you are not already logged in on the socli (SSH port 2200), log in as the skylight user.

  2. Stop the redundancy feature by entering:

redundancy control stop
  3. Ensure that redundancy has been stopped by entering:
redundancy show status

The output should indicate that the global status is Stopped.

  4. To disable the automatic failover configuration, enter:
redundancy config auto-failover disable
  5. Ensure that the redundancy configuration has changed by entering:
redundancy show configuration

The output should indicate that auto-failover has been Disabled.

  6. Start the redundancy feature by entering:
redundancy control start
  7. Ensure that redundancy has been started by entering:
redundancy show status

The output should indicate that the global status is Started.

Three commands allow you to view key information about the redundancy feature:

  • redundancy show configuration
  • redundancy show statistics
  • redundancy show status

Viewing Redundancy Configuration

Enter the following command to view details of the redundancy configuration:

redundancy show configuration

The output will be similar to the following:
Viewing Redundancy Configuration ouput.png

Note the following points about the redundancy configuration:

  • Replication type: Currently, this is always set to Geo redundant. Other modes may be available in future releases.
  • Auto-failover: Possible values are Enabled and Disabled. It is set to Enabled by default, which means the system switches from the active site to the passive site when it determines that this is necessary. Auto-failover can be disabled, but this should be done with caution. For more information, see Disabling Automatic Failover.

Viewing Redundancy Status

Enter the following command to view the status of the redundancy feature:

redundancy show status

The output will be similar to the following:
Viewing Redundancy status ouput.png

The possible redundancy status results are as follows:

  • Global Status: Possible values: Started, Stopped, Suspended.
  • Node Status: Possible values: Active, Passive, Offline
  • Replication status: Possible values: Up, Down, Synchronizing:
    • Up means data is being transferred from the active to the passive site.
    • Down means data is not being transferred from the active to the passive site. This is normal if the Global Status is Stopped or Suspended. This is not normal if the Global Status is Started.
    • Synchronizing means the system is catching up on replicating data. This happens after redundancy starts or after a long suspension.
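
The relationship between the Global Status and the Replication status can be captured in a small check, for example in a monitoring script that has already obtained the two values. The Python sketch below uses a hypothetical function name and works on the status strings only. How the values are extracted from the command output is outside its scope.

```python
GLOBAL_STATES = {"Started", "Stopped", "Suspended"}
REPLICATION_STATES = {"Up", "Down", "Synchronizing"}


def replication_needs_attention(global_status, replication_status):
    """Return True when the replication status is abnormal for the global status."""
    if global_status not in GLOBAL_STATES:
        raise ValueError(f"unknown global status: {global_status}")
    if replication_status not in REPLICATION_STATES:
        raise ValueError(f"unknown replication status: {replication_status}")
    # Down is normal while redundancy is Stopped or Suspended,
    # but not while it is Started.
    return global_status == "Started" and replication_status == "Down"
```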

Viewing Redundancy Statistics

Enter the following command to view statistics about redundancy:

redundancy show statistics

The output will be similar to the following:
Viewing Redundancy statistics ouput.png

Note that the redundancy statistics displayed are expressed in terms of the site on which you are logged in and viewing the statistics:

  • Network sent: Data sent from the local site to the other site.
  • Network received: Data received by the local site from the other site.
  • Disk write: Data written on the local site.
  • Disk read: Data read on the local site.
  • Out of sync: This value should be close to 0 during normal operations.

Controlling Redundancy

After the redundancy feature has been configured and is enabled, you can control its behavior using the redundancy control commands:


CAUTION: The commands below MUST only be run on one of the sites. Running commands on both sites simultaneously can lead to the HA system not working properly.

  • redundancy control set-preferred-site – Updates the preferred site. Effective as soon as it is entered. Does not require a redundancy restart.
    • Three options are available: none, site-a, site-b.
  • redundancy control start – Activates redundancy feature. If successful, data replication/resource monitoring will start.
  • redundancy control stop – Stops the redundancy feature. Replication, monitoring and failover functions are all stopped.
      Important:
    • Stopping redundancy will stop data replication. Data will be stored to the data store on the active site only.
    • The redundancy feature cannot stop properly if any terminal is accessing the replication-partition folder (/home/skylight/so/mysql-ha/). Ensure that this folder is not in use while stopping the redundancy feature.
    • The redundancy feature can be stopped on both sites by running the redundancy control stop command on one site only.
    • However, if a site is shut down or unreachable while the redundancy feature is being stopped, you must manually run the redundancy control stop command again on the remaining site.
    • Do not force a reboot or shutdown of Skylight orchestrator (containers or Docker host) while the redundancy feature is being stopped.
  • redundancy control restart – Equivalent to entering the stop command followed by the start command. If you change the redundancy configuration, you must enter the restart command.
  • redundancy control switch – Switches the active site (from the site that is currently active to the site that is currently passive). If a manual switch is initiated, the Preferred-site option will be automatically set to none.
  • redundancy control suspend – This command puts the redundancy features completely on hold:
    • It disables failover (automatic and manual).
    • It stops replication and monitoring immediately.
    • Failover will not occur regardless of the conditions.

    To re-enable replication, monitoring and automatic failover, you must enter the resume command.
    The suspend command is typically used to temporarily put the redundancy feature on hold, for example, during a maintenance window when you need to reboot an appliance or apply a patch to the operating system.

  • redundancy control resume – Re-enables replication, monitoring and automatic failover after the suspend command was executed.
  • redundancy control handle-split-brain – Handles the issue where data cannot synchronize between the two sites (split brain).

All commands are run from the socli. See Accessing the climanager service (socli).
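
As an illustration of the suspend/resume pairing described above, the following minimal Python sketch builds the ordered list of actions for a maintenance window. The function name and the placeholder maintenance step are hypothetical; only the two redundancy control commands come from this article. The sketch does not run anything on the socli, it only produces the sequence an operator would follow.

```python
from typing import List


def maintenance_window_commands(maintenance_steps: List[str]) -> List[str]:
    """Return the ordered actions for a maintenance window.

    Hypothetical helper: 'suspend' puts replication, monitoring and
    failover on hold, and 'resume' re-enables them once the maintenance
    steps are done.
    """
    return [
        "redundancy control suspend",
        *maintenance_steps,
        "redundancy control resume",
    ]


if __name__ == "__main__":
    # The maintenance step below is a placeholder, not a real command.
    for action in maintenance_window_commands(["<reboot appliance or apply OS patch>"]):
        print(action)
```

Wrapping the maintenance steps between suspend and resume makes it harder to forget the resume command, without which failover stays disabled.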

Managing the Redundancy License

Hot Standby Redundancy requires a license. The license file specifies the MAC addresses of the Skylight orchestrator appliances for Site-A and Site-B.

Two commands allow you to manage the license for the redundancy feature:

  • redundancy license import filename – Importing the license file with this command is the first step in configuring redundancy. If you do not import the license file, you will not be able to start redundancy.
  • redundancy license reset – Use this command to uninstall the license. You should reset the license (that is, uninstall it) before re-importing the existing license or importing a new license.


Note: The redundancy license commands must be executed on the Skylight orchestrator Docker at both sites. The redundancy feature will only start if the same valid license has been imported at both sites. All commands can be run from the appliance console. See Accessing the climanager service (socli).

If you want to import a license, the license file must be present on the Skylight orchestrator Docker. See Copying the License File to Both Sites.

Replacing a License

Please follow the three sections below in sequence to complete this procedure.

To stop HA and configure the preferred site

The commands below must be run via socli on the active site only.

  1. Stop the redundancy feature by entering:
redundancy control stop
  2. Configure the preferred site (site-a or site-b) to ensure that this site remains the active site and to prevent loss of the latest database data when HA restarts:
redundancy config preferred < site-a or site-b >

To reset and import a license

This procedure must be repeated in the console of the Skylight orchestrator Docker at both sites.

  1. Check the HA status again and ensure that it is “Stopped” by entering:
redundancy show status

Note: If the HA status is “Started” on this site, stop HA again on this site by entering:
redundancy control stop

  2. Uninstall the license that is currently installed by entering:
redundancy license reset
  3. Import the license for the redundancy feature by entering:
redundancy license import filename fullPath/licenseFilename

Example of full path and filename: /home/skylight/drbd-proxy.license

To start HA again

The commands below must be run via socli only on the site where you stopped HA in the first procedure above.

  1. Start the redundancy feature by entering:
redundancy control start
  2. Ensure that redundancy is started by entering:
redundancy show status

The output should indicate that the global status is Started.
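
The three procedures above can be summarized as an ordered plan of (site, command) pairs. The Python sketch below is only an illustration: the function name is hypothetical, it assumes the preferred site is set to the currently active site (as the first procedure suggests), and it copies the command strings as they are written in the steps above. It does not execute anything on the socli.

```python
from typing import List, Tuple

SITES = ("site-a", "site-b")


def license_replacement_plan(active_site: str, license_path: str) -> List[Tuple[str, str]]:
    """Return the ordered (site, command) pairs for replacing a license.

    Hypothetical helper that mirrors the documented order:
    stop HA on the active site, reset and import on both sites,
    then start HA again on the original site.
    """
    if active_site not in SITES:
        raise ValueError(f"active_site must be one of {SITES}")

    # Procedure 1: on the active site only.
    plan = [
        (active_site, "redundancy control stop"),
        (active_site, f"redundancy config preferred {active_site}"),
    ]
    # Procedure 2: repeated at both sites.
    for site in SITES:
        plan.append((site, "redundancy show status"))
        plan.append((site, "redundancy license reset"))
        plan.append((site, f"redundancy license import filename {license_path}"))
    # Procedure 3: on the same site as procedure 1.
    plan.append((active_site, "redundancy control start"))
    plan.append((active_site, "redundancy show status"))
    return plan


if __name__ == "__main__":
    for site, command in license_replacement_plan("site-a", "fullPath/licenseFilename"):
        print(f"[{site}] {command}")
```

Listing the site next to each command makes it explicit which steps run once and which must be repeated at both sites.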

Testing Redundancy

After the redundancy feature has been configured, enabled and started, you can check that it is operating normally with the redundancy test command.
This command is executed from the appliance console. For more information about the appliance console, see Accessing the Appliance Console.

  1. Ensure that redundancy is started by entering:
redundancy show status
  2. Run the test by entering:
redundancy test

The output of the test will be similar to the following:
[Image: example output of the redundancy test command]

Redundancy Commands – CLI Help and Command Summary

Tab-completion help is available for the redundancy commands on the Skylight orchestrator command line.

The following tables of redundancy commands list all the parameters and possible parameter values.

redundancy config

Parameters | Possible parameter values
auto-failover | enable, disable
monitor-port | default
preferred | none, site-a, site-b
replication-port |
site-a | hostname of appliance at Site-A
site-b | hostname of appliance at Site-B
replication-partition |

redundancy config virtual-ip

Parameters | Possible parameter values
vip-address | IP address used as single point of access to Skylight orchestrator system
vip-primary-interface | Any interface defined for the virtual IP address. Default: eth0
vip-state | enable, disable
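
As a rough illustration of how the enumerated values in the redundancy config tables could be checked before a command is typed, here is a minimal Python sketch. The dictionary and function names are hypothetical, and only the parameters whose allowed values are listed explicitly above are covered; free-form parameters such as hostnames, ports and IP addresses are accepted without any check.

```python
# Allowed values copied from the redundancy config tables above.
ALLOWED_VALUES = {
    "auto-failover": {"enable", "disable"},
    "preferred": {"none", "site-a", "site-b"},
    "vip-state": {"enable", "disable"},
}


def check_config_value(parameter: str, value: str) -> bool:
    """Return True if the value is acceptable for the parameter.

    Hypothetical helper: parameters without an enumerated list
    (hostnames, ports, IP addresses) are not validated here.
    """
    allowed = ALLOWED_VALUES.get(parameter)
    if allowed is None:
        return True
    return value in allowed


if __name__ == "__main__":
    print(check_config_value("preferred", "site-a"))
    print(check_config_value("auto-failover", "on"))
```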

redundancy control

Parameters | Possible parameter values
restart | --
resume | --
set-preferred-site | none (default), site-a, site-b
start | --
stop | --
suspend | --
switch | --
handle-split-brain | --

redundancy license

Parameters | Possible parameter values
import | fullPath/licenseFilename
reset | --

redundancy show

Parameters | Possible parameter values
configuration | --
statistics | --
status | --

The following alarms related to redundancy may be raised. They are visible in the socli on both sites and in the Skylight orchestrator web user interface.

Alarm ID | Severity | Service Affecting | Description | Managed Object Class | Alarm Type | Probable Cause
10.0005.05 | Warning | No | Redundancy is not ready | NOT-READY | Processing error | Configuration or customization error
10.0005.06 | Critical | No | Replication Failure detected | REPLICATION-FAIL | Processing error | Software error
10.0005.07 | Major | No | Site-A is offline | SITE-A-OFFLINE | Communications | Loss of signal
10.0005.08 | Major | No | Site-B is offline | SITE-B-OFFLINE | Communications | Loss of signal
10.0005.09 | Critical | No | Malfunction in redundancy feature | MALFUNCTION | Processing error | Software error
10.0005.10 | Major | No | Failover event has occurred | FAILOVER-EVENT | QoS | Performance degraded
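
For readers who post-process alarms in their own tooling, the alarm list above can be represented as a small lookup table. The Python sketch below is an illustration only: the type and function names are hypothetical, and it does not describe how Skylight orchestrator exposes alarms. The IDs, severities, descriptions and managed object classes are copied from the table.

```python
from typing import Dict, List, NamedTuple


class Alarm(NamedTuple):
    severity: str
    description: str
    managed_object_class: str


# Values copied from the redundancy alarm table above.
REDUNDANCY_ALARMS: Dict[str, Alarm] = {
    "10.0005.05": Alarm("Warning", "Redundancy is not ready", "NOT-READY"),
    "10.0005.06": Alarm("Critical", "Replication Failure detected", "REPLICATION-FAIL"),
    "10.0005.07": Alarm("Major", "Site-A is offline", "SITE-A-OFFLINE"),
    "10.0005.08": Alarm("Major", "Site-B is offline", "SITE-B-OFFLINE"),
    "10.0005.09": Alarm("Critical", "Malfunction in redundancy feature", "MALFUNCTION"),
    "10.0005.10": Alarm("Major", "Failover event has occurred", "FAILOVER-EVENT"),
}


def alarm_ids_with_severity(severity: str) -> List[str]:
    """Return the sorted alarm IDs that have the given severity."""
    return sorted(
        alarm_id
        for alarm_id, alarm in REDUNDANCY_ALARMS.items()
        if alarm.severity == severity
    )


if __name__ == "__main__":
    for alarm_id in alarm_ids_with_severity("Critical"):
        print(alarm_id, REDUNDANCY_ALARMS[alarm_id].description)
```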

Viewing Redundancy Logs

The redundancy feature writes a log of activity and error messages.

The log file that contains entries about redundancy is located on the Docker host for the active site, by default at:
/home/skylight/so/logs/hamon/hamon.log


Note: If the so-logs volume-location is configured, the redundancy log is stored at ${volume-location so-logs}/hamon/hamon.log

To view the contents of the redundancy log file

  1. If you are not already logged in on the socli, log in as the skylight user.
    The Skylight prompt is displayed.
  2. Access the OS shell by entering:
shell host
  3. Display the redundancy log file by entering:
cat logs/hamon/hamon.log


Note: If the so-logs volume-location is configured, enter:
cat ${volume-location so-logs}/hamon/hamon.log

  4. Exit the Docker host shell by entering:
exit

You are returned to the Skylight prompt.
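
The choice between the default log location and a configured so-logs volume can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, the so-logs volume location is passed in by the caller (the sketch does not read it from Skylight orchestrator), and the tail helper simply returns the last lines of any text file without assuming a particular log format.

```python
import posixpath
from collections import deque
from typing import List, Optional

# Default log root on the Docker host, as given in this article.
DEFAULT_LOG_ROOT = "/home/skylight/so/logs"


def hamon_log_path(so_logs_volume: Optional[str] = None) -> str:
    """Return the redundancy log path.

    Uses the configured so-logs volume location when one is given,
    otherwise falls back to the default location.
    """
    root = so_logs_volume if so_logs_volume else DEFAULT_LOG_ROOT
    return posixpath.join(root, "hamon", "hamon.log")


def tail(path: str, count: int = 20) -> List[str]:
    """Return the last 'count' lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(deque(handle, maxlen=count))


if __name__ == "__main__":
    print(hamon_log_path())
    # "/data/so-logs" is a made-up volume location used only as an example.
    print(hamon_log_path("/data/so-logs"))
```

Reading only the last lines is usually enough when checking recent redundancy activity, since cat prints the whole file.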

© 2024 Cisco and/or its affiliates. All rights reserved.
