Fault & Mobility Assurance Collector Installation

Once you have installed Provider Connectivity Assurance, you can install the components used to ingest Faults and Mobility performance monitoring data.
To do this, follow the instructions in the three articles below, in this exact order:

  1. RKE2 Cluster Installation (see below)

  2. Pipeline Installation

  3. Start-Up and Validation

RKE2 Cluster Installation

This article details the prerequisites and procedure for installing RKE2 as an offline (air-gapped) cluster. It is written for an on-premises cluster environment.

A standard RKE2 installation consists of the following components, each with a dedicated role:

  • RKE2 - a security-focused Kubernetes distribution

  • Longhorn - a unified storage layer

Deployment Prerequisites

Provision VMs with the following resources (for a Dev/Staging environment):

VM Name                          CPU    RAM      DISK
Registry VM (Internet Enabled)   8      16 GB    300 GB
3 Control Plane VMs              12     16 GB    300 GB
N Agent (Worker) VMs             12     16 GB    300 GB

  • VM interfaces should be configured for dual stack (IPv4 and IPv6).

  • Each VM should have a single interface.

  • Ensure setup access (server/SSH access) to all VMs.

  • SELinux should be disabled on all cluster nodes and on the internet machine.

  • The firewall should be disabled on all nodes.

  • VM partitions should be created as per the table below.

  • NTP should be configured on all nodes.

Note: The table below applies to all nodes except the image registry server, where the /var partition should be at least 150 GB instead of 50 GB.

Mount Point             Partition Size       File System Type
/                       50 GB                xfs
/boot                   1 GB                 xfs
/boot/efi (optional)    1 GB                 xfs
/home                   20 GB                xfs
Swap                    10 GB                swap
/var                    50 GB                xfs
/matrix                 90% of remaining     xfs
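
The commands below are an optional, minimal verification sketch you can run on each node to confirm the prerequisites above (partition layout, SELinux, firewall, NTP, dual stack); exact output varies by environment.

# Optional prerequisite check (run on each node):
df -h / /boot /home /var /matrix          # confirm the partition layout and sizes
swapon --show                             # confirm the swap partition is active
getenforce                                # should report Disabled
systemctl is-active firewalld             # should report inactive
timedatectl | grep -i ntp                 # confirm NTP synchronization is enabled
ip -brief addr                            # confirm the interface has IPv4 and IPv6 addresses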

Registry Server Creation

The following section outlines the installation of application services across each server role.

Note: Ensure that the operating system matches that of the other nodes within the cluster.

Download RKE2 Scripts

With the setup ready, it's time to deploy the RKE2 cluster. Download the necessary scripts and RPM packages from the SharePoint link below and upload them to the internet machine and the RKE2 servers (control plane and worker nodes):

Cross-Domain Analytics - RKE2_Scripts - All Documents

  • rke2_deployment.zip - scripts for building and deploying the RKE2 control plane and worker nodes.
    Upload to: Internet Machine & RKE2 Servers (control plane), path /opt/

  • rke2_docker_rpm_packages.zip - RPM packages required only on the internet machine.
    Upload to: Internet Machine, path /matrix/

  • rke2_rpm_packages.zip - RPM packages required on both the internet machine and the RKE2 servers.
    Upload to: Internet Machine & RKE2 Servers, path /matrix/

Step 1: Log in to the Internet Machine and Unzip the Build Files

# Connect to the server via SSH:
ssh root@<internet_machine_ip> 
 
# Navigate to the /opt directory:
cd /opt

# Verify the script folder exists:
ls -lrt
# Ensure rke2_deployment.zip is listed.

# Unzip the rke2_deployment.zip file:
unzip -o rke2_deployment.zip -d /opt

# Check if the scripts are present in the /opt directory:
ls -lrt /opt/rke2_deployment
# Ensure the following scripts are listed:
#   rke2_longhorn_build.sh
#   rke2_control.sh
#   rke2_worker.sh
#   rke2_ltp.sh

# Grant execution permissions to the scripts:
chmod +x /opt/rke2_deployment/*.sh

Step 2: Unzip and Install RPM Packages on the Internet Machine and Start Docker

# Continue from above.
# Navigate to the /matrix directory:
cd /matrix

# Unzip the required RPM packages:
unzip rke2_docker_rpm_packages.zip
unzip rke2_rpm_packages.zip

# Install the RPM packages:
rpm -ivh --force --nodeps /matrix/rke2_docker_rpm_packages/*.rpm
rpm -ivh --force --nodeps /matrix/rke2_rpm_packages/*.rpm

Note: If the VM runs an OS other than AlmaLinux, install the equivalent packages for that base OS.

# To enable and start Docker, run the following commands:
systemctl enable docker 
systemctl start docker 

Step 3: Create the .zst Build and Transfer It to the Main Control Plane Server

# Continue from above.
# Navigate to the /opt directory:
cd /opt/rke2_deployment

Note: Before running the build script, ensure that the /opt directory has at least 25 GB of free space.

# Execute the build script:
./rke2_longhorn_build.sh build

# Run the following command to check if the .zst file exists in the /opt directory:
ls -lhrt /opt
cd /opt

# Securely copy the generated .zst file to the control plane server:
scp rke2_longhorn.zst <username>@<control_plane_server_ip>:/opt/
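
Because the .zst archive is large, it can be worth confirming that the copy was not corrupted. The following is an optional sketch using sha256sum; the file name matches the scp command above.

# Optional: compare checksums on the internet machine and the control plane
sha256sum /opt/rke2_longhorn.zst
ssh <username>@<control_plane_server_ip> "sha256sum /opt/rke2_longhorn.zst"
# The two hashes should match.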

Create a Registry Server

Step 1: Add the Below Entries to /etc/docker/daemon.json on the Registry Server

#Create the daemon.json 
vi /etc/docker/daemon.json 

{ 
  "ipv6": true, 
  "fixed-cidr-v6": "2405:420:54ff:84::/64", 
  "live-restore": true, 
  "userns-remap": "default", 
  "log-level": "info" 
} 

#Restart the Docker daemon 
systemctl restart docker 
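
As an optional check, you can confirm Docker picked up the daemon.json settings; this is a sketch and the exact output depends on the Docker version.

# Optional: confirm Docker applied the daemon.json settings
docker info --format '{{.LiveRestoreEnabled}}'            # expected: true
docker network inspect bridge --format '{{.EnableIPv6}}'  # expected: true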

Step 2: Create SSL Certificates for the Docker Registry

#Update OpenSSL Configuration with VM IP 
# Add the following entries to the /etc/hosts file on each node in the cluster: 
vim /etc/hosts 

... 
127.0.0.1 <registry-name> 
<Registry-Server-IPv4>  <registry-name> 
<Registry-Server-IPv6>  <registry-name> 
... 
 
#Locate and edit the OpenSSL configuration file: 
find / -name openssl.cnf 
vi /etc/pki/tls/openssl.cnf 
#Add the following entry under [ v3_ca ] section. 
... 
[ v3_ca ]  
subjectAltName=IP:IP_ADDRESS_OF_YOUR_VM 
 
[ alt_names ] 
DNS.1 = <registry-name> 
IP.1 = IP_ADDRESS_OF_YOUR_VM 
... 

#Create a Directory for Certificates. 
mkdir -p /certificates && cd /certificates 
mkdir -p /matrix/docker_data 
 
# Generate SSL Certificates 
openssl req \
  -newkey rsa:4096 -nodes -sha256 -keyout docker.key \
  -x509 -days 365 -out docker.crt 
 
###During the prompt, enter the required details. When asked for: 
Common Name (e.g. server FQDN or YOUR name) []: IP_ADDRESS_OF_YOUR_VM 
 
#for others press enter key. 

# Set permissions on the certificate and data directories, and protect the key 
chmod 775 /certificates /matrix/docker_data 
chmod 444 /certificates/docker.key 
 
#Create a Directory for Authentication Files 
mkdir -p ~/registry/auth 
 
# Install httpd-tools (if not already installed) 
yum install httpd-tools -y 
 
# Create the htpasswd file with the registry credentials (admin / admin123) 
htpasswd -Bbn admin admin123 > ~/registry/auth/htpasswd 
 
sudo mkdir -p /etc/docker/certs.d/<registry-name>:5000 
 
#Please make sure to copy docker.crt as ca.crt (or rename later on)  
sudo cp /certificates/docker.crt /etc/docker/certs.d/<registry-name>:5000/ca.crt 
  
sudo ls /etc/docker/certs.d/<registry-name>:5000/ 
  
#Restart the Docker daemon so it picks up the ca.crt certificate 
systemctl restart docker 
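
Optionally, inspect the generated certificate to confirm its validity period and that the subject alternative name matches the registry VM address; a minimal sketch:

# Optional: verify the certificate details and SAN entries
openssl x509 -in /certificates/docker.crt -noout -subject -dates
openssl x509 -in /certificates/docker.crt -noout -text | grep -A1 "Subject Alternative Name"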

Step 3: Download the Registry Image from Docker Hub

Download the registry image from Docker Hub as per the table below.

Service Name    Image Details
Registry        dockerhub.cisco.com/matrixcx-docker/matrix4/rke2-local-registry:3.0.0

Step 4: Create the Local Registry on Registry Server

#Download the registry image on host machine 
 
#Ensure required directories and files exist: 
ls -ld /certificates /root/registry/auth /matrix/docker_data 
ls -l /certificates/docker.crt /certificates/docker.key /root/registry/auth/htpasswd 

#If any are missing, create them and set proper permissions. 
#Pull the registry image from Docker Hub. 

docker pull <image_name>  
 
#Run the registry container. 

docker run -d \ 
  --name registry \ 
  --restart=on-failure:5 \ 
  --read-only \ 
  -v /certificates:/certificates \ 
  -v /root/registry/auth:/auth \ 
  -v /matrix/docker_data:/var/lib/registry \ 
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certificates/docker.crt \ 
  -e REGISTRY_HTTP_TLS_KEY=/certificates/docker.key \ 
  -e REGISTRY_AUTH=htpasswd \ 
  -e REGISTRY_AUTH_HTPASSWD_REALM="Registry Realm" \ 
  -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \ 
  -p <VM_IPv4>:5000:5000 \ 
  -p <VM_IPv6>:5000:5000 \ 
 <Registry_Image_name>:<Tag> 
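
Once the container is up, you can query the registry API to confirm it is reachable over TLS with the credentials created earlier; a sketch (adjust the registry name and credentials to your environment):

# Optional: confirm the registry container is running and responding
docker ps --filter name=registry
curl -u admin:admin123 --cacert /certificates/docker.crt https://<registry-name>:5000/v2/_catalog
# An empty registry returns: {"repositories":[]}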

Step 5: Push all Images to the Local Registry Server

(Push the images listed in the table above.)

# Log in to your local registry: 
docker login <registry-name> 

# Navigate to the /opt directory: 
cd /opt/rke2_deployment 

# Execute the script that pushes the images to the local registry.
# Note: It will prompt for the registry name; enter the actual registry name.
# Example prompt: Enter the registry name (e.g., caloregistry.io):
./rke2_ltp.sh /opt/rancher/images/longhorn
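
For reference, getting an image into the local registry follows the usual tag-and-push pattern; the sketch below is illustrative only, and the image name, repository, and tag are placeholders rather than part of the delivered package.

# Illustrative example: tag an image for the local registry and push it
docker tag <source-image>:<tag> <registry-name>:5000/<repository>/<image>:<tag>
docker push <registry-name>:5000/<repository>/<image>:<tag>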

Deploy RKE2 (Control Plane)

Step 1: Log in to the Control Plane Server and Unzip the Build Files

# Connect to the server via SSH:
ssh root@<control_plane_ip> 
 
# Navigate to the /opt directory:
cd /opt

# Verify the script folder exists:
ls -lrt
# Ensure rke2_deployment.zip is listed.

# Unzip the rke2_deployment.zip file:
unzip -o rke2_deployment.zip -d /opt

# Check if the scripts are present in the /opt directory:
ls -lrt /opt/rke2_deployment
# Ensure the following scripts are listed:
#   rke2_longhorn_build.sh
#   rke2_control.sh
#   rke2_worker.sh
#   rke2_ltp.sh

# Grant execution permissions to the scripts:
cd /opt/rke2_deployment
chmod +x *.sh

# Move the scripts to the /opt directory and clean up:
mv * /opt/
cd /opt
rm -rf /opt/rke2_deployment

Step 2: Unzip and Install RPM Packages on the Control Plane

# Continue from above.
# Navigate to the /matrix directory:
cd /matrix

# Unzip the required RPM packages:
unzip rke2_rpm_packages.zip

# Install the RPM packages:
rpm -ivh --force --nodeps /matrix/rke2_rpm_packages/*.rpm

Step 3: Disable SELinux and Firewall on the System

# edit selinux config file
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Stop and disable the firewall service to prevent it from starting on boot
systemctl stop firewalld
systemctl disable firewalld

# Reboot the node
reboot

Step 4: Create the main control-plane node

# Navigate to the script directory:
cd /opt/ 

# Run the following command to check if the .zst file exists in the /opt directory:
ls -lhrt /opt/

# Execute the build script:
./rke2_control.sh control

Step 5: Verify RKE2 Server Status and Enable kubectl Binary

# Check if the rke2-server service is running:
systemctl status rke2-server.service

# Enable the kubectl binary by updating environment variables:
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc 
echo 'export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml' >> ~/.bashrc 
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> ~/.bashrc

# Apply the changes immediately
source ~/.bashrc
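
With the environment variables in place, a quick sanity check confirms kubectl can reach the new control plane:

# Optional: confirm kubectl can talk to the cluster
kubectl get nodes
kubectl get pods -A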

Step 6: Create registries.yaml Configuration File

#Transfer /certificates from the registry machine to /opt/rancher on the control plane.
# Connect to the registry (internet) machine via SSH:
ssh root@<internet_machine_ip> 

# scp the registry certificates to the control plane server:
scp -r /certificates root@<control-plane-ip>:/opt/rancher/

# create the file for editing in control plane:
vi /etc/rancher/rke2/registries.yaml

# Add the following configuration:
mirrors:
  "<registry_name>":					#example: caloregistry5.io
    endpoint:
      - "https://<registry_name>:5000"		#example: caloregistry5.io

configs:
  "<registry_name>:5000":				#example: caloregistry5.io
    auth:
      username: admin
      password: admin123
    tls:
      cert_file: /opt/rancher/certificates/docker.crt
      key_file: /opt/rancher/certificates/docker.key
      insecure_skip_verify: true


# Update the /etc/hosts file:
vi /etc/hosts

# Add the following configuration:
x.x.x.x <registry_name>


# To apply changes and ensure the RKE2 server is running properly, restart the service:
systemctl restart rke2-server.service

# After restarting, check if the service is active and running:
systemctl status rke2-server.service
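
Optionally, verify that RKE2's containerd picked up the registry mirror; the config path below is the standard RKE2 location, and the image name in the pull test is a placeholder.

# Optional: confirm the mirror was rendered into the containerd configuration
grep -A3 "<registry_name>" /var/lib/rancher/rke2/agent/etc/containerd/config.toml

# Optional: test pulling an image through the local registry (placeholder image name)
/var/lib/rancher/rke2/bin/crictl pull <registry_name>:5000/<image>:<tag>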

Deploy Worker Nodes

Step 1: Log in to the Worker Server and Prepare the Build Files

# Connect to the server via SSH:
ssh root@<worker_ip> 
 
# Create the /opt/rancher directory:
mkdir -p /opt/rancher

# Mount the /opt/rancher share from the control plane over NFS:
mount -t nfs <control-plane-ip>:/opt/rancher /opt/rancher 

# From the control plane, copy the worker script to the worker node:
scp rke2_worker.sh root@<worker_ip>:/opt

# Grant execution permissions to the script (if not already set):
chmod +x /opt/*.sh

Step 2: Unzip and Install RPM Packages on the Worker Nodes

# Continue from above.
# Navigate to the /matrix directory:
cd /matrix

# Unzip the required RPM packages:
unzip rke2_rpm_packages.zip

# Install the RPM packages:
rpm -ivh --force --nodeps /matrix/rke2_rpm_packages/*.rpm

Step 3: Disable SELinux and Firewall on the System

# edit selinux config file
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Stop and disable the firewall service to prevent it from starting on boot
systemctl stop firewalld
systemctl disable firewalld

# Reboot the node
reboot

Step 4: Add the Mount Point and Create the Main Worker Node

# Navigate to the /opt directory:
cd /opt

# Run the following command to check if the .zst file exists in the /opt directory:
ls -lhrt /opt

# Execute the build script:
./rke2_worker.sh worker

Step 5: Verify RKE2 Agent Status

# Check if the rke2-agent service is running:
systemctl status rke2-agent.service
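
From the first control plane node, you can confirm that the worker has joined the cluster; node names depend on the hostnames.

# Run on the first control plane node:
kubectl get nodes -o wide
# The new worker should appear with STATUS Ready after a short while.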

Step 6: Create registries.yaml Configuration File

#Transfer /certificates from the registry machine to /opt/rancher on the worker.
# Connect to the registry (internet) machine via SSH:
ssh root@<internet_machine_ip> 

# scp the registry certificates to the worker server:
scp -r /certificates root@<worker1-ip>:/opt/rancher/

# create the file for editing in worker:
vi /etc/rancher/rke2/registries.yaml	

# Add the following configuration:
mirrors:
  "<registry_name>":					#example: caloregistry5.io
    endpoint:
      - "https://<registry_name>:5000"		#example: caloregistry5.io

configs:
  "<registry_name>:5000":				#example: caloregistry5.io
    auth:
      username: admin
      password: admin123
    tls:
      cert_file: /opt/rancher/certificates/docker.crt
      key_file: /opt/rancher/certificates/docker.key
      insecure_skip_verify: true


# Update the /etc/hosts file:
vi /etc/hosts

# Add the following configuration:
x.x.x.x <registry_name>

# To apply changes and ensure the RKE2 agent is running properly, restart the service:
systemctl restart rke2-agent.service

# After restarting, check if the agent is active and running:
systemctl status rke2-agent.service

Adding Master Nodes for HA

Follow and complete Steps 1 to 5 above from the Deploy Worker Nodes section, then follow the step below to transition the node to a control plane (HA) node.

Step 1: Transition to a Control Plane HA Node

# Stop the RKE2 Agent Service:
systemctl stop rke2-agent.service

# Disable the RKE2 Agent Service to prevent it from starting on boot:
systemctl disable rke2-agent.service

# Start the RKE2 Server Service for the control plane:
systemctl enable --now rke2-server.service

# Verify that the server service is running:
systemctl status rke2-server.service

# Enable the kubectl binary by updating environment variables:
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc 
echo 'export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml' >> ~/.bashrc 
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> ~/.bashrc

# Apply the changes immediately
source ~/.bashrc

# Verify kubectl is working:
kubectl get nodes

Next, follow and complete Step 6 (Create registries.yaml Configuration File) from the Deploy RKE2 (Control Plane) section on the new control plane node. Once that is done, install Helm on the additional control plane nodes as described below:

Install Helm on 2nd and 3rd Control Plane Nodes

Step 1: Install Helm on the Additional Control Plane Nodes

# Run the following commands on the control-plane 2 and control plane 3 machine:

cd /opt/rancher/helm
tar -zxvf helm-v3.14.3-linux-amd64.tar.gz > /dev/null 2>&1
rsync -avP linux-amd64/helm /usr/local/bin/ > /dev/null 2>&1

Deploy Longhorn on 1st Control Plane

Note: By default, Longhorn creates 2 replicas, but the replica count can be adjusted by modifying the values.yaml file under /opt/rancher/helm/longhorn.

Step 1: Configure Longhorn Replica Settings in values.yaml

# Navigate to the Longhorn Helm Chart Directory
cd /opt/rancher/helm/longhorn

# Open the values.yaml File for Editing
vi values.yaml

# Locate and update these configurations:
# Image Repository Configuration
image:
  repository: <local_registry_name>  # Example: 10.126.87.14/longhornio/livenessprobe
  tag: <update_tag>  # Example: v2.9.0

# Persistence Settings
persistence:
  defaultClassReplicaCount: 2

# Default Settings
defaultSettings:
  defaultDataPath: /matrix

# Install Longhorn on the first control plane:

helm install longhorn /opt/rancher/helm/longhorn --namespace longhorn-system --create-namespace --version {{ LONGHORN_VERSION }}

# Install cert-manager on the first control plane:

helm upgrade -i cert-manager /opt/rancher/helm/cert-manager-{{ CERT_VERSION }}.tgz \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true \
  --set image.repository={{ registry_name }}/cert/cert-manager-controller \
  --set webhook.image.repository={{ registry_name }}/cert/cert-manager-webhook \
  --set cainjector.image.repository={{ registry_name }}/cert/cert-manager-cainjector \
  --set startupapicheck.image.repository={{ registry_name }}/cert/cert-manager-ctl
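
After both charts are installed, a quick check confirms the pods came up and the Longhorn storage class exists; a minimal sketch:

# Optional: verify the Longhorn and cert-manager deployments
kubectl -n longhorn-system get pods
kubectl -n cert-manager get pods
kubectl get storageclass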

Step 2: Disable Node Scheduling for Longhorn PVCs

Node scheduling should be disabled for Longhorn PVCs on the master (control plane) nodes and database nodes.

1) Open the Longhorn UI by running the below commands on a master node:

kubectl get po -n longhorn-system
kubectl port-forward pod/<longhorn-UI-pod-name> -n longhorn-system 8000:8000 --address='0.0.0.0'

2) Open a browser.
3) Go to https://[<IPv6-address>]:8000
4) Go to the "Nodes" section.
5) Select the database and master nodes.
6) Disable node scheduling.
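
If you prefer the CLI over the Longhorn UI, scheduling can also be disabled by patching the Longhorn node object directly; this is a sketch, and the node names must match the output of kubectl get nodes.

# Optional CLI alternative: disable Longhorn scheduling on a master/database node
kubectl -n longhorn-system get nodes.longhorn.io
kubectl -n longhorn-system patch nodes.longhorn.io <node-name> --type=merge -p '{"spec":{"allowScheduling":false}}'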

Verify the Nodes

#To check the node details 
 
kubectl get nodes 
kubectl get nodes -o wide 
 
#To validate deployed longhorn system 
 
kubectl get all -n longhorn-system 
kubectl get pods -n longhorn-system 

# To validate deployed cattle system 
 
kubectl get all -n cattle-system 
kubectl get pods -n cattle-system 

# To validate all deployment 
 
kubectl get all -A 

Verify the CoreDNS Pod in the kube-system Namespace

If you find that kube-system pods are in a pending state, follow the steps below:

A file /var/lib/rancher/rke2/server/manifests/rke2-coredns-config.yaml should be created with similar content. After that, the rke2-server service should be restarted to apply the change.

# Create the following file:
vi /var/lib/rancher/rke2/server/manifests/rke2-coredns-config.yaml

# add the below content:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-coredns
  namespace: kube-system
spec:
  valuesContent: |-
    zoneFiles:
      - filename: doit.tech.conf
        domain: doit.tech
        contents: |
          doit.tech:53 {
              errors
              cache 30
              forward . 10.0.254.1
          }


# Restart the rke2-server service
systemctl restart rke2-server.service

# Configmap rke2-coredns-rke2-coredns can be reviewed to determine if the change was successful.
kubectl -n kube-system get configmap rke2-coredns-rke2-coredns -o json
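
As a final check, DNS resolution for the custom zone can be tested from inside the cluster; the image below is a placeholder and must be available in (or mirrored to) your local registry in an air-gapped setup.

# Optional: test resolution of the custom zone from a temporary pod (placeholder image)
kubectl run dns-test --rm -it --restart=Never --image=<registry_name>:5000/<busybox-image>:<tag> -- nslookup doit.tech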

Longhorn PVC Issue (Optional)

If you encounter PVC-related issues while connecting the Longhorn storage class to service deployments, follow these instructions.

Since we are using the Alma 8.8 operating system, certain bugs may be present. Applying this patch is recommended to resolve the issue, though its effectiveness may vary depending on the operating system.

# Patch the affected Longhorn resource (use the PVC/volume name reported in the error):
kubectl -n longhorn-system patch lhsm <pvc-name> --type=merge --subresource status --patch 'status: {state: error}'

# Restart the RKE2 server:
systemctl restart rke2-server.service

© 2025 Cisco and/or its affiliates. All rights reserved.
 
For more information about trademarks, please visit: Cisco trademarks
For more information about legal terms, please visit: Cisco legal terms

For legal information about Accedian Skylight products, please visit: Accedian legal terms and trademarks