Fault Management Configuration Guide

Provider Connectivity Assurance provides comprehensive fault collection, alerting, visualization, and correlation. This article gives step-by-step instructions for configuring the Fault Management feature in Provider Connectivity Assurance.

Solution Architecture

Fault management in Provider Connectivity Assurance consists of a collector that receives SNMP traps from network devices and forwards them to Provider Connectivity Assurance for correlation and visualization, and optionally to OSS systems.
The Fault feature is enabled within the Fault & Mobility PM Collector, which includes its own configuration UI in release 25.07 (to be integrated directly into Provider Connectivity Assurance in a future release). The architecture is depicted below:

Figure: PCA Fault Solution Architecture

Configuration Procedure

To configure Fault collection, alerting, and visualization in Provider Connectivity Assurance, and to deliver faults to northbound systems, follow these steps:

  1. Enable the Observability Framework within the Fault & Mobility PM Collector.

  2. Add SNMP device credentials to the configuration file in the Fault & Mobility PM Collector to enable device connectivity and trap collection.

  3. Set up sending of faults to northbound integrated systems.

  4. Enable the OTEL-based data flow from the Fault & Mobility PM Collector to Provider Connectivity Assurance.

Step 1: Enable the Observability Framework

The Observability Framework processes SNMP traps from devices, enriches the data, and sends Faults to Provider Connectivity Assurance.

To enable the Observability Framework plug-in using the configuration UI:

  1. Log in to the administrative UI with admin credentials.
  2. Navigate to the App Plug-ins tab.
  3. Enable the plug-in.
  4. Click Add and select Observability Framework from the list.
  5. Click the > arrow to add the plug-in.
  6. The Observability Framework will now appear in the list of chosen plug-ins.
  7. Click Save.

At this stage, the Observability Framework is enabled.

Figure: Enable Observability Framework

Step 2: Add SNMP Device Credentials to snmptrapd.conf

Pre-requisite: Ensure all monitored SNMP devices are configured and reachable from the Fault & Mobility PM Collector.
Add SNMP credentials to vmount/snmptrapd/snmptrapd.conf. See below for sample configurations.

For SNMPv2:

# Example SNMPv2 entry
rocommunity public

For SNMPv3:

# Example SNMPv3 entry
createUser username SHA "authpass" AES "privpass"
rouser username

After adding credentials, restart the SNMP trap daemon service:

systemctl restart matrix-of-snmptrapd
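
When several SNMPv3 users have to be added, generating the entries can help avoid typos. The following is a minimal sketch that reproduces the shape of the sample SNMPv3 entry above. The function name snmpv3_entry is hypothetical, and the eight-character minimum reflects the usual Net-SNMP passphrase rule, assumed here to apply to this collector's trap daemon.

```python
MIN_PASSPHRASE_LEN = 8  # assumed minimum, per the usual Net-SNMP USM passphrase rule


def snmpv3_entry(username: str, auth_pass: str, priv_pass: str) -> str:
    """Return createUser/rouser lines shaped like the sample SNMPv3 entry."""
    if not username or any(ch.isspace() for ch in username):
        raise ValueError("username must be non-empty and contain no whitespace")
    for label, secret in (("auth", auth_pass), ("priv", priv_pass)):
        if len(secret) < MIN_PASSPHRASE_LEN:
            raise ValueError(
                f"{label} passphrase must be at least {MIN_PASSPHRASE_LEN} characters"
            )
        if '"' in secret:
            raise ValueError(f"{label} passphrase must not contain a double quote")
    return (
        f'createUser {username} SHA "{auth_pass}" AES "{priv_pass}"\n'
        f"rouser {username}"
    )


# Reproduces the sample entry shown above.
print(snmpv3_entry("username", "authpass", "privpass"))
```

Append the printed lines to vmount/snmptrapd/snmptrapd.conf, then restart the trap daemon as described above.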

Step 3: Set Up Sending Faults to Northbound Integrated Systems

This section explains how to configure northbound integration to OSS via SNMP (SNMPv3 only).

  1. Configure the ITSM Plugin using the Fault & Mobility PM Collector configuration UI.
  2. Enable ItsmSnmpHandler in the Northbound integration pipeline.
  3. Create consumers in the Northbound integration pipeline for SNMP faults and alarms.
  4. Configure the ITSM plugin for Northbound integration using the SNMP method.
  5. Enable the alarm resync feature for resynchronization towards OSS.

Configure ITSM Plugin

From the side navigation bar:
Admin → Tenant → App Plugins → Add New Tenant Plugin → Select Tenant, Add ITSM, Save → Refresh the page.

Figure: Configure ITSM Plugin

Configure Handler in Northbound Integration Pipeline

From the side navigation bar:
Admin → Data Pipeline → Handlers → Add New Handler
Configure parameters as shown below, then save and refresh the page.

Figure: Configure Handler

Configure Consumers in Northbound Integration Pipeline

From the side navigation bar:
Admin → Data Pipeline → Consumers → Add New Consumer
Configure parameters as shown below, then save and refresh the page.

Figure: Configure Consumers

Example Consumer Configuration:

Topic: <global>.snmp
Handler: SnmpHandler
Data Schema: SnmpSchema
Schema Mapping:
{
  "alert_end_ts": "alert_end_ts",
  "alert_type": "status",
  "clear_event": "clear_event",
  "clearing_required": "clearing_required",
  "source_ip": "ip_address"
}

Topic: <global>.processchain_snmp
Handler: ItsmSnmpHandler
Data Schema: SnmpSchema
Schema Mapping:
{
  "alert_end_ts": "alert_end_ts",
  "alert_type": "status",
  "clear_event": "clear_event",
  "clearing_required": "clearing_required",
  "source_ip": "ip_address"
}
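
To illustrate what a schema mapping of this kind does, the sketch below renames the fields of a record according to the mapping shown above. This guide does not state the direction of the mapping, so the code assumes that each key is a field name in the incoming record and each value is the name used in the output schema. The function apply_mapping and the sample record are hypothetical.

```python
import json

# Mapping copied from the example consumer configuration.
SCHEMA_MAPPING = json.loads("""
{
  "alert_end_ts": "alert_end_ts",
  "alert_type": "status",
  "clear_event": "clear_event",
  "clearing_required": "clearing_required",
  "source_ip": "ip_address"
}
""")


def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename record fields per the mapping.

    ASSUMPTION: keys are source field names, values are target field names.
    Fields missing from the record, or absent from the mapping, are skipped.
    """
    return {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }


# Hypothetical record, for illustration only.
sample = {"alert_type": "critical", "source_ip": "192.0.2.10", "unmapped": "dropped"}
print(apply_mapping(sample, SCHEMA_MAPPING))
```

If the product applies the mapping in the opposite direction, swap source and target in the comprehension.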

Add Northbound Integration for SNMP

In the Fault & Mobility PM Collector UI, follow this path:
Admin → ITSM → Northbound Integration (top panel).
To add a new integration, click the add icon, select the type, complete required fields, and save.

Figure: Add Northbound Integration

Alarm Resync

Use the Alarm Resync feature to resend failed alarms to specific or all ITSM API POST instances.
Enable resync in the Northbound Rule section of ITSM. If not enabled, failed alerts will be ignored.
Only one resync request can be in progress at a time.

Figure: Alarm Resync

To obtain the required cookie, visit https://<webapp-ip>/api/v1/iam/token, then include that cookie in your alarm resync API request.

Resync alerts to all enabled API POST instances:

POST https://<webapp-ip>/api/v1/itsm/alarm-resync/
{
  "instance":[]
}

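Resync alerts to specific API POST instances. Send the same POST request with one of the following bodies.

One instance:

{
  "instance":["instance_name_created_in_ui"]
}

Several instances:

{
  "instance":["instance_name_created_in_ui", "another_instance_name"]
}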

Resync alerts while another resync is running: To start a resync even if one is already in progress, add "force": true to the request body:

{
  "instance":["instance_name_created_in_ui"],
  "force": true
}

Figure: Multiple API Requests
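
The resync requests above can also be built programmatically. The following is a minimal sketch using only the Python standard library. It assumes the cookie obtained from /api/v1/iam/token is passed verbatim in a Cookie header and that the API accepts a JSON content type; neither detail is confirmed by this guide. The function name build_resync_request and the sample address are hypothetical.

```python
import json
import urllib.request


def build_resync_request(webapp_ip, cookie, instances=(), force=False):
    """Build (but do not send) the alarm resync POST request."""
    # An empty list targets all enabled API POST instances.
    body = {"instance": list(instances)}
    if force:
        # Start a resync even if one is already in progress.
        body["force"] = True
    return urllib.request.Request(
        url=f"https://{webapp_ip}/api/v1/itsm/alarm-resync/",
        data=json.dumps(body).encode("utf-8"),
        # ASSUMPTION: cookie string goes into the Cookie header unchanged.
        headers={"Content-Type": "application/json", "Cookie": cookie},
        method="POST",
    )


req = build_resync_request(
    "192.0.2.20",                  # placeholder for <webapp-ip>
    "<cookie-from-iam-token>",     # placeholder cookie value
    ["instance_name_created_in_ui"],
    force=True,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen(req) would send it. Because only one resync can run at a time, reserve force=True for cases where interrupting the normal sequence is intended.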

Step 4: Enable the OTEL Pipeline to Deliver Faults to Provider Connectivity Assurance

Enable the OTEL data pipeline to send faults from the Fault & Mobility PM Collector to Provider Connectivity Assurance.

Pre-requisites:

Ensure the following services are running:

  • otel-transformer

  • otel-load-balancer

  • otel-collector-1

Make the configuration changes below to the respective services.

matrix-of-alertservice

Edit vmount/credentials.json to enable HTTPS connection forwarding to otel-transformer.
Example configuration:

{
  "services": {
    "kafka": {
      ...
      "connections": [
        {
          "topic": "tenantscale.snmp",
          "server": "<Kafka-HOST-IP>:9093",
          "cafile": "/app/ssl/ca-cert",
          "keyfile": "/app/ssl/cert.pem"
        }
      ]
    },
    "https": {
      "factory": "HttpConnectionFactory",
      "connections": [
        {
          "url": "http://<Observability framework-HOST-IP>:8087/api/alerts/process-json",
          "auth_type": "basic",
          "username": "<USER>",
          "password": "<PASSWORD>",
          "send_flag": true
        }
      ]
    }
  },
  "number_of_workers": 150
}

<Kafka-HOST-IP>: Host IP of the Kafka service.
<Observability framework-HOST-IP>: Host IP of the otel-transformer service.
<USER> / <PASSWORD>: Specify admin credentials as required.
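
Before restarting the service, a quick structural check of the edited file can catch a missing key or a placeholder that was left in place. The sketch below inspects only the "https" block and takes its list of required keys from the example above; whether the service itself requires all of them is an assumption. The function name check_https_connections is hypothetical.

```python
# Keys taken from the example "https" connection above (assumed to be required).
REQUIRED_HTTPS_KEYS = ("url", "auth_type", "username", "password", "send_flag")


def check_https_connections(config: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    connections = config.get("services", {}).get("https", {}).get("connections", [])
    if not connections:
        problems.append("services.https.connections is missing or empty")
    for index, conn in enumerate(connections):
        for key in REQUIRED_HTTPS_KEYS:
            if key not in conn:
                problems.append(f"connection {index}: missing '{key}'")
        if conn.get("send_flag") is not True:
            problems.append(f"connection {index}: send_flag is not true")
        if "<" in str(conn.get("url", "")):
            problems.append(f"connection {index}: url still contains a placeholder")
    return problems


# The example from this guide, with its placeholders left unreplaced.
sample = {"services": {"https": {"connections": [{
    "url": "http://<Observability framework-HOST-IP>:8087/api/alerts/process-json",
    "auth_type": "basic",
    "username": "<USER>",
    "password": "<PASSWORD>",
    "send_flag": True,
}]}}}

for problem in check_https_connections(sample):
    print(problem)
```

To check the real file, load it with json.load(open("vmount/credentials.json")) and pass the result to the function.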

otel-transformer

Ensure the following environment variables are set to forward traffic to the internal otel-load-balancer:

OTEL_TRANSFORMER_USERNAME=<USER>
OTEL_TRANSFORMER_PASSWORD=<PASSWORD>
OTEL_LOAD_BALANCER_ENDPOINT=http://otel-load-balancer:8080

<USER> / <PASSWORD>: Use the otel-transformer user credentials (base64 encoded if required).
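
The following sketch shows one way to confirm that the three variables are present and to base64-encode a credential where the deployment expects an encoded value. It checks the environment of the process that runs it, so it should run in the same environment as otel-transformer. The helper names missing_vars and b64 are hypothetical.

```python
import base64
import os

REQUIRED_VARS = (
    "OTEL_TRANSFORMER_USERNAME",
    "OTEL_TRANSFORMER_PASSWORD",
    "OTEL_LOAD_BALANCER_ENDPOINT",
)


def missing_vars(env) -> list:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def b64(value: str) -> str:
    """Base64-encode a credential for deployments that expect encoded values."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# An empty list means all three variables are set.
print(missing_vars(os.environ))
# Hypothetical value; replace with the real credential when encoding is required.
print(b64("example-password"))
```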

otel-load-balancer

Ensure vmount/otel_nginx_conf/nginx.conf includes the otel-collector-1 receiver endpoints.

Figure: OTEL Load Balancer Configuration

otel-collector-1

In vmount/otel_collector/otel-collector-config.yaml, configure the Provider Connectivity Assurance endpoint and access token:

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"

exporters:
  otlphttp:
    endpoint: "<PCA_ENDPOINT_URL>"
    headers:
      Authorization: "Bearer <ACCESS_TOKEN>"
      Content-Type: "application/json"

<PCA_ENDPOINT_URL>: PCA endpoint URL to export SNMP faults.
<ACCESS_TOKEN>: Valid access token for PCA endpoint.
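
A common mistake is to leave a placeholder in the exporter section. The sketch below scans the configuration text for upper-case tokens in angle brackets; it is a plain text check, not a YAML validation. The function name unresolved_placeholders is hypothetical.

```python
import re

# Matches tokens such as <PCA_ENDPOINT_URL> or <ACCESS_TOKEN>.
PLACEHOLDER = re.compile(r"<[A-Z_]+>")


def unresolved_placeholders(config_text: str) -> list:
    """Return the sorted, de-duplicated placeholders still present in the text."""
    return sorted(set(PLACEHOLDER.findall(config_text)))


# Exporter section copied from the example above.
SAMPLE = '''
exporters:
  otlphttp:
    endpoint: "<PCA_ENDPOINT_URL>"
    headers:
      Authorization: "Bearer <ACCESS_TOKEN>"
      Content-Type: "application/json"
'''

print(unresolved_placeholders(SAMPLE))
```

To check the real file, read vmount/otel_collector/otel-collector-config.yaml as text and pass its contents to the function; an empty list means no placeholders of this form remain.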

Access Token Creation

To create the access token, open the Zitadel Admin UI at auth.<deploymentURL>, or reach it by IP address on port 3443, and then:

  1. Switch to the target tenant organization.

  2. Under Users, create a new Service User.

  3. Assign the “tenant-admin” role (required for deployments ≥ 25.2.232).

  4. Generate a Personal Access Token (set an appropriate expiration date).

  5. Use the generated token as the Authorization header in the OTEL exporter configuration.

For further details, see: HOWTO: PCA with OTEL collector

At this stage, faults are fully configured to flow to Provider Connectivity Assurance and can be viewed in dashboards.