Backup and Restore (Kubernetes)

This document provides an overview of the automated backup and restoration procedures available for on-premises Kubernetes deployments of Provider Connectivity Assurance. It covers backup strategies for both targeted application issues and infrastructure failures. It also walks through accessing backups, triggering manual backups, restoring data, and recovering from catastrophic infrastructure loss.

Cisco Provider Connectivity Assurance protects data and enables rapid recovery by using Kubernetes CronJobs and MinIO object storage to back up critical components, including CouchDB, PostgreSQL, Elasticsearch, and Dgraph.

These CronJobs run daily, and the backups are stored in the local MinIO object storage deployed as part of the solution.

Backed-Up Components

| Component | CronJob Name | Default Schedule | Storage Path |
|---|---|---|---|
| CouchDB | `backup-create-couchdb` | 01:00 UTC | `couchDB/v2/couchDB-Backup/` |
| PostgreSQL | `postgres-backup-create` | 01:00 UTC | `postgres/v2/` |
| Yang PostgreSQL | `yang-postgres-backup-create` | 01:00 UTC | `postgres/v2/` |
| Elasticsearch | `backup-create-elasticsearch` | 01:00 UTC | `elasticsearch/v2/elasticsearch-Backup/` |
| Dgraph | Managed by `stitchit-env-config` | 04:00 UTC | `dgraphBackup/v2/` |

To verify backup jobs are running:

```
kubectl get cronjobs -n pca
```
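For a per-job view, the following minimal sketch checks each backup CronJob from the table above, assuming `kubectl` is configured for the cluster and the namespace is `pca`. It prints the `lastScheduleTime` recorded in each CronJob's status. A recent timestamp only shows that the job was scheduled, not that the backup succeeded, so still confirm the files in MinIO. Dgraph is left out because its backup is managed by `stitchit-env-config` rather than a dedicated CronJob. The function name `check_backup_cronjobs` is illustrative.

```shell
#!/bin/sh
# Sketch: report when each backup CronJob from the table was last scheduled.
# Assumes kubectl is configured for the cluster; NAMESPACE defaults to "pca".
# Dgraph is omitted: its backup is managed by stitchit-env-config, not its own CronJob.
NAMESPACE="${NAMESPACE:-pca}"
BACKUP_CRONJOBS="backup-create-couchdb postgres-backup-create yang-postgres-backup-create backup-create-elasticsearch"

check_backup_cronjobs() {
  local missing name last
  missing=0
  for name in $BACKUP_CRONJOBS; do
    # lastScheduleTime is empty if the CronJob has never been scheduled.
    if ! last=$(kubectl get cronjob "$name" -n "$NAMESPACE" \
        -o jsonpath='{.status.lastScheduleTime}' 2>/dev/null); then
      echo "MISSING  $name"
      missing=$((missing + 1))
      continue
    fi
    echo "OK       $name  last scheduled: ${last:-never}"
  done
  # Exit status is the number of CronJobs that could not be found.
  return "$missing"
}
```

Run it as `check_backup_cronjobs || echo "one or more backup CronJobs are missing"`. Returning the count as the exit status keeps the function usable in scripts and monitoring checks.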

Targeted Backup and Restoration

Configuration and application data are automatically backed up to MinIO every day. These backups are intended for targeted recovery, such as repairing a corrupted database or recovering accidentally deleted data.

Access Backup

You can access backups through the MinIO pod using the MinIO client (mc):

Connect to the MinIO pod:

```
kubectl exec -it -n pca pca-minio-pool-0-0 -- sh
```

Load MinIO credentials:
```
. /tmp/minio/config.env
```

Configure the MinIO client:

```
mc alias set --insecure pca https://localhost:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
```

List available backups (example for CouchDB):
```
mc --insecure ls pca/<bucket-name>/couchDB/v2/couchDB-Backup/
```
Replace <bucket-name> with your deployment's bucket name (visible in the backup CronJob configuration).
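To avoid retyping the storage paths, you can wrap the table's paths in a small helper. This is a minimal sketch meant to run inside the MinIO pod after the `mc alias set` step, assuming the alias is named `pca` as shown above and that the pod's `sh` supports `local` (as dash, ash, and bash do). The component labels (`couchdb`, `postgres`, and so on) and the function names `backup_path` and `list_backups` are illustrative, not part of the product.

```shell
#!/bin/sh
# Sketch: list backups for one component using the storage paths from the
# "Backed-Up Components" table. Run inside the MinIO pod after "mc alias set".
# Assumes the mc alias is named "pca"; the bucket name is passed explicitly.

backup_path() {
  # Map an illustrative component label to its storage path.
  case "$1" in
    couchdb) echo "couchDB/v2/couchDB-Backup/" ;;
    postgres|yang-postgres) echo "postgres/v2/" ;;
    elasticsearch) echo "elasticsearch/v2/elasticsearch-Backup/" ;;
    dgraph) echo "dgraphBackup/v2/" ;;
    *) return 1 ;;
  esac
}

list_backups() {
  # $1: bucket name, $2: component label accepted by backup_path
  local path
  if ! path=$(backup_path "$2"); then
    echo "unknown component: $2" >&2
    return 1
  fi
  mc --insecure ls "pca/$1/$path"
}
```

For example, `list_backups <bucket-name> elasticsearch` lists the Elasticsearch backups. PostgreSQL and Yang PostgreSQL share the `postgres/v2/` path, so both labels list the same location.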

Manual Backup Trigger

You can trigger an immediate backup from any pod that has curl installed (for example, `airflow-0`):

CouchDB:

```
kubectl exec -it -n pca airflow-0 -- curl -f couchdb:10003/backup
```

PostgreSQL:

```
kubectl exec -it -n pca airflow-0 -- curl -f http://postgres:10004/backup
```

Elasticsearch:

```
kubectl exec -it -n pca airflow-0 -- curl -f elasticsearch01:10005/backup
```
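The three manual triggers can be combined into one check that reports which endpoints failed. This is a minimal sketch, assuming `airflow-0` in namespace `pca` has curl installed and that the endpoints respond as in the commands above. It relies on `kubectl exec` returning the remote command's exit code, and on `curl -f` returning non-zero for HTTP errors. The names `trigger_backup`, `trigger_all_backups`, and `CURL_POD` are illustrative.

```shell
#!/bin/sh
# Sketch: trigger the manual backup endpoints from one pod and report failures.
# Assumes airflow-0 in namespace pca has curl installed.
NAMESPACE="${NAMESPACE:-pca}"
CURL_POD="${CURL_POD:-airflow-0}"

trigger_backup() {
  # $1: label for messages, $2: backup URL as used in the manual commands.
  # -f: non-zero exit on HTTP errors; -sS: hide progress but still show errors.
  # No -it: a TTY is not needed when running non-interactively.
  if kubectl exec -n "$NAMESPACE" "$CURL_POD" -- curl -fsS "$2" >/dev/null; then
    echo "triggered: $1"
  else
    echo "FAILED:    $1" >&2
    return 1
  fi
}

trigger_all_backups() {
  local failed=0
  trigger_backup couchdb "couchdb:10003/backup" || failed=$((failed + 1))
  trigger_backup postgres "http://postgres:10004/backup" || failed=$((failed + 1))
  trigger_backup elasticsearch "elasticsearch01:10005/backup" || failed=$((failed + 1))
  # Exit status is the number of endpoints that failed.
  return "$failed"
}
```

Each trigger runs even if an earlier one fails, so a single unreachable service does not block the other backups.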

Restoration Procedures

Restoration procedures vary by component and require careful coordination to avoid data loss. The general process involves:

Export backup files from MinIO to the admin VM
Identify the target node and persistent volume claim (PVC)
Stop the affected service and dependent services
Replace data files with backup contents
Restart services in the correct order
Validate data integrity
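The first step of this process, exporting backup files from MinIO to the admin VM, can be sketched as one function run on the admin VM. This is a minimal sketch that makes several assumptions:

- `kubectl` is available on the admin VM.
- The pod name, credentials file, and `pca` alias match the Access Backup steps.
- The MinIO container has `tar`, which `kubectl cp` requires.
- The pod's `/tmp` has enough free space for the file.

The key must name a single object found with `mc ls`, not a directory prefix. The function name `export_backup` is illustrative.

```shell
#!/bin/sh
# Sketch: copy one backup object from MinIO to a directory on the admin VM.
# Assumes the pod name, credentials file, and "pca" alias from "Access Backup",
# and that the MinIO container has tar (kubectl cp depends on it).
NAMESPACE="${NAMESPACE:-pca}"
MINIO_POD="${MINIO_POD:-pca-minio-pool-0-0}"

export_backup() {
  # $1: bucket name, $2: object key inside the bucket, $3: local destination directory
  local bucket="$1" key="$2" dest="$3" name
  name=$(basename "$key")
  # Stage the object in the pod's /tmp. The object path and file name are passed
  # as positional arguments to the inner sh to avoid nested quoting problems.
  kubectl exec -n "$NAMESPACE" "$MINIO_POD" -- sh -c '
    . /tmp/minio/config.env &&
    mc alias set --insecure pca https://localhost:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD" >/dev/null &&
    mc --insecure cp "pca/$1" "/tmp/$2"' sh "$bucket/$key" "$name" || return 1
  # Copy the staged file to the admin VM, then remove it from the pod.
  kubectl cp "$NAMESPACE/$MINIO_POD:/tmp/$name" "$dest/$name" || return 1
  kubectl exec -n "$NAMESPACE" "$MINIO_POD" -- rm -f "/tmp/$name"
}
```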

For detailed step-by-step restoration procedures, see the component-specific guides:

Important: For complex recovery scenarios or if you encounter issues, contact Cisco TAC for assistance.

Infrastructure Issues Requiring Rebuild

For catastrophic scenarios involving node loss or storage failures, application backups alone may not be sufficient for rapid recovery.

Recommended Snapshots

Maintain regular snapshots of the following for disaster recovery:

All cluster nodes:

/var/lib/embedded-cluster/ — Contains OpenEBS local storage PVCs

Admin node only:

Replicated Admin Console configuration
Deployment configuration files
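Where VM-level or storage-level snapshots are not available, a file-level archive is one possible fallback. The sketch below illustrates the idea under stated assumptions and is not a supported procedure. Archiving database files while services write to them yields at best a crash-consistent copy, so stop the workloads on the node first, or prefer hypervisor snapshots. The destination must be outside the directory being archived. `snapshot_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: create a timestamped tar.gz of a directory such as /var/lib/embedded-cluster.
# Run as root on the node, ideally while the node's workloads are stopped.

snapshot_dir() {
  # $1: directory to archive, $2: destination directory (must be outside $1)
  local src="$1" dest="$2" stamp archive
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  archive="$dest/$(uname -n)-$(basename "$src")-$stamp.tar.gz"
  # -C to the parent so entries are stored as relative paths (embedded-cluster/...).
  # --numeric-owner records raw UIDs/GIDs so ownership survives on a rebuilt node.
  tar -C "$(dirname "$src")" --numeric-owner -czf "$archive" "$(basename "$src")" || return 1
  echo "$archive"
}
```

For example, `snapshot_dir /var/lib/embedded-cluster /mnt/backup` prints the archive path. To restore it as root, use `tar -C /var/lib --numeric-owner -xzpf <archive>`.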

Storage Considerations

Embedded and air-gapped deployments use OpenEBS local storage (openebs-hostpath), meaning persistent volume data resides directly on node filesystems at:

/var/lib/embedded-cluster/openebs-local/<pvc-id>/

To identify which node hosts a specific service's data:

```
kubectl get pods -n pca -o wide | grep <service-name>
kubectl get pvc -n pca | grep <service-name>
```
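To map a PVC to the exact directory on disk, you can read the PersistentVolume it is bound to. This minimal sketch assumes the volume created by `openebs-hostpath` records its directory in `spec.local.path` or `spec.hostPath.path`, so both are queried. It also assumes the volume is pinned to a node through its first node-affinity term. Check the result against `kubectl get pv <pv-name> -o yaml` on your cluster before relying on it. `pvc_location` is an illustrative name.

```shell
#!/bin/sh
# Sketch: print the PV, node, and on-disk path behind a PVC in namespace pca.
# Assumes openebs-hostpath PVs expose the path in spec.local.path (or
# spec.hostPath.path) and the node in the first nodeAffinity term.
NAMESPACE="${NAMESPACE:-pca}"

pvc_location() {
  # $1: PVC name
  local pv path node
  pv=$(kubectl get pvc "$1" -n "$NAMESPACE" -o jsonpath='{.spec.volumeName}') || return 1
  if [ -z "$pv" ]; then
    echo "PVC $1 is not bound" >&2
    return 1
  fi
  # Only one of the two fields is expected to be set, so concatenation yields the path.
  path=$(kubectl get pv "$pv" -o jsonpath='{.spec.local.path}{.spec.hostPath.path}') || return 1
  node=$(kubectl get pv "$pv" \
    -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}') || return 1
  echo "pvc=$1 pv=$pv node=${node:-unknown} path=${path:-unknown}"
}
```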

Recovery from Infrastructure Loss

Recovery from node or storage loss requires:

Restore the node or provision a replacement
Restore filesystem snapshots to the appropriate paths
Rejoin the node to the cluster (if applicable)
Verify PVC bindings and service health

Important: Contact Cisco TAC for infrastructure recovery. TAC provides deployment-specific guidance based on your cluster topology and the nature of the failure.

Post-Restoration Validation

After any restoration:

Verify service health through the Provider Connectivity Assurance UI
Check that data is processing correctly
Review logs for errors:
```
kubectl logs -f -n pca <pod-name>
```
For time-sensitive data, you may need to backfill jobs for the period between the backup timestamp and the restoration time.
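The log review step can be widened to the whole namespace. This sketch assumes namespace `pca`, and it treats a case-insensitive match on `error` or `exception` as a useful first filter, although that match also catches benign lines. It lists pods whose phase is not `Running` or `Succeeded`. It then counts matching log lines per pod over a recent window. A `Running` phase does not mean the pod is ready, so still verify service health in the UI. `validate_after_restore` is an illustrative name.

```shell
#!/bin/sh
# Sketch: quick post-restoration check for namespace pca.
NAMESPACE="${NAMESPACE:-pca}"

validate_after_restore() {
  # $1: log window passed to kubectl --since (default 1h)
  local since="${1:-1h}" pods pod count
  echo "Pods not in Running or Succeeded phase:"
  kubectl get pods -n "$NAMESPACE" \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
  pods=$(kubectl get pods -n "$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}') || return 1
  for pod in $pods; do
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that harmless.
    count=$(kubectl logs -n "$NAMESPACE" "$pod" --all-containers=true \
      --since="$since" 2>/dev/null | grep -ciE 'error|exception' || true)
    if [ "$count" -gt 0 ]; then
      echo "$pod: $count log lines matching error/exception"
    fi
  done
}
```

For example, `validate_after_restore 30m` scans the last 30 minutes of logs. Pods with high counts are the ones to inspect with `kubectl logs -f`.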