
# Backup and Restore (Kubernetes)

For on-premises Kubernetes deployments, Cisco Provider Connectivity Assurance (PCA) provides automated backups and restoration procedures for two classes of issues:

- Targeted application issues: database corruption or accidental data deletion
- Infrastructure issues: node or storage loss requiring a rebuild

## Backup Overview

PCA automatically backs up critical data stores using Kubernetes CronJobs that run daily. Backups are stored in the local MinIO object storage deployed as part of the solution.

### Backed-Up Components

| Component | CronJob Name | Default Schedule | Storage Path |
| --- | --- | --- | --- |
| CouchDB | `backup-create-couchdb` | 01:00 UTC | `couchDB/v2/couchDB-Backup/` |
| PostgreSQL | `postgres-backup-create` | 01:00 UTC | `postgres/v2/` |
| Yang PostgreSQL | `yang-postgres-backup-create` | 01:00 UTC | `postgres/v2/` |
| Elasticsearch | `backup-create-elasticsearch` | 01:00 UTC | `elasticsearch/v2/elasticsearch-Backup/` |
| Dgraph | Managed by `stitchit-env-config` | 04:00 UTC | `dgraphBackup/v2/` |

To verify backup jobs are running:

```
kubectl get cronjobs -n pca
```
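For a closer look than the plain listing, the following sketch prints each CronJob's schedule and last schedule time, and flags any job that has never been scheduled. It assumes `kubectl` access to the `pca` namespace. The function names are illustrative and not part of the product.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of PCA.
NAMESPACE="${NAMESPACE:-pca}"

# Print one row per CronJob: name, cron schedule, last schedule time.
# kubectl prints "<none>" for a field that is not set.
list_backup_cronjobs() {
  kubectl get cronjobs -n "$NAMESPACE" \
    -o custom-columns='NAME:.metadata.name,SCHEDULE:.spec.schedule,LAST_SCHEDULE:.status.lastScheduleTime'
}

# Read the output of list_backup_cronjobs on stdin and print the names
# of CronJobs whose last column (LAST_SCHEDULE) is "<none>".
never_scheduled() {
  awk 'NR > 1 && $NF == "<none>" { print $1 }'
}
```

Running `list_backup_cronjobs | never_scheduled` should print nothing once every backup job has been scheduled at least once.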

## Targeted Backup and Restoration

Configuration and application data is automatically backed up daily to MinIO. These backups address specific database corruption or accidental data deletion scenarios.

### Accessing Backups

You access backups through the MinIO pod using the MinIO client (mc).

Connect to the MinIO pod:

```
kubectl exec -it -n pca pca-minio-pool-0-0 -- sh
```

Load MinIO credentials:
```
. /tmp/minio/config.env
```

Configure the MinIO client:

```
mc alias set --insecure pca https://localhost:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
```

List available backups (example for CouchDB):
```
mc --insecure ls pca/<bucket-name>/couchDB/v2/couchDB-Backup/
```
Replace `<bucket-name>` with your deployment's bucket name (visible in the backup CronJob configuration).
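The same listing works for the other components by swapping in the storage path from the Backed-Up Components table. A minimal sketch, meant to run inside the MinIO pod after the `pca` alias is configured; `backup_prefix` and `list_backups` are hypothetical helper names, and the bucket name is still something you supply:

```shell
#!/bin/sh
# Sketch only: helper names are illustrative. The paths are the ones
# listed in the Backed-Up Components table.
backup_prefix() {
  case "$1" in
    couchdb)       echo "couchDB/v2/couchDB-Backup/" ;;
    postgres)      echo "postgres/v2/" ;;   # also used by Yang PostgreSQL
    elasticsearch) echo "elasticsearch/v2/elasticsearch-Backup/" ;;
    dgraph)        echo "dgraphBackup/v2/" ;;
    *)             echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# List the backups for one component in the given bucket.
# $1 = bucket name, $2 = component key accepted by backup_prefix.
list_backups() {
  bucket="$1"
  component="$2"
  prefix=$(backup_prefix "$component") || return 1
  mc --insecure ls "pca/${bucket}/${prefix}"
}
```

For example, `list_backups <bucket-name> elasticsearch` lists the Elasticsearch backups in that bucket.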

### Manually Triggering a Backup

You can trigger an immediate backup from any pod that has curl installed, such as `airflow-0`:

CouchDB:

```
kubectl exec -it -n pca airflow-0 -- curl -f couchdb:10003/backup
```

PostgreSQL:

```
kubectl exec -it -n pca airflow-0 -- curl http://postgres:10004/backup
```

Elasticsearch:

```
kubectl exec -it -n pca airflow-0 -- curl -f elasticsearch01:10005/backup
```
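To trigger all three backups in one pass, for example before planned maintenance, the commands above can be wrapped in a small script. This is a sketch under the assumption that `airflow-0` is running and that the hosts and ports match the commands above; the helper names are hypothetical. It drops `-it` because no interactive terminal is needed, and it passes `-f` to every call so that an HTTP error produces a non-zero exit code.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative. Hosts and ports are the
# ones used in the manual commands for each component.
NAMESPACE="${NAMESPACE:-pca}"

# Map a component key to its backup endpoint.
backup_endpoint() {
  case "$1" in
    couchdb)       echo "couchdb:10003/backup" ;;
    postgres)      echo "postgres:10004/backup" ;;
    elasticsearch) echo "elasticsearch01:10005/backup" ;;
    *)             echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Trigger a backup for one component from the airflow-0 pod.
trigger_backup() {
  endpoint=$(backup_endpoint "$1") || return 1
  kubectl exec -n "$NAMESPACE" airflow-0 -- curl -f "http://${endpoint}"
}

# Trigger all three in turn, stopping at the first failure.
trigger_all_backups() {
  for component in couchdb postgres elasticsearch; do
    echo "Triggering backup: ${component}"
    trigger_backup "$component" || return 1
  done
}
```

Calling `trigger_all_backups` runs the three requests in sequence; the new backup files should then appear under the corresponding MinIO storage paths.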

### Restoration Procedures

Restoration procedures vary by component and require careful coordination to avoid data loss. The general process involves:

1. Export backup files from MinIO to the admin VM
2. Identify the target node and persistent volume claim (PVC)
3. Stop the affected service and dependent services
4. Replace data files with backup contents
5. Restart services in the correct order
6. Validate data integrity

For detailed step-by-step restoration procedures, see the component-specific guides.

Important: For complex recovery scenarios or if you encounter issues, contact Cisco TAC for assistance.

## Infrastructure Issues Requiring Rebuild

For catastrophic scenarios involving node loss or storage failures, application backups alone may not be sufficient for rapid recovery.

### Recommended Snapshots

Maintain regular snapshots of the following for disaster recovery:

All cluster nodes:

- `/var/lib/embedded-cluster/`: contains the OpenEBS local storage PVCs

Admin node only:

- Replicated Admin Console configuration
- Deployment configuration files

### Storage Considerations

Embedded and air-gapped deployments use OpenEBS local storage (`openebs-hostpath`), so persistent volume data resides directly on the node filesystem at:

```
/var/lib/embedded-cluster/openebs-local/<pvc-id>/
```

To identify which node hosts a specific service's data:

```
kubectl get pods -n pca -o wide | grep <service-name>
kubectl get pvc -n pca | grep <service-name>
```
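Once you know the exact pod and PVC names, the node and the on-disk directory can be resolved directly instead of read from the listings. The sketch below assumes that `<pvc-id>` in the path above corresponds to the name of the PersistentVolume bound to the claim, which is how OpenEBS hostpath volumes are normally laid out; confirm this on your nodes before relying on it. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of PCA.
NAMESPACE="${NAMESPACE:-pca}"
OPENEBS_BASE="/var/lib/embedded-cluster/openebs-local"

# Print the node that a pod is scheduled on.
pod_node() {
  kubectl get pod "$1" -n "$NAMESPACE" -o jsonpath='{.spec.nodeName}'
}

# Print the name of the PersistentVolume bound to a PVC.
pvc_volume() {
  kubectl get pvc "$1" -n "$NAMESPACE" -o jsonpath='{.spec.volumeName}'
}

# Build the on-node directory for a volume name.
# ASSUMPTION: the directory under openebs-local is named after the volume.
volume_path() {
  printf '%s/%s/\n' "$OPENEBS_BASE" "$1"
}
```

For example, `volume_path "$(pvc_volume <pvc-name>)"` prints the directory to look for on the node reported by `pod_node <pod-name>`.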

### Recovery from Infrastructure Loss

Recovery from node or storage loss involves the following steps:

1. Restore the node or provision a replacement
2. Restore filesystem snapshots to the appropriate paths
3. Rejoin the node to the cluster (if applicable)
4. Verify PVC bindings and service health

Important: Contact Cisco TAC for infrastructure recovery. TAC provides deployment-specific guidance based on your cluster topology and the nature of the failure.

## Post-Restoration Validation

After any restoration:

- Verify service health through the PCA UI.
- Check that data is processing correctly.
- Review logs for errors:

  ```
  kubectl logs -f -n pca <pod-name>
  ```

- For time-sensitive data, you may need to backfill jobs for the period between the backup timestamp and the restoration time.
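As a quick command-line complement to the UI checks, the sketch below lists pods that are not in the `Running` phase and counts recent log lines that mention an error. The function names are hypothetical, and the one-hour window and the plain `error` match are arbitrary choices that you may need to adjust per service.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of PCA.
NAMESPACE="${NAMESPACE:-pca}"

# List pods whose phase is not Running. Pods of completed Jobs
# (phase Succeeded) also appear here and are usually harmless.
pods_not_running() {
  kubectl get pods -n "$NAMESPACE" --field-selector=status.phase!=Running
}

# Count lines on stdin that contain "error", case-insensitively.
# grep exits non-zero when the count is 0, so mask that exit status.
count_error_lines() {
  grep -ci 'error' || true
}

# Count error lines that a pod logged during the last hour.
recent_error_count() {
  kubectl logs -n "$NAMESPACE" --since=1h "$1" | count_error_lines
}
```

A non-zero result from `recent_error_count <pod-name>` is a prompt to read that pod's full log, not proof that the restoration failed.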