


# Backup and Restore (Kubernetes)

For on-premises Kubernetes deployments, Cisco Provider Connectivity Assurance provides automated backup solutions and restoration procedures for two classes of issues:

  1. Targeted application issues — Database corruption or accidental data deletion
  2. Infrastructure issues — Node or storage loss requiring rebuild

## Backup Overview

PCA automatically backs up critical data stores using Kubernetes CronJobs that run daily. Backups are stored in the local MinIO object storage deployed as part of the solution.

### Backed-Up Components

| Component | CronJob Name | Default Schedule | Storage Path |
|---|---|---|---|
| CouchDB | backup-create-couchdb | 01:00 UTC | couchDB/v2/couchDB-Backup/ |
| PostgreSQL | postgres-backup-create | 01:00 UTC | postgres/v2/ |
| Yang PostgreSQL | yang-postgres-backup-create | 01:00 UTC | postgres/v2/ |
| Elasticsearch | backup-create-elasticsearch | 01:00 UTC | elasticsearch/v2/elasticsearch-Backup/ |
| Dgraph | Managed by stitchit-env-config | 04:00 UTC | dgraphBackup/v2/ |

To verify backup jobs are running:

    kubectl get cronjobs -n pca
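The listing above shows each CronJob's schedule but not whether its recent runs succeeded. The following is a minimal sketch of one way to flag stale backups, assuming the cluster's CronJob status exposes `lastSuccessfulTime` (part of the `batch/v1` API in recent Kubernetes versions) and that GNU `date` is available. The helper names `is_stale` and `check_backup_cronjobs` and the 26-hour threshold are illustrative, not part of PCA.

```shell
# is_stale TIMESTAMP MAX_AGE_HOURS
# Succeeds (exit 0) when TIMESTAMP is empty, "<none>", unparseable,
# or older than MAX_AGE_HOURS. Assumes GNU date for RFC 3339 parsing.
is_stale() {
  ts=$1
  max_hours=$2
  if [ -z "$ts" ] || [ "$ts" = "<none>" ]; then
    return 0
  fi
  then_s=$(date -u -d "$ts" +%s) || return 0
  now_s=$(date -u +%s)
  [ $(( now_s - then_s )) -gt $(( max_hours * 3600 )) ]
}

# check_backup_cronjobs
# Lists every CronJob in the pca namespace with its last successful run
# and marks those with no success in the last 26 hours (hypothetical threshold).
check_backup_cronjobs() {
  kubectl get cronjobs -n pca --no-headers \
    -o custom-columns='NAME:.metadata.name,LAST_OK:.status.lastSuccessfulTime' |
  while read -r name last_ok; do
    if is_stale "$last_ok" 26; then
      echo "STALE  $name (last success: $last_ok)"
    else
      echo "OK     $name (last success: $last_ok)"
    fi
  done
}
```

Running `check_backup_cronjobs` from a host with `kubectl` access prints one line per CronJob. It covers every CronJob in the namespace, not only the backup jobs, so unrelated entries may need to be ignored.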

## Targeted Backup and Restoration

Configuration and application data is automatically backed up daily to MinIO. These backups address specific database corruption or accidental data deletion scenarios.

### Accessing Backups

You access backups through the MinIO pod using the MinIO client (mc).

  1. Connect to the MinIO pod:

    kubectl exec -it -n pca pca-minio-pool-0-0 -- sh
    
  2. Load MinIO credentials:

    . /tmp/minio/config.env
    
  3. Configure the MinIO client:

    mc alias set --insecure pca https://localhost:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
    
  4. List available backups (example for CouchDB):

    mc --insecure ls pca/<bucket-name>/couchDB/v2/couchDB-Backup/
    

    Replace <bucket-name> with your deployment's bucket name (visible in the backup CronJob configuration).
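The same listing works for the other components by swapping in their storage paths from the Backed-Up Components table. As a convenience, a small helper can build the `mc` path for a given component. This is a sketch only: the function name `backup_prefix` and the component keywords (`couchdb`, `postgres`, and so on) are hypothetical labels, and the `pca` alias is the one configured in step 3.

```shell
# backup_prefix BUCKET COMPONENT
# Prints the mc path holding a component's backups, using the storage
# paths from the Backed-Up Components table and the "pca" mc alias.
backup_prefix() {
  bucket=$1
  case $2 in
    couchdb)                path="couchDB/v2/couchDB-Backup/" ;;
    postgres|yang-postgres) path="postgres/v2/" ;;
    elasticsearch)          path="elasticsearch/v2/elasticsearch-Backup/" ;;
    dgraph)                 path="dgraphBackup/v2/" ;;
    *) echo "unknown component: $2" >&2; return 1 ;;
  esac
  printf 'pca/%s/%s\n' "$bucket" "$path"
}
```

Inside the MinIO pod, after defining the function, `mc --insecure ls "$(backup_prefix <bucket-name> elasticsearch)"` would list the Elasticsearch backups.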

### Manually Triggering a Backup

You can trigger an immediate backup from any pod that has curl installed, such as the airflow-0 pod used in the following commands:

CouchDB:

    kubectl exec -it -n pca airflow-0 -- curl -f couchdb:10003/backup

PostgreSQL:

    kubectl exec -it -n pca airflow-0 -- curl http://postgres:10004/backup

Elasticsearch:

    kubectl exec -it -n pca airflow-0 -- curl -f elasticsearch01:10005/backup
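To request all three backups in one pass, for example before planned maintenance, the commands above can be wrapped in a loop. The sketch below is illustrative: the function names are hypothetical, and it applies curl's `-f` flag to every request (the PostgreSQL command above does not use it) so that HTTP errors produce a non-zero exit status.

```shell
# backup_endpoint COMPONENT
# Prints the in-cluster backup URL for a component, as used in the
# manual trigger commands above.
backup_endpoint() {
  case $1 in
    couchdb)       echo "couchdb:10003/backup" ;;
    postgres)      echo "http://postgres:10004/backup" ;;
    elasticsearch) echo "elasticsearch01:10005/backup" ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# trigger_backups
# Requests a backup from each component in turn via the airflow-0 pod
# and returns non-zero if any request fails.
trigger_backups() {
  failed=0
  for component in couchdb postgres elasticsearch; do
    url=$(backup_endpoint "$component") || return 1
    if kubectl exec -n pca airflow-0 -- curl -fsS "$url"; then
      echo "backup requested: $component"
    else
      echo "backup request failed: $component" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A successful request only indicates that the backup endpoint responded. Confirm that the backup file was actually written by listing the corresponding MinIO path.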

### Restoration Procedures

Restoration procedures vary by component and require careful coordination to avoid data loss. The general process involves:

  1. Export backup files from MinIO to the admin VM
  2. Identify the target node and persistent volume claim (PVC)
  3. Stop the affected service and dependent services
  4. Replace data files with backup contents
  5. Restart services in the correct order
  6. Validate data integrity

For detailed step-by-step restoration procedures, see the component-specific guides.

Important: For complex recovery scenarios or if you encounter issues, contact Cisco TAC for assistance.

## Infrastructure Issues Requiring Rebuild

For catastrophic scenarios involving node loss or storage failures, application backups alone may not be sufficient for rapid recovery.

### Recommended Snapshots

Maintain regular snapshots of the following for disaster recovery:

All cluster nodes:

  • /var/lib/embedded-cluster/ — Contains OpenEBS local storage PVCs

Admin node only:

  • Replicated Admin Console configuration
  • Deployment configuration files

### Storage Considerations

Embedded and air-gapped deployments use OpenEBS local storage (openebs-hostpath), meaning persistent volume data resides directly on node filesystems at:

    /var/lib/embedded-cluster/openebs-local/<pvc-id>/

To identify which node hosts a specific service's data:

    kubectl get pods -n pca -o wide | grep <service-name>
    kubectl get pvc -n pca | grep <service-name>
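Once the PVC name is known, the backing directory can be derived from the bound PersistentVolume. The sketch below makes two assumptions that should be checked against your deployment: that `<pvc-id>` in the path above is the PersistentVolume name (as reported in the PVC's `spec.volumeName`), and that the OpenEBS hostpath volume records its node in the PV's node affinity. The function names are hypothetical.

```shell
# pv_host_path PV_NAME
# Prints the node directory for a volume, assuming <pvc-id> in the
# documented path is the bound PersistentVolume name.
pv_host_path() {
  printf '/var/lib/embedded-cluster/openebs-local/%s/\n' "$1"
}

# locate_pvc PVC_NAME
# Prints the node and host path backing a PVC in the pca namespace.
# Assumes the PV carries a node affinity with a single hostname value.
locate_pvc() {
  pv=$(kubectl get pvc -n pca "$1" -o jsonpath='{.spec.volumeName}') || return 1
  node=$(kubectl get pv "$pv" \
    -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}') || return 1
  echo "node: $node"
  echo "path: $(pv_host_path "$pv")"
}
```

For example, `locate_pvc <pvc-name>` prints the node to log in to and the directory to inspect or snapshot there.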

### Recovery from Infrastructure Loss

Recovery from node or storage loss involves the following steps:

  1. Restore the node or provision a replacement
  2. Restore filesystem snapshots to the appropriate paths
  3. Rejoin the node to the cluster (if applicable)
  4. Verify PVC bindings and service health
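For step 4, a quick check is to look for claims that are not bound and pods that are not running. The filters below are a minimal sketch that operates on standard `kubectl get` output. The function names are hypothetical, and a pod in the `Running` state can still be unhealthy, so treat this as a first pass rather than a full health check.

```shell
# unbound_pvcs
# Reads `kubectl get pvc --no-headers` output on stdin and prints the
# name and status of every claim whose STATUS column is not "Bound".
unbound_pvcs() {
  awk '$2 != "Bound" { print $1, $2 }'
}

# unhealthy_pods
# Reads `kubectl get pods --no-headers` output on stdin and prints pods
# whose STATUS column is neither "Running" nor "Completed".
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Typical usage is `kubectl get pvc -n pca --no-headers | unbound_pvcs` followed by `kubectl get pods -n pca --no-headers | unhealthy_pods`. Empty output from both suggests that bindings and pod states look normal.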

Important: Contact Cisco TAC for infrastructure recovery. TAC provides deployment-specific guidance based on your cluster topology and the nature of the failure.

## Post-Restoration Validation

After any restoration:

  1. Verify service health through the PCA UI

  2. Check that data is processing correctly

  3. Review logs for errors:

    kubectl logs -f -n pca <pod-name>
    
  4. For time-sensitive data, you may need to run backfill jobs to cover the period between the backup timestamp and the restoration time
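When reviewing logs in step 3, it can help to narrow the output to recent lines that look like failures. The filter below is a sketch only: the keyword list is an illustrative guess, and the function name `error_lines` is hypothetical. PCA services may log failures in other terms.

```shell
# error_lines
# Reads log text on stdin and prints lines that mention error, fatal,
# or panic (case-insensitive). Prints nothing if no line matches.
error_lines() {
  grep -iE 'error|fatal|panic' || true
}
```

For example, `kubectl logs -n pca <pod-name> --since=1h | error_lines` shows matching lines from the last hour of a pod's logs.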
