Perform etcd Backup for Restricted Environment on OCP 4.3.x

etcd is the key-value store for OpenShift Container Platform; it persists the state of all resource objects.
Back up your cluster's etcd data regularly and store it in a secure location, ideally outside the OpenShift Container Platform environment. Do not take an etcd backup before the first certificate rotation completes, which occurs 24 hours after installation; otherwise the backup will contain expired certificates. It is also recommended to take etcd backups during non-peak usage hours, because the backup is a blocking action.
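
The 24-hour rule can be checked before a backup is started. The following is only a minimal sketch: the function name rotation_window_passed is made up for illustration, and using the ClusterVersion creation timestamp as the installation time is an assumption, not something the backup script does itself.

```shell
# rotation_window_passed CREATED_EPOCH NOW_EPOCH
# Succeeds (exit status 0) when at least 24 hours (86400 seconds) lie
# between the two timestamps, both given in seconds since the epoch.
rotation_window_passed() {
  [ $(( $2 - $1 )) -ge 86400 ]
}

# Hypothetical usage from a host with oc and GNU date. Treating the
# ClusterVersion creation timestamp as the install time is an assumption.
# created=$(date -d "$(oc get clusterversion version \
#   -o jsonpath='{.metadata.creationTimestamp}')" +%s)
# rotation_window_passed "$created" "$(date +%s)" && echo "OK to back up"
```

The check only compares two timestamps; it does not confirm that the certificate rotation actually completed.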

I was working in an OCP 4.3.0 restricted environment, where the OCP nodes have no Internet connection, not even through a proxy, and noticed that the backup script failed because it tried to download the etcdctl binary from the Internet.

[[email protected] ~]# ssh -i .ssh/id_rsa [email protected]
[[email protected] ~]$ sudo /usr/local/bin/ ./assets/backup
Creating asset directory ./assets
Downloading etcdctl binary..

At a high level, to make the etcd backup succeed, I had to find an existing etcdctl binary, copy it to a fixed location (/root/etcdctl), and modify the script to use that copy instead of downloading one:

[[email protected] ~]# find / -iname 'etcdctl*'

[[email protected] ~]# diff /usr/local/bin/ /usr/local/bin/
< ETCDCTL="/root/etcdctl"
> ETCDCTL="${ASSET_DIR}/bin/etcdctl"
<   # dl_etcdctl
>   dl_etcdctl

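The two changes shown in the diff can also be applied to a working copy instead of the original file, so that the file under /usr/local/bin/ stays untouched. This is a sketch under assumptions, not what I ran: the function name make_offline_copy and its arguments are hypothetical, and whether the script behaves identically when run from another location has not been verified here.

```shell
# make_offline_copy SRC DST ETCDCTL_PATH
# Writes a copy of the backup script SRC to DST with two changes:
#   1. the ETCDCTL= assignment points at a local binary (ETCDCTL_PATH)
#   2. the bare dl_etcdctl call is commented out
# The definition of dl_etcdctl itself is left alone, because the second
# expression only matches lines that contain nothing but the call.
# Assumes ETCDCTL_PATH contains no '|' or '&' characters.
make_offline_copy() {
  sed -e "s|^ETCDCTL=.*|ETCDCTL=\"$3\"|" \
      -e 's|^\([[:space:]]*\)dl_etcdctl[[:space:]]*$|\1# dl_etcdctl|' \
      "$1" > "$2" && chmod +x "$2"
}

# Hypothetical usage (paths are placeholders, not taken from the cluster):
# make_offline_copy /path/to/backup-script /root/backup-offline.sh /root/etcdctl
```

Running the copy instead of the edited original would, if it works in your environment, avoid having to revert the file afterwards.
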
Then performed the backup:

[[email protected] ~]# /usr/local/bin/ assets/backup/
Trying to backup etcd client certs..
etcd client certs found in /etc/kubernetes/static-pod-resources/kube-apiserver-pod-14 backing up to ./assets/backup/
Backing up /etc/kubernetes/manifests/etcd-member.yaml to ./assets/backup/
Trying to backup latest static pod resources..
Snapshot saved at ./assets/tmp/snapshot.db
snapshot db and kube resources are successfully saved to assets/backup//snapshot_db_kuberesources_2020-02-25_030239.tar.gz!

Afterwards, we need to revert the changes we made to the script. Otherwise the machine-config operator goes into a DEGRADED state because of the file mismatch. To verify, run oc describe pods -n machine-config-operator machine-config-daemon-XXX (for the nodes where we modified the script).
If the operator is already in the DEGRADED state, we need to delete the problematic pods to fix it.
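
To be sure the script really is back to its original content after the revert, its checksum can be recorded before editing and compared afterwards. This is a small sketch using sha256sum; the function names are hypothetical, and it assumes that byte-identical content is what matters for the mismatch check.

```shell
# record_checksum FILE SUMFILE
# Stores the SHA-256 checksum of FILE (with its path) in SUMFILE.
record_checksum() {
  sha256sum "$1" > "$2"
}

# verify_checksum SUMFILE
# Succeeds only if the file named in SUMFILE still has the recorded checksum.
verify_checksum() {
  sha256sum -c "$1" > /dev/null 2>&1
}

# Hypothetical usage (the script path is a placeholder):
# record_checksum /path/to/backup-script /root/backup-script.sha256
# ... edit the script, run the backup, revert the edit ...
# verify_checksum /root/backup-script.sha256 && echo "script restored"
```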

– Do not forget to store the snapshot backup file somewhere outside the OCP nodes.
– For OCP nodes connected through a proxy, you might need to add the HTTP(S)_PROXY environment variables to the script or export them before running it.
– For OCP 4.3.5 and later, you might not need to modify the backup script.
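
For the proxy case in the notes above, exporting the variables before running the script could look like the sketch below. The function name set_proxy_env and the proxy URL are placeholders, and which of these variables the download step actually reads is an assumption; both upper-case and lower-case forms are set because tools differ on this.

```shell
# set_proxy_env PROXY_URL
# Exports the proxy URL under the common upper-case and lower-case names
# so that child processes started from this shell inherit them.
set_proxy_env() {
  HTTP_PROXY="$1"
  HTTPS_PROXY="$1"
  http_proxy="$1"
  https_proxy="$1"
  export HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
}

# Hypothetical usage (proxy.example.com is a placeholder):
# set_proxy_env http://proxy.example.com:3128
# When the script is started through sudo, the environment is normally
# reset; sudo -E asks sudo to preserve it, subject to the sudoers policy.
```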


Platform Consultant at Red Hat, Oracle Engineered Systems Specialist


