As we know, OpenShift clusters are made up of multiple master, infra, and compute nodes, so handling node maintenance for activities such as OS patching is not a big deal. However, we need to make sure the other nodes have enough capacity to absorb the workload.
When there is maintenance work, such as kernel patching, we need to carry it out without impacting the pods and applications running on the cluster.
The first step is to disable scheduling on the node. This ensures that no new pods can be scheduled on it.
Check the node status, e.g. for compute-102:
[root@master-101 ~]# oc get nodes | grep compute-102
compute-102 Ready 1y v1.6.1+5115d708d7
Mark the node as unschedulable, which changes its status to SchedulingDisabled:
[root@master-101 ~]# oadm manage-node compute-102 --schedulable=false
NAME STATUS AGE VERSION
compute-102 Ready,SchedulingDisabled 1y v1.6.1+5115d708d7
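If you script this step, it helps to confirm that the change took effect before moving on. The sketch below is a hypothetical helper (`node_is_cordoned` is not part of `oc`); it assumes the NAME STATUS AGE VERSION column order shown above and simply checks whether the STATUS column contains SchedulingDisabled.

```shell
# Hypothetical helper (not part of oc): succeeds if the STATUS column of a
# single `oc get nodes` output line contains SchedulingDisabled.
# Assumes the NAME STATUS AGE VERSION column order shown in this post.
node_is_cordoned() {
  local status
  # Second whitespace-separated field is STATUS, e.g. "Ready,SchedulingDisabled".
  status=$(printf '%s\n' "$1" | awk '{print $2}')
  case "$status" in
    *SchedulingDisabled*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage against a live cluster:
#   node_is_cordoned "$(oc get node compute-102 --no-headers)" && echo "cordoned"
```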
Next, drain the node to evacuate its pods. You can simply run the command below for this task:
# oc adm drain compute-102
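However, this often fails because the node is running pods that use local storage (emptyDir volumes) or pods managed by a DaemonSet. In that case, add options such as --ignore-daemonsets and --delete-local-data (the latter deletes the pods' emptyDir data). The --force option additionally deletes standalone pods that are not managed by a controller, and those pods will not be recreated elsewhere.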
[root@master-101 ~]# oc adm drain compute-102 --delete-local-data --ignore-daemonsets --force
node "compute-102" already cordoned
WARNING: Ignoring DaemonSet-managed pods: logging-fluentd-1gttp; Deleting pods with local storage: myapp-1-1kr16, uysed-25-m7qk4, postgresql-1-xt7bm
You will see warning messages like these while the pods are evacuated from compute-102.
Wait until all pods have been removed and you see output like the following:
node "compute-102" drained
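The two steps so far can be wrapped in one function so that the drain only runs if disabling scheduling succeeded. This is a minimal sketch, not an official tool: the function name `cordon_and_drain` is hypothetical, and it uses `oc adm manage-node`, the `oc` form of the `oadm manage-node` command shown above, with the same drain flags as this post.

```shell
# Hypothetical wrapper (not part of oc): disable scheduling on a node, then
# drain it with the flags used in this post. Stops at the first failing step.
cordon_and_drain() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: cordon_and_drain <node-name>" >&2
    return 2
  fi

  # Step 1: prevent new pods from being scheduled on the node.
  oc adm manage-node "$node" --schedulable=false || return 1

  # Step 2: evict pods; DaemonSet pods are skipped and emptyDir data is lost.
  oc adm drain "$node" --delete-local-data --ignore-daemonsets --force || return 1

  echo "node $node cordoned and drained"
}

# Example usage:
#   cordon_and_drain compute-102
```

Because --force and --delete-local-data discard data, it is worth reading the WARNING output before running this unattended.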
The node is now free for any kind of maintenance activity, since scheduling is disabled and its pods have been evacuated.
Let’s verify that no pods are running on the node apart from DaemonSet-managed ones such as logging-fluentd:
[root@master-101 ~]# oadm manage-node compute-102 --list-pods
Listing matched pods on node: compute-102
NAME READY STATUS RESTARTS AGE
logging-fluentd-1gttp 1/1 Running 1 1d
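On a busy node it is easier to let a script point out anything unexpected that is still running. The sketch below is hypothetical (`unexpected_pods` is not an `oc` command); it assumes the `--list-pods` layout shown above (a "Listing matched pods" line, a column header, then one pod per line) and takes pod-name prefixes you expect to remain, such as `logging-fluentd-`.

```shell
# Hypothetical helper (not part of oc): reads `oadm manage-node <node> --list-pods`
# output on stdin and prints pod names that do NOT start with any of the given
# prefixes (e.g. known DaemonSet pods such as logging-fluentd-).
# Assumes the layout shown in this post: banner line, header, one pod per line.
unexpected_pods() {
  local name prefix expected
  # "|| [ -n "$name" ]" also handles a final line without a trailing newline.
  while read -r name _ || [ -n "$name" ]; do
    # Skip blank lines, the "Listing matched pods" banner and the NAME header.
    case "$name" in
      ''|Listing|NAME) continue ;;
    esac
    expected=0
    for prefix in "$@"; do
      case "$name" in
        "$prefix"*) expected=1; break ;;
      esac
    done
    [ "$expected" -eq 1 ] || printf '%s\n' "$name"
  done
}

# Example usage:
#   oadm manage-node compute-102 --list-pods | unexpected_pods logging-fluentd-
```

Empty output means only the expected pods are left on the node.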
Once you have finished your task (e.g. patching and rebooting), wait for the server to come back online. You may not need to reboot at all if the work was only a configuration change.
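Instead of checking repeatedly by hand, you can poll the node until its STATUS column reports Ready again (it will still show SchedulingDisabled at this point). This is a sketch under assumptions: the function name `wait_for_node_ready` is hypothetical, it relies on the column layout shown earlier, and the defaults of 60 attempts 10 seconds apart are arbitrary.

```shell
# Hypothetical helper (not part of oc): poll `oc get node` until the node
# reports Ready (e.g. "Ready" or "Ready,SchedulingDisabled"), giving up after
# a number of attempts. Defaults (60 attempts, 10 s apart) are arbitrary.
wait_for_node_ready() {
  local node="$1" attempts="${2:-60}" interval="${3:-10}"
  local i status
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Second column of `oc get node <name> --no-headers` is STATUS.
    status=$(oc get node "$node" --no-headers 2>/dev/null | awk '{print $2}') || status=""
    # Compare only the part before the first comma, so "NotReady" never matches.
    if [ "${status%%,*}" = "Ready" ]; then
      echo "node $node is $status"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "node $node not Ready after $attempts attempts" >&2
  return 1
}

# Example usage:
#   wait_for_node_ready compute-102
```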
On the node, make sure the openvswitch, docker, and atomic-openshift-node services are up and running.
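A quick way to check all three services at once is a small loop over `systemctl is-active --quiet`, which exits with status 0 only when the unit is active. The function name `check_node_services` is hypothetical; the service names are the ones listed above for OpenShift 3.x, so adjust them to your environment.

```shell
# Hypothetical helper (not part of OpenShift): run as root on the node and
# report which of the services named in this post are not active.
check_node_services() {
  local svc failed=0
  for svc in openvswitch docker atomic-openshift-node; do
    # `systemctl is-active --quiet` prints nothing; exit status 0 means active.
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: NOT active" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example usage (on compute-102):
#   check_node_services && echo "node services OK"
```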
Then re-enable scheduling on the node:
[root@master-101 ~]# oadm manage-node compute-102 --schedulable=true
NAME STATUS AGE VERSION
compute-102 Ready 1y v1.6.1+5115d708d7
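Wait for the node to start receiving pods again, and run some health checks. Note that pods evacuated earlier are not moved back automatically; only newly scheduled pods will land on this node.
That’s it.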
Disclaimer: The views expressed and the content shared are those of the author and do not reflect the views of the author’s employer or techbeatly platform.
Gineesh Madapparambath
Gineesh Madapparambath is the founder of techbeatly and the author of the book Ansible for Real-Life Automation.
He has worked as a Systems Engineer, Automation Specialist, and content author. His primary focus is on Ansible Automation, Containerisation (OpenShift & Kubernetes), and Infrastructure as Code (Terraform).
(aka Gini Gangadharan - iamgini.com)