Control Plane Bare Metal Node Failure - Unplanned
Check the status of the nodes using the following command:
kubectl get nodes
In the following example, the status of the primary control plane 1 node changes to NotReady after it fails.
user1-cloud@kali-stacked-control-plane:~$ kubectl get nodes
NAME                          STATUS     ROLES           AGE    VERSION
kali-stacked-control-plane1   NotReady   control-plane   136m   v1.21.0
kali-stacked-control-plane2   Ready      control-plane   10h    v1.21.0
kali-stacked-control-plane3   Ready      control-plane   10h    v1.21.0
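When the node list is long, it can help to pick out the failed nodes programmatically. The following is a minimal Python sketch that parses the text output of kubectl get nodes. The function name not_ready_nodes is hypothetical and not part of any tool described here, and the sketch assumes the default column layout shown in the example above. Any status other than exactly Ready (for example, Ready,SchedulingDisabled) is reported.

```python
from __future__ import annotations


def not_ready_nodes(kubectl_output: str) -> list[str]:
    """Return the names of nodes whose STATUS column is not exactly 'Ready'.

    Assumes the default `kubectl get nodes` layout with a header row that
    contains NAME and STATUS columns and no spaces inside those columns.
    """
    lines = [ln for ln in kubectl_output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")

    failed = []
    for line in lines[1:]:
        cols = line.split()
        if cols[status_idx] != "Ready":
            failed.append(cols[name_idx])
    return failed


# Sample text copied from the example output above.
sample_nodes = """\
NAME STATUS ROLES AGE VERSION
kali-stacked-control-plane1 NotReady control-plane 136m v1.21.0
kali-stacked-control-plane2 Ready control-plane 10h v1.21.0
kali-stacked-control-plane3 Ready control-plane 10h v1.21.0
"""

print(not_ready_nodes(sample_nodes))  # ['kali-stacked-control-plane1']
```

The sketch works on captured text only. It does not call kubectl itself, so it can be run against saved output from any session.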
All the pods on the failed primary control plane 1 Bare Metal node remain either terminated or in the pending state. You can verify the status of the pods using the kubectl get pods command, as shown in the following example:
user1-cloud@kali-stacked-controlplane3:~$ kubectl get pods -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5d7fff4bc6-lxkpc    1/1     Running   0          7h26m
kube-system   calico-node-tx7zg                           1/1     Running   0          10h
kube-system   calico-node-v6m7v                           1/1     Running   0          10h
kube-system   coredns-66d57f55d9-6dnsn                    1/1     Running   0          136m
kube-system   coredns-66d57f55d9-rdtbd                    1/1     Running   0          136m
kube-system   etcd-kali-stacked-controlplane2             1/1     Running   0          10h
kube-system   etcd-kali-stacked-controlplane3             1/1     Running   0          10h
kube-system   kube-apiserver-kali-stacked-controlplane2   1/1     Running   0          10h
kube-system   kube-apiserver-kali-stacked-controlplane3   1/1     Running   0          10h
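To spot pods that are stuck after the node failure, the pod listing can be filtered by status. The following Python sketch is illustrative only: the function name unhealthy_pods and the set of statuses treated as healthy are assumptions, and the last row of the sample is a hypothetical pod added to show a match. It is not taken from the output above.

```python
# Statuses treated as healthy in this sketch (an assumption; adjust as needed).
HEALTHY_STATUSES = {"Running", "Completed"}


def unhealthy_pods(kubectl_output):
    """Return (namespace, name, status) for pods that are not healthy.

    Assumes the `kubectl get pods -A` layout with NAMESPACE, NAME, and
    STATUS columns in the header row.
    """
    lines = [ln for ln in kubectl_output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    ns_idx = header.index("NAMESPACE")
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")

    flagged = []
    for line in lines[1:]:
        cols = line.split()
        if cols[status_idx] not in HEALTHY_STATUSES:
            flagged.append((cols[ns_idx], cols[name_idx], cols[status_idx]))
    return flagged


sample_pods = """\
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-66d57f55d9-6dnsn 1/1 Running 0 136m
kube-system etcd-kali-stacked-controlplane2 1/1 Running 0 10h
kube-system example-pending-pod 0/1 Pending 0 5m
"""
# The last row above is hypothetical and only illustrates a pending pod.

print(unhealthy_pods(sample_pods))
# [('kube-system', 'example-pending-pod', 'Pending')]
```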
To replace the failed primary control plane 1 Bare Metal node:
-
Delete the failed primary control plane 1 Bare Metal node using the following command:
kubectl delete node node_name
Example:
user1-cloud@kali-stacked-controlplane3:~$ kubectl delete node kali-stacked-controlplane1
node "kali-stacked-controlplane1" deleted
-
Assign the primary control plane 1 Bare Metal node to maintenance mode in the cluster configuration using the following commands:
configure
clusters cluster_name
nodes controlplane1
maintenance true
commit
end
Example:
SMI Cluster Deployer(config)# clusters kali-stacked
SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1
SMI Cluster Deployer(config-nodes-controlplane1)# maintenance true
SMI Cluster Deployer(config-nodes-controlplane1)# commit
Commit complete.
SMI Cluster Deployer(config-nodes-controlplane1)# end
SMI Cluster Deployer#
-
The node is ready for the RMA process.
Note: If the remaining nodes need to be upgraded or the NFs need to be synchronized, you can run a cluster sync in this state. However, this is not part of the RMA process.
-
Add the node back to the cluster when it is repaired or replaced and available.
Note: If you add the node back after it's repaired, ensure that the disks are clean by clearing the boot drive and the virtual drive on the node. This ensures that the virtual drive carries no previous state when you add the node back. Removal of the virtual drive is not required for a new replacement node.
-
Attach the new primary control plane 1 Bare Metal node and remove it from maintenance mode in the cluster configuration using the following commands:
configure
clusters cluster_name
nodes controlplane1
maintenance false
commit
end
Example:
SMI Cluster Deployer(config)# clusters kali-stacked
SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1
SMI Cluster Deployer(config-nodes-controlplane1)# maintenance false
SMI Cluster Deployer(config-nodes-controlplane1)# commit
Commit complete.
SMI Cluster Deployer(config-nodes-controlplane1)# end
SMI Cluster Deployer#
-
Run the cluster synchronization using the following command:
clusters cluster_name actions sync run debug true
Example:
SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
This will run sync. Are you sure? [no,yes] yes
message accepted
-
Verify the status of the cluster using the following command:
clusters cluster_name actions k8s cluster-status
Example:
SMI Cluster Deployer# clusters kali-stacked actions k8s cluster-status
pods-desired-count 40
pods-ready-count 39
pods-desired-are-ready true
etcd-healthy true
all-ok true
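The cluster-status fields in the last step can be checked mechanically once the output is captured as text. The following Python sketch assumes the output consists of whitespace-separated key and value pairs, as in the example above. The function names parse_cluster_status and cluster_is_healthy are hypothetical, and the health rule (both all-ok and etcd-healthy must be true) is an assumption rather than a documented criterion.

```python
def parse_cluster_status(text):
    """Parse 'key value' pairs from cluster-status output into a dict.

    Values 'true'/'false' become booleans and digit strings become
    integers. Works whether the pairs are on one line or one per line.
    """
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of key/value tokens")

    status = {}
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        if raw in ("true", "false"):
            status[key] = raw == "true"
        elif raw.isdigit():
            status[key] = int(raw)
        else:
            status[key] = raw
    return status


def cluster_is_healthy(status):
    """Assumed rule: healthy only if all-ok and etcd-healthy are both true."""
    return status.get("all-ok") is True and status.get("etcd-healthy") is True


# Field values copied from the example output above (command line omitted).
sample_status = (
    "pods-desired-count 40 pods-ready-count 39 "
    "pods-desired-are-ready true etcd-healthy true all-ok true"
)

status = parse_cluster_status(sample_status)
print(status["pods-desired-count"], status["pods-ready-count"])  # 40 39
print(cluster_is_healthy(status))  # True
```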
NOTES:
-
clusters cluster_name - Specifies the K8s cluster.
-
nodes controlplane1 - Specifies primary control plane 1 Bare Metal node.
-
maintenance true/false - Places the primary control plane 1 Bare Metal node in maintenance mode (true) or removes it from maintenance mode (false).
-
actions sync run debug true - Synchronizes the cluster configuration.
-
actions k8s cluster-status - Displays the status of the cluster.
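The command sequences described in the notes above differ only in the cluster name, the node name, and the maintenance flag, so they can be generated from a small template. The following Python sketch only builds the command text for review or pasting into the SMI Cluster Deployer CLI. It does not connect to or execute anything, and the function names maintenance_commands and post_replacement_commands are hypothetical.

```python
def maintenance_commands(cluster_name, node_name, enabled):
    """Build the configuration commands that set or clear maintenance mode."""
    flag = "true" if enabled else "false"
    return [
        "configure",
        f"clusters {cluster_name}",
        f"nodes {node_name}",
        f"maintenance {flag}",
        "commit",
        "end",
    ]


def post_replacement_commands(cluster_name):
    """Build the sync and status commands that follow node replacement."""
    return [
        f"clusters {cluster_name} actions sync run debug true",
        f"clusters {cluster_name} actions k8s cluster-status",
    ]


# Values taken from the examples in this procedure.
for command in maintenance_commands("kali-stacked", "controlplane1", True):
    print(command)
```

Calling maintenance_commands with enabled set to False yields the sequence used when the repaired or replaced node is attached again.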