Unified RMA Procedure for the Control Plane and Worker Nodes
This section describes the unified RMA procedure applicable to the following scenarios:
- Replace a working control plane Bare Metal node or worker node for maintenance.
- Replace a failed Bare Metal control plane node or worker node in a stacked cluster.
Notes:
- If you are performing an RMA for a control plane node, ensure that the majority of control plane nodes remains available throughout the RMA process.
- Disable auto-sync before you perform the RMA procedure.
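The majority requirement follows the standard quorum rule for clustered control planes: with n control plane nodes, at least floor(n/2) + 1 must remain available. A minimal sketch of the arithmetic (plain Python, illustrative only; these function names are not part of the SMI CLI):

```python
def quorum_size(total_nodes: int) -> int:
    """Minimum number of control plane nodes that must stay up for a majority."""
    return total_nodes // 2 + 1

def rma_is_safe(total_nodes: int, nodes_already_down: int) -> bool:
    """True if taking one more node out for RMA still leaves a majority up."""
    remaining = total_nodes - nodes_already_down - 1  # -1 for the RMA candidate
    return remaining >= quorum_size(total_nodes)

# In a 3-node control plane, one node can be out for RMA...
print(rma_is_safe(3, 0))  # True
# ...but not while another node is already down.
print(rma_is_safe(3, 1))  # False
```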
Use the following steps to replace a working or failed control plane or worker node:
- Drain and remove the node that is sent for maintenance, using the following command:
  clusters cluster_name nodes node_name actions sync drain remove-node true
Example:
SMI Cluster Deployer# clusters kali-stacked nodes controlplane1 actions sync drain remove-node true
This will run drain on the node, disrupting pods running on the node. Are you sure? [no,yes] yes
message accepted
- For a planned maintenance scenario, shut down the node if it is still running.
- Assign the node to maintenance mode in the cluster configuration using the following CLI commands:
  config
  clusters cluster_name nodes node_name maintenance true
  commit
  end
Example:
SMI Cluster Deployer(config)# clusters kali-stacked
SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1
SMI Cluster Deployer(config-nodes-controlplane1)# maintenance true
SMI Cluster Deployer(config-nodes-controlplane1)# commit
Commit complete.
SMI Cluster Deployer(config-nodes-controlplane1)# end
SMI Cluster Deployer#
- The node is now ready for the RMA process.
  Note: If the remaining nodes must be upgraded or updated, run a cluster sync in this state. However, that sync is not part of the RMA process.
- Add the node back to the cluster when it is repaired or replaced and available.
  Note: If the remaining nodes were upgraded to a new SMI release while this node was under maintenance, it is recommended to clear the boot drive and delete the virtual drive on the node. This step ensures that the virtual drive is in a clean state, carrying no previous state, before you add the node back. However, removing the virtual drive is not required for a new replacement node.
- Attach the new Bare Metal node and remove it from maintenance mode in the cluster configuration using the following commands:
  config
  clusters cluster_name nodes node_name maintenance false
  commit
  end
Example:
SMI Cluster Deployer(config)# clusters kali-stacked
SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1
SMI Cluster Deployer(config-nodes-controlplane1)# maintenance false
SMI Cluster Deployer(config-nodes-controlplane1)# commit
Commit complete.
SMI Cluster Deployer(config-nodes-controlplane1)# end
SMI Cluster Deployer#
- Run the cluster synchronization using the following command:
  clusters cluster_name actions sync run debug true
Example:
SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
This will run sync. Are you sure? [no,yes] yes
message accepted
- Monitor the cluster synchronization using the following command:
  monitor sync-logs cluster_name
Example:
SMI Cluster Deployer# monitor sync-logs kali-stacked
2020-09-30 01:50:02.159 DEBUG cluster_sync.kali-stacked: Cluster name: kali-stacked
2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: Force VM Redeploy: false
2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: Force partition Redeploy: false
2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: reset_k8s_nodes: false
2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: purge_data_disks: false
2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: upgrade_strategy: auto
2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: sync_phase: all
2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: debug: true
. . . .
2020-09-30 01:53:27.638 DEBUG cluster_sync.kali-stacked: Cluster sync successful
2020-09-30 01:53:27.638 DEBUG cluster_sync.kali-stacked: Ansible sync done
2020-09-30 01:53:27.638 INFO cluster_sync.kali-stacked: _sync finished. Opening lock
- To verify the status of the cluster, use the following command:
  clusters cluster_name actions k8s cluster-status
Example:
pods-desired-count 99
pods-ready-count 99
pods-desired-are-ready true
etcd-healthy true
all-ok true
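If you script the verification step, the cluster-status output shown above can be parsed into key/value pairs and gated on all-ok. A minimal sketch, assuming you have captured the command output as text (plain Python; the field names come from the example output above, and the parser itself is not part of the product):

```python
def parse_cluster_status(output: str) -> dict:
    """Parse 'key value' lines from cluster-status output into a dict."""
    status = {}
    for line in output.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        if not key:
            continue
        # Convert booleans and integers where the value allows it.
        if value in ("true", "false"):
            status[key] = (value == "true")
        elif value.isdigit():
            status[key] = int(value)
        else:
            status[key] = value
    return status

sample = """\
pods-desired-count 99
pods-ready-count 99
pods-desired-are-ready true
etcd-healthy true
all-ok true
"""
status = parse_cluster_status(sample)
print(status["all-ok"] and status["etcd-healthy"])  # True
```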
NOTES:
- clusters cluster_name - Specifies the K8s cluster.
- nodes node_name - Specifies the control plane Bare Metal node.
- maintenance true/false - Assigns the Bare Metal node to maintenance mode, or removes it from maintenance mode.
- actions sync run debug true - Synchronizes the cluster configuration.
- actions k8s cluster-status - Displays the status of the cluster.
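The procedure above can be summarized as an ordered command sequence. The following sketch builds that sequence for a given cluster and node (illustrative Python only; the helper name and the idea of driving the SMI CLI from a script are assumptions, not features of the product, and the maintenance steps are shown in the flattened one-line form used earlier in this section):

```python
def rma_command_sequence(cluster: str, node: str) -> list[str]:
    """Ordered SMI Cluster Deployer commands for the unified RMA procedure."""
    return [
        # Drain the node and remove it from the cluster.
        f"clusters {cluster} nodes {node} actions sync drain remove-node true",
        # Mark the node as under maintenance (config mode).
        f"config clusters {cluster} nodes {node} maintenance true commit end",
        # After repair/replacement: take the node out of maintenance.
        f"config clusters {cluster} nodes {node} maintenance false commit end",
        # Run a full cluster sync to add the node back.
        f"clusters {cluster} actions sync run debug true",
        # Watch the sync progress.
        f"monitor sync-logs {cluster}",
        # Verify cluster health.
        f"clusters {cluster} actions k8s cluster-status",
    ]

for cmd in rma_command_sequence("kali-stacked", "controlplane1"):
    print(cmd)
```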