Unified RMA Procedure for the Control Plane and Worker Nodes

This section describes the unified RMA procedure applicable to the following scenarios:

  • Replace a working Bare Metal control plane node or worker node for planned maintenance.

  • Replace a failed Bare Metal control plane node or worker node in a stacked cluster.

Notes:

  • If you are performing RMA for a control plane node, ensure that a majority of the control plane nodes remain available throughout the RMA process so that the cluster retains etcd quorum.

  • Disable auto-sync before you perform the RMA procedure.

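The quorum note above can be made concrete with a short sketch. The following Python snippet (illustrative only, not part of the SMI tooling) computes how many control plane nodes must stay up, and therefore how many can be under RMA at once, based on the standard etcd majority rule:

```python
def quorum(total_nodes: int) -> int:
    """Minimum number of control plane nodes that must stay up (etcd majority)."""
    return total_nodes // 2 + 1

def max_unavailable(total_nodes: int) -> int:
    """How many control plane nodes can be down during RMA without losing quorum."""
    return total_nodes - quorum(total_nodes)

# In a typical 3-node stacked cluster, only one control plane node
# can be under RMA at a time.
print(max_unavailable(3))  # -> 1
print(max_unavailable(5))  # -> 2
```

In other words, in a three-node stacked cluster you can RMA only one control plane node at a time.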
Use the following steps to replace a working or failed control plane or worker node:

  1. Drain and remove the node that is being sent for maintenance, using the following command:

    clusters cluster_name nodes node_name actions sync drain remove-node true

    Example:

    SMI Cluster Deployer# clusters kali-stacked nodes controlplane1 actions sync drain remove-node true                  
    This will run drain on the node, disrupting pods running on the node.  Are you sure? [no,yes] yes
    message accepted
  2. For a planned maintenance scenario, shut down the node if it is still running.

  3. Assign the node to maintenance mode in the cluster configuration using the following CLI commands:

    config 
      clusters cluster_name 
         nodes node_name 
            maintenance true 
         commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1 
    SMI Cluster Deployer(config-nodes-controlplane1)# maintenance true 
    SMI Cluster Deployer(config-nodes-controlplane1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-controlplane1)# end
    SMI Cluster Deployer# 
  4. The node is ready for the RMA process.

    Note

    If the remaining nodes must be upgraded or updated, run a cluster sync in this state. However, this sync is not part of the RMA process.

  5. Add the node back to the cluster when it is repaired or replaced and available.

    Note

    If the remaining nodes have been upgraded to a new SMI release while this node was under maintenance, it is recommended to clear the boot drive and delete the virtual drive on the node. This step ensures that the virtual drive is in a clean state, with no residual data from its previous deployment, before you add the node back. However, deleting the virtual drive is not required for a new replacement node.

  6. Attach the new Bare Metal node and remove it from maintenance mode in the cluster configuration using the following commands:

    config 
      clusters cluster_name 
         nodes node_name 
            maintenance false 
         commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1 
    SMI Cluster Deployer(config-nodes-controlplane1)# maintenance false 
    SMI Cluster Deployer(config-nodes-controlplane1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-controlplane1)# end
    SMI Cluster Deployer# 
  7. Run the cluster synchronization using the following command:

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted
  8. Monitor the cluster synchronization using the following command:

    monitor sync-logs cluster_name 

    Example:

    SMI Cluster Deployer# monitor sync-logs kali-stacked 
    2020-09-30 01:50:02.159 DEBUG cluster_sync.kali-stacked: Cluster name: kali-stacked 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: Force VM Redeploy: false 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: Force partition Redeploy: false 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: reset_k8s_nodes: false 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: purge_data_disks: false 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: upgrade_strategy: auto 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: sync_phase: all 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: debug: true 
    .
    .
    .
    .
    2020-09-30 01:53:27.638 DEBUG cluster_sync.kali-stacked: Cluster sync successful 
    2020-09-30 01:53:27.638 DEBUG cluster_sync.kali-stacked: Ansible sync done 
    2020-09-30 01:53:27.638 INFO cluster_sync.kali-stacked: _sync finished.  Opening lock 
  9. To verify the status of the cluster, use the following command:

    clusters cluster_name actions k8s cluster-status

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions k8s cluster-status
    pods-desired-count 99
    pods-ready-count 99
    pods-desired-are-ready true
    etcd-healthy true
    all-ok true
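The `cluster-status` output above is a simple set of space-separated key/value lines, so it is straightforward to check programmatically. The following Python helper is an illustrative sketch (not part of SMI) that parses captured output and confirms the cluster is healthy:

```python
def parse_cluster_status(output: str) -> dict:
    """Parse space-separated key/value lines from the cluster-status output.

    Illustrative helper only; values are converted to int or bool
    where the text allows it.
    """
    status = {}
    for line in output.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        if value in ("true", "false"):
            status[key] = value == "true"
        elif value.isdigit():
            status[key] = int(value)
        else:
            status[key] = value
    return status

sample = """pods-desired-count 99
pods-ready-count 99
pods-desired-are-ready true
etcd-healthy true
all-ok true"""

status = parse_cluster_status(sample)
# After a successful RMA, all-ok and etcd-healthy should both be true,
# and the ready pod count should match the desired count.
print(status["all-ok"])  # -> True
```

A practical health check after step 9 would simply verify that `all-ok` is true and that `pods-ready-count` equals `pods-desired-count`.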

NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • nodes node_name - Specifies the Bare Metal control plane or worker node.

  • maintenance true/false - Assigns the Bare Metal node to maintenance mode or removes it from maintenance mode.

  • actions sync run debug true - Synchronizes the cluster configuration with debug logging enabled.

  • actions k8s cluster-status - Displays the status of the cluster.
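The steps above can be summarized as an ordered command sequence. The sketch below is illustrative only and is not an SMI tool: it lists the documented CLI commands in RMA order, the configuration-mode wrappers (config / commit / end) around the maintenance commands are elided, and any real automation would need its own transport (for example, SSH) to the deployer CLI:

```python
def rma_commands(cluster: str, node: str) -> list:
    """Return the documented SMI CLI commands for the RMA flow, in order.

    Configuration-mode wrappers (config/commit/end) are elided, and the
    hardware repair or replacement itself happens between steps 3 and 6.
    """
    return [
        f"clusters {cluster} nodes {node} actions sync drain remove-node true",  # step 1: drain and remove
        f"clusters {cluster} nodes {node} maintenance true",                     # step 3: enter maintenance mode
        # ... node is repaired or replaced here (steps 2, 4, 5) ...
        f"clusters {cluster} nodes {node} maintenance false",                    # step 6: exit maintenance mode
        f"clusters {cluster} actions sync run debug true",                       # step 7: resynchronize the cluster
        f"monitor sync-logs {cluster}",                                          # step 8: monitor the sync
        f"clusters {cluster} actions k8s cluster-status",                        # step 9: verify cluster health
    ]

for cmd in rma_commands("kali-stacked", "controlplane1"):
    print(cmd)
```

Listing the commands this way also makes it easy to paste them into a runbook for the specific cluster and node being serviced.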