Kubernetes Volumes Freeze on Node Shutdown

In the event of an unclean node shutdown, the detach and reattach logic for persistent volumes in Kubernetes is prone to issues. For more information, see https://github.com/kubernetes/kubernetes/issues/65392. This section describes the root cause of the problem and how to resolve it in a production system.

Problem

The following are some of the scenarios in which Kubernetes volumes freeze after an unclean node shutdown:

  • When you shut down a node that has non-local persistent volumes (Cinder or vSphere) attached to it without using the cordon and drain approach that Kubernetes recommends (see the example after this list), the node is reported as NotReady:

    ubuntu@cn2smi-controlplane1:~$ kubectl get nodes
    NAME                    STATUS     ROLES           AGE   VERSION
    cn2smi-controlplane1    Ready      control-plane   21h   v1.15.3
    cn2smi-controlplane2    Ready      control-plane   21h   v1.15.3
    cn2smi-controlplane3    Ready      control-plane   21h   v1.15.3
    cn2smi-oam1             NotReady   <none>          14h   v1.15.3
  • When the pods are in the "creation" or "initialization" state, waiting for the volumes to attach, they freeze after approximately thirty seconds:

    ubuntu@cn2smi-controlplane1:~$ kubectl get pods -w -A -o wide| grep oam | grep bulk
    cee-global      bulk-stats-68dc684d57-hqtdj                                       3/3     Terminating         0          4m38s   192.200.7.68    cn2smi-oam1             <none>           <none>
    cee-global      bulk-stats-68dc684d57-sdjgl                                       0/3     ContainerCreating   0          15s     <none>          cn2smi-oam2             <none>           <none>
    
  • The pods in the "ContainerCreating" state display a "Multi-Attach error" event:

    Events:
      Type     Reason              Age   From                     Message
      ----     ------              ----  ----                     -------
      Normal   Scheduled           76s   default-scheduler        Successfully assigned cee-global/bulk-stats-68dc684d57-sdjgl to cn2smi-oam2
      Warning  FailedAttachVolume  76s   attachdetach-controller  Multi-Attach error for volume "pvc-b01a4434-190f-4eec-b289-41ae990b0025" Volume is already used by pod(s) bulk-stats-68dc684d57-hqtdj
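
The following is a minimal sketch of the cordon and drain approach mentioned in the first scenario, using the cn2smi-oam1 node from the earlier output as an illustrative target. The exact drain options you need depend on your Kubernetes version and on the workloads running on the node.

    kubectl cordon cn2smi-oam1
    kubectl drain cn2smi-oam1 --ignore-daemonsets

Draining the node evicts its pods gracefully, so the volumes are detached cleanly before the node is shut down.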

Resolution

You can resolve this issue by deleting the pod that is stuck in the "Terminating" state. To delete the pod, run the following command:

    kubectl delete pod -n <namespace> <pod-name> --force --grace-period=0
Note
You must use the --grace-period=0 and --force options when deleting the pod.
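
For example, to delete the frozen bulk-stats pod from the earlier output (the namespace and pod name below are taken from that output; substitute your own values):

    kubectl delete pod -n cee-global bulk-stats-68dc684d57-hqtdj --force --grace-period=0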

The pod is rescheduled after approximately seven minutes.
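
To confirm that the replacement pod is scheduled and reaches the Running state, you can watch the pods, for example:

    kubectl get pods -n cee-global -o wide -w | grep bulk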