Software Upgrade on GR Pairs

This checklist uses config commit as the reference scenario. The same checklist also applies to other upgrade scenarios.

Checklist

Note

Don’t perform cluster sync on both sites (Rack-1/Site-1 and Rack-2/Site-2) at the same time. Trigger a manual switchover on Rack-1 before proceeding with the Rack-1/Site-1 upgrade.

  • Don’t perform config commits on both sites at the same time. Perform config commit on each site separately.

  • Before the config commit procedure on Rack-1/Site-1, initiate the CLI-based switchover on Rack-1/Site-1 and make sure that Rack-2/Site-2 has Primary ownership for both instances (instance-id 1 and instance-id 2).

  • Perform config commit on Rack-1/Site-1. Wait until the config commit succeeds and the PODs, which restart to fetch the latest helm charts (if applicable), are back in the running state.

  • Revert the role of Rack-1/Site-1 to Primary (switch/reset roles on both sites).

  • Verify that the available roles of Rack-1/Site-1 (Primary) and Rack-2/Site-2 (Standby) are in the expected state.

  • Repeat the preceding checklist for Rack-2/Site-2.
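The checklist precondition (the peer site must hold Primary ownership for both instances, and only one site is committed at a time) can be sketched as a small gate function. This is only an illustration under stated assumptions: the name `safe_to_commit` and the dictionary layout are hypothetical, and the role values would have to be collected separately, for example by reading the output of `show role instance-id <n>` on the peer site.

```python
# Hypothetical pre-commit gate for one site of a GR pair.
# Roles are plain strings as reported by "show role instance-id <n>".

INSTANCE_IDS = (1, 2)


def safe_to_commit(peer_roles, commit_in_progress_on_peer=False):
    """Return True when a config commit may start on the local site.

    peer_roles: dict mapping instance-id -> role reported on the peer site.
    commit_in_progress_on_peer: True if the peer site is being committed now.
    """
    # Checklist: never commit on both sites at the same time.
    if commit_in_progress_on_peer:
        return False
    # Checklist: the peer must hold Primary ownership for both instances.
    return all(peer_roles.get(i) == "PRIMARY" for i in INSTANCE_IDS)
```

For example, `safe_to_commit({1: "PRIMARY", 2: "STANDBY"})` returns `False`, because instance-id 2 has not yet moved to the peer site.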

Software Upgrade

Upgrading Rack-1/Site-1 when GR is enabled:

  1. Verify that the available roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
  2. Initiate switch role for both instances on Rack-1/Site-1 to STANDBY with failback-interval of 0 seconds. This step transitions the roles from PRIMARY/STANDBY to STANDBY_ERROR/STANDBY_ERROR.

    geo switch-role instance-id 1 role standby [failback-interval 0]
    geo switch-role instance-id 2 role standby [failback-interval 0]
    Note
    • Heartbeat between both the sites must be successful.

    • The failback-interval keyword in the CLI is optional; it is provided for backward compatibility of upgrades between releases. Its value is 0. It is deprecated in the current release and will be discontinued in subsequent releases.

  3. Verify that the available roles of both instances have moved to STANDBY_ERROR on Rack-1/Site-1.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  4. Verify that the available roles of both instances have moved to PRIMARY on Rack-2/Site-2.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "PRIMARY"
  5. Perform a rolling upgrade or a non-graceful upgrade (using system mode shutdown/running), as required, on Rack-1/Site-1. To allow replication to finish, wait 5 minutes between the GR switchover and the SMF shutdown.

  6. After the upgrade procedure completes, perform the remaining steps. First, perform a health check on Rack-1/Site-1 and ensure that the PODs have come up and Rack-1/Site-1 is healthy.

  7. Verify that the available roles of both instances remain in STANDBY_ERROR mode on Rack-1/Site-1.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  8. Initiate reset role for both instances on Rack-1/Site-1 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.

    geo reset-role instance-id 1 role standby
    geo reset-role instance-id 2 role standby
  9. Verify that the roles of both instances have moved to STANDBY on Rack-1/Site-1.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "STANDBY"
  10. Initiate switch role for instance-id 1 on Rack-2/Site-2 to STANDBY. This step transitions the available roles of Rack-2/Site-2 from PRIMARY/PRIMARY to STANDBY_ERROR/PRIMARY and Rack-1/Site-1 from STANDBY/STANDBY to PRIMARY/STANDBY.

    geo switch-role instance-id 1 role standby [failback-interval 0]
  11. Verify that the available roles of the instances on Rack-2/Site-2 are in STANDBY_ERROR/PRIMARY.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "PRIMARY"
  12. Verify that the available roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
  13. Initiate reset role for instance-id 1 on Rack-2/Site-2 to STANDBY. This step transitions the roles of Rack-2/Site-2 from STANDBY_ERROR/PRIMARY to STANDBY/PRIMARY.

    geo reset-role instance-id 1 role standby
  14. Verify that the available roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
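The role transitions in the Rack-1/Site-1 procedure above can be traced with a small simulation. This is a simplified sketch, not the product's state machine: the transition rules are inferred only from the steps in this procedure, and the names `switch_role`, `reset_role`, `upgrade_rack1`, and the `rack1`/`rack2` keys are hypothetical. The starting roles on Rack-2/Site-2 (STANDBY/PRIMARY) are assumed to mirror Rack-1/Site-1.

```python
# Simplified model of the role transitions described in the steps above.
# The rules are inferred from this procedure only.


def initial_state():
    # Roles per site, keyed by instance-id, before the Rack-1/Site-1 upgrade.
    return {
        "rack1": {1: "PRIMARY", 2: "STANDBY"},
        "rack2": {1: "STANDBY", 2: "PRIMARY"},  # assumed mirror of rack1
    }


def peer_of(site):
    return "rack2" if site == "rack1" else "rack1"


def switch_role(state, site, instance_id):
    """Model 'geo switch-role instance-id <n> role standby' run on `site`."""
    state[site][instance_id] = "STANDBY_ERROR"
    state[peer_of(site)][instance_id] = "PRIMARY"


def reset_role(state, site, instance_id):
    """Model 'geo reset-role instance-id <n> role standby' run on `site`."""
    if state[site][instance_id] != "STANDBY_ERROR":
        raise ValueError("this procedure only resets from STANDBY_ERROR")
    state[site][instance_id] = "STANDBY"


def upgrade_rack1(state):
    # Step 2: move both instances away from Rack-1/Site-1.
    switch_role(state, "rack1", 1)
    switch_role(state, "rack1", 2)
    # Steps 5-7: the upgrade itself does not change roles in this model.
    # Step 8: clear STANDBY_ERROR on Rack-1/Site-1.
    reset_role(state, "rack1", 1)
    reset_role(state, "rack1", 2)
    # Step 10: hand instance-id 1 back to Rack-1/Site-1.
    switch_role(state, "rack2", 1)
    # Step 13: clear STANDBY_ERROR on Rack-2/Site-2.
    reset_role(state, "rack2", 1)
    return state
```

Running `upgrade_rack1(initial_state())` ends with Rack-1/Site-1 in PRIMARY/STANDBY and Rack-2/Site-2 in STANDBY/PRIMARY, which matches the roles verified in steps 12 and 14.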

Upgrading Rack-2/Site-2 when GR is enabled:

  1. Verify that the available roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
  2. Initiate switch role for both instances on Rack-2/Site-2 to STANDBY with failback-interval of 0 seconds. This step transitions the roles from STANDBY/PRIMARY to STANDBY_ERROR/STANDBY_ERROR.

    geo switch-role instance-id 1 role standby [failback-interval 0]
    geo switch-role instance-id 2 role standby [failback-interval 0]
  3. Verify that the available roles of both instances move to STANDBY_ERROR on Rack-2/Site-2.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  4. Verify that the available roles of both instances move to PRIMARY on Rack-1/Site-1.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "PRIMARY"
  5. Perform a rolling upgrade or a non-graceful upgrade (using system mode shutdown/running), as required, on Rack-2/Site-2.

  6. After the upgrade procedure completes, perform the remaining steps. First, perform a health check on Rack-2/Site-2 and ensure that the PODs have come up and Rack-2/Site-2 is healthy.

  7. Verify that the available roles of both the instances remain in STANDBY_ERROR on Rack-2/Site-2.

    show role instance-id 1
    result "STANDBY_ERROR"
    show role instance-id 2
    result "STANDBY_ERROR"
  8. Initiate reset role for both instances on Rack-2/Site-2 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.

    geo reset-role instance-id 1 role standby
    geo reset-role instance-id 2 role standby
  9. Verify that the available roles of both instances move to STANDBY on Rack-2/Site-2.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "STANDBY"
  10. Initiate switch role for instance-id 2 on Rack-1/Site-1 to STANDBY. This step transitions the available roles of Rack-1/Site-1 from PRIMARY/PRIMARY to PRIMARY/STANDBY_ERROR and Rack-2/Site-2 from STANDBY/STANDBY to STANDBY/PRIMARY.

    geo switch-role instance-id 2 role standby [failback-interval 0]
  11. Verify that the available roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY_ERROR.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY_ERROR"
  12. Verify that the available roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.

    show role instance-id 1
    result "STANDBY"
    show role instance-id 2
    result "PRIMARY"
  13. Initiate reset role for instance-id 2 on Rack-1/Site-1 to STANDBY. This step transitions the roles of Rack-1/Site-1 from PRIMARY/STANDBY_ERROR to PRIMARY/STANDBY.

    geo reset-role instance-id 2 role standby
  14. Verify that the available roles of both the instances on Rack-1/Site-1 are in PRIMARY/STANDBY.

    show role instance-id 1
    result "PRIMARY"
    show role instance-id 2
    result "STANDBY"
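The verification steps above all compare `show role` output against an expected pair of roles. A minimal sketch of that comparison follows, assuming the output is pasted as text in exactly the two-line form shown in this procedure (`show role instance-id <n>` followed by `result "<ROLE>"`). Real CLI output may include prompts or extra fields, and the names `parse_roles` and `roles_match` are hypothetical.

```python
import re

# Matches the two-line pattern shown in the steps:
#   show role instance-id <n>
#   result "<ROLE>"
_ROLE_PATTERN = re.compile(
    r'show role instance-id\s+(\d+)\s*\n\s*result\s+"([A-Z_]+)"'
)


def parse_roles(transcript):
    """Return {instance_id: role} extracted from pasted CLI output."""
    return {int(i): role for i, role in _ROLE_PATTERN.findall(transcript)}


def roles_match(transcript, expected):
    """True when the parsed roles are exactly the expected ones."""
    return parse_roles(transcript) == expected


# Expected roles on Rack-1/Site-1 at step 14 of the Rack-2/Site-2 upgrade.
EXPECTED_RACK1_FINAL = {1: "PRIMARY", 2: "STANDBY"}
```

Passing the text of step 14 to `roles_match` together with `EXPECTED_RACK1_FINAL` gives a quick yes/no check before the upgrade is declared complete.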