Software Upgrade on GR Pairs
The following checklist uses config commit as the reference procedure. The same checklist also applies to other upgrade scenarios.
Checklist
-
Don't perform config commits on both sites at the same time. Perform config commit on each site separately.
-
Before the config commit procedure on Rack-1/Site-1, initiate the CLI-based switchover on Rack-1/Site-1 and make sure that Rack-2/Site-2 has Primary ownership of both instances (instance-id 1 and instance-id 2).
-
Perform config commit on Rack-1/Site-1. Wait until the config commit succeeds and the PODs restart to fetch the latest helm charts (if applicable) and are back in the running state.
-
Revert the role of Rack-1/Site-1 to be Primary (Switch/Reset roles on both sites).
-
Verify that the available roles of Rack-1/Site-1 (Primary) and Rack-2/Site-2 (Standby) are in the expected state.
-
Repeat the preceding checklist for Rack-2/Site-2.
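The preconditions in this checklist can be expressed as a small check. The following Python sketch is illustrative only: the function name, the site keys, and the dictionary layouts are hypothetical and are not part of the product CLI. It assumes that the role of each instance and the commit status of each site have already been collected by some other means.

```python
PRIMARY = "PRIMARY"


def can_start_commit(site, roles, commit_in_progress):
    """Return True when the checklist preconditions hold for `site`.

    roles: {"site1": {1: "PRIMARY", 2: "STANDBY"}, ...}   (hypothetical layout)
    commit_in_progress: {"site1": False, "site2": False}  (hypothetical layout)
    """
    peer = "site2" if site == "site1" else "site1"
    # Never run config commit on both sites at the same time.
    if commit_in_progress.get(peer, False):
        return False
    # The peer site must hold Primary ownership of both instances first.
    return all(roles[peer].get(i) == PRIMARY for i in (1, 2))
```

For example, with Rack-2/Site-2 reporting PRIMARY for both instances and no commit running there, `can_start_commit("site1", ...)` returns `True`.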
Software Upgrade
Upgrading Rack-1/Site-1 when GR is enabled:
-
Verify that the available roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY.
show role instance-id 1 result "PRIMARY"
show role instance-id 2 result "STANDBY"
-
Initiate switch role for both instances on Rack-1/Site-1 to STANDBY with failback-interval of 0 seconds. This step transitions the roles from PRIMARY/STANDBY to STANDBY_ERROR/STANDBY_ERROR.
geo switch-role instance-id 1 role standby [failback-interval 0]
geo switch-role instance-id 2 role standby [failback-interval 0]
Note
-
Heartbeat between both the sites must be successful.
-
The failback-interval keyword is optional and is provided for backward compatibility of upgrades between releases. Its value is 0. It is deprecated as of the current release and will be discontinued in subsequent releases.
-
Verify that the available roles of both instances have moved to STANDBY_ERROR on Rack-1/Site-1.
show role instance-id 1 result "STANDBY_ERROR"
show role instance-id 2 result "STANDBY_ERROR"
-
Verify that the available roles of both instances have moved to PRIMARY on Rack-2/Site-2.
show role instance-id 1 result "PRIMARY"
show role instance-id 2 result "PRIMARY"
-
Perform a rolling upgrade or a non-graceful upgrade (using system mode shutdown/running), as required, on Rack-1/Site-1. To allow replication to finish, wait 5 minutes between the GR switchover and the SMF shutdown.
-
Perform the following steps after the upgrade procedure completes. Perform a health check on Rack-1/Site-1 and ensure that the PODs have come up and Rack-1/Site-1 is healthy.
-
Verify that the available roles of both instances remain in STANDBY_ERROR mode on Rack-1/Site-1.
show role instance-id 1 result "STANDBY_ERROR"
show role instance-id 2 result "STANDBY_ERROR"
-
Initiate reset role for both instances on Rack-1/Site-1 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.
geo reset-role instance-id 1 role standby
geo reset-role instance-id 2 role standby
-
Verify that the roles of both instances have moved to STANDBY on Rack-1/Site-1.
show role instance-id 1 result "STANDBY"
show role instance-id 2 result "STANDBY"
-
Initiate switch role for instance-id 1 on Rack-2/Site-2 to STANDBY. This step transitions the available roles of Rack-2/Site-2 from PRIMARY/PRIMARY to STANDBY_ERROR/PRIMARY and Rack-1/Site-1 from STANDBY/STANDBY to PRIMARY/STANDBY.
geo switch-role instance-id 1 role standby [failback-interval 0]
-
Verify that the available roles of the instances on Rack-2/Site-2 are in STANDBY_ERROR/PRIMARY.
show role instance-id 1 result "STANDBY_ERROR"
show role instance-id 2 result "PRIMARY"
-
Verify that the available roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY.
show role instance-id 1 result "PRIMARY"
show role instance-id 2 result "STANDBY"
-
Initiate reset role for instance-id 1 on Rack-2/Site-2 to STANDBY. This step transitions the roles of Rack-2/Site-2 from STANDBY_ERROR/PRIMARY to STANDBY/PRIMARY.
geo reset-role instance-id 1 role standby
-
Verify that the available roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.
show role instance-id 1 result "STANDBY"
show role instance-id 2 result "PRIMARY"
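The role transitions in the Rack-1/Site-1 procedure above can be traced with a simplified model. The Python sketch below is an illustration only and does not reflect how the product implements GR: the function names and the `site1`/`site2` keys are hypothetical, and the transition rules are inferred solely from the states listed in the steps (switch-role moves the local instance to STANDBY_ERROR and the peer instance to PRIMARY; reset-role moves STANDBY_ERROR to STANDBY).

```python
PRIMARY, STANDBY, STANDBY_ERROR = "PRIMARY", "STANDBY", "STANDBY_ERROR"


def peer_of(site):
    return "site2" if site == "site1" else "site1"


def switch_role_standby(roles, site, instance_id):
    """Model `geo switch-role instance-id N role standby` run on `site`."""
    roles[site][instance_id] = STANDBY_ERROR
    roles[peer_of(site)][instance_id] = PRIMARY


def reset_role_standby(roles, site, instance_id):
    """Model `geo reset-role instance-id N role standby` run on `site`."""
    if roles[site][instance_id] != STANDBY_ERROR:
        raise ValueError("reset-role is only modelled from STANDBY_ERROR")
    roles[site][instance_id] = STANDBY


def upgrade_site1_walkthrough():
    # Starting point: Site-1 is PRIMARY/STANDBY, Site-2 is STANDBY/PRIMARY.
    roles = {
        "site1": {1: PRIMARY, 2: STANDBY},
        "site2": {1: STANDBY, 2: PRIMARY},
    }
    # Switch both instances on Site-1 to standby before the upgrade.
    for instance_id in (1, 2):
        switch_role_standby(roles, "site1", instance_id)
    assert roles["site1"] == {1: STANDBY_ERROR, 2: STANDBY_ERROR}
    assert roles["site2"] == {1: PRIMARY, 2: PRIMARY}
    # (The upgrade and health check on Site-1 happen here.)
    for instance_id in (1, 2):
        reset_role_standby(roles, "site1", instance_id)
    assert roles["site1"] == {1: STANDBY, 2: STANDBY}
    # Hand instance 1 back to Site-1, then clear the error state on Site-2.
    switch_role_standby(roles, "site2", 1)
    assert roles["site2"] == {1: STANDBY_ERROR, 2: PRIMARY}
    assert roles["site1"] == {1: PRIMARY, 2: STANDBY}
    reset_role_standby(roles, "site2", 1)
    return roles
```

Calling `upgrade_site1_walkthrough()` ends with both sites back in their starting roles, which matches the final verification step of the procedure.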
Upgrading Rack-2/Site-2 when GR is enabled:
-
Verify that the available roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.
show role instance-id 1 result "STANDBY"
show role instance-id 2 result "PRIMARY"
-
Initiate switch role for both instances on Rack-2/Site-2 to STANDBY with failback-interval of 0 seconds. This step transitions the roles from STANDBY/PRIMARY to STANDBY_ERROR/STANDBY_ERROR.
geo switch-role instance-id 1 role standby [failback-interval 0]
geo switch-role instance-id 2 role standby [failback-interval 0]
-
Verify that the available roles of both instances move to STANDBY_ERROR on Rack-2/Site-2.
show role instance-id 1 result "STANDBY_ERROR"
show role instance-id 2 result "STANDBY_ERROR"
-
Verify that the available roles of both instances move to PRIMARY on Rack-1/Site-1.
show role instance-id 1 result "PRIMARY"
show role instance-id 2 result "PRIMARY"
-
Perform a rolling upgrade or a non-graceful upgrade (using system mode shutdown/running), as required, on Rack-2/Site-2.
-
Perform the following steps after the upgrade procedure completes. Perform a health check on Rack-2/Site-2 and ensure that the PODs have come up and Rack-2/Site-2 is healthy.
-
Verify that the available roles of both the instances remain in STANDBY_ERROR on Rack-2/Site-2.
show role instance-id 1 result "STANDBY_ERROR"
show role instance-id 2 result "STANDBY_ERROR"
-
Initiate reset role for both instances on Rack-2/Site-2 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.
geo reset-role instance-id 1 role standby
geo reset-role instance-id 2 role standby
-
Verify that the available roles of both instances move to STANDBY on Rack-2/Site-2.
show role instance-id 1 result "STANDBY"
show role instance-id 2 result "STANDBY"
-
Initiate switch role for instance-id 2 on Rack-1/Site-1 to STANDBY. This step transitions the available roles of Rack-1/Site-1 from PRIMARY/PRIMARY to PRIMARY/STANDBY_ERROR and Rack-2/Site-2 from STANDBY/STANDBY to STANDBY/PRIMARY.
geo switch-role instance-id 2 role standby [failback-interval 0]
-
Verify that the available roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY_ERROR.
show role instance-id 1 result "PRIMARY"
show role instance-id 2 result "STANDBY_ERROR"
-
Verify that the available roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.
show role instance-id 1 result "STANDBY"
show role instance-id 2 result "PRIMARY"
-
Initiate reset role for instance-id 2 on Rack-1/Site-1 to STANDBY. This step transitions the roles of Rack-1/Site-1 from PRIMARY/STANDBY_ERROR to PRIMARY/STANDBY.
geo reset-role instance-id 2 role standby
-
Verify that the available roles of both the instances on Rack-1/Site-1 are in PRIMARY/STANDBY.
show role instance-id 1 result "PRIMARY"
show role instance-id 2 result "STANDBY"
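Each verification step above compares the output of show role against an expected role. The following Python sketch shows one way such a comparison could be scripted. It assumes that the command output contains text of the form `result "ROLE"`, as shown in the steps. How the output is captured from the CLI is out of scope, and the helper names are hypothetical.

```python
import re

# Matches the role inside text such as: result "STANDBY_ERROR"
ROLE_PATTERN = re.compile(r'result\s+"([A-Z_]+)"')


def parse_role(output):
    """Return the role found in captured `show role` output, or None."""
    match = ROLE_PATTERN.search(output)
    return match.group(1) if match else None


def roles_match(outputs, expected):
    """Compare captured outputs per instance-id against expected roles.

    outputs:  {1: '<text of show role instance-id 1>', 2: '...'}
    expected: {1: "PRIMARY", 2: "STANDBY"}
    """
    return all(
        parse_role(outputs.get(instance_id, "")) == role
        for instance_id, role in expected.items()
    )
```

For the final check on Rack-1/Site-1, the expected mapping would be `{1: "PRIMARY", 2: "STANDBY"}`. A missing or unparsable output for any instance makes `roles_match` return `False`.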