Skip to main content

Initiating a failover to your replica appliance

You can failover to a GitHub Enterprise Server replica appliance using the command line for maintenance and testing, or if the primary appliance fails.

The time required to failover depends on how long it takes to manually promote the replica and redirect traffic. The average time ranges between 20-30 minutes.

Promoting a replica does not automatically set up replication for existing appliances. After promoting a replica, if desired, you can set up replication from the new primary to existing appliances and the previous primary.

  1. If the primary appliance is available, to allow replication to finish before you switch appliances, on the primary appliance, put the primary appliance into maintenance mode.

    • Put the appliance into maintenance mode.

    • When the number of active Git operations, MySQL queries, and Resque jobs reaches zero, wait 30 seconds.

      Note

      Nomad will always have jobs running, even in maintenance mode, so you can safely ignore these jobs.

    • To verify all replication channels report OK, use the ghe-repl-status -vv command.

      ghe-repl-status -vv
      
  2. Enable maintenance mode on all active replica appliances. For more information, see "Enabling and scheduling maintenance mode."

  3. On the replica appliance you'd like to fail over to, to stop replication and promote the replica appliance to primary status, use the ghe-repl-promote command.

    ghe-repl-promote
    

    Note

    If the primary node is unavailable, warnings and timeouts may occur but can be ignored.

  4. Update the DNS record to point to the IP address of the replica. Traffic is directed to the replica after the TTL period elapses. If you are using a load balancer, ensure it is configured to send traffic to the replica.

  5. Notify users that they can resume normal operations.

  6. If desired, set up replication from the new primary to existing appliances and the previous primary. For more information, see "About high availability configuration."

  7. Appliances you do not intend to setup replication to that were part of the high availability configuration prior the failover, need to be removed from the high availability configuration by UUID.

    • On the former appliances, get their UUID via cat /data/user/common/uuid.

      cat /data/user/common/uuid
      
    • On the new primary, remove the UUIDs using ghe-repl-teardown. Please replace UUID with a UUID you retrieved in the previous step.

      ghe-repl-teardown -u UUID
      

Further reading