GitHub Enterprise High Availability Configuration (HA) is a primary/secondary failover configuration that provides redundancy, while Clustering provides both redundancy and scalability by distributing read and write load across multiple nodes.

Failure scenarios

High Availability (HA) and Clustering both provide redundancy by eliminating the single node as a point of failure. Both can keep the service available in these scenarios:

  • Software crashes, whether caused by operating system failure or by unrecoverable application errors.
  • Hardware failures, including storage hardware, CPU, RAM, network interfaces, etc.
  • Virtualization host system failures, including unplanned and scheduled maintenance events on AWS.
  • A logically or physically severed network, provided the failover appliance is on a separate network not affected by the failure.

Scalability

Clustering provides better scalability by distributing load across multiple nodes. This horizontal scaling may be preferable for some organizations with tens of thousands of developers. In HA, the scale of the appliance depends exclusively on the primary node, and load is not distributed to the replica server.

Differences in failover method and configuration

Feature | Failover configuration | Failover method
High Availability Configuration | DNS record with a low TTL pointed to the primary appliance, or load balancer. | You must manually promote the replica appliance in both DNS failover and load balancer configurations.
Clustering | DNS record must point to a load balancer. | If a node behind the load balancer fails, traffic is automatically sent to the other functioning nodes.
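
The difference between the two failover methods can be illustrated with a toy model. This is a minimal sketch, not GitHub Enterprise code: the class names `HAPair` and `ClusterBalancer`, the `promote_replica` method, and the node names are all hypothetical, and a simple round-robin stands in for whatever policy a real load balancer applies.

```python
class HAPair:
    """Toy model of HA: one active node, failover only on manual promotion."""

    def __init__(self, primary, replica):
        self.healthy = {primary: True, replica: True}
        self.replica = replica
        self.active = primary  # where DNS or the load balancer currently points

    def route(self):
        # Nothing fails over automatically: if the active node is down,
        # requests fail until an operator promotes the replica.
        return self.active if self.healthy[self.active] else None

    def promote_replica(self):
        # Stands in for the manual promotion step plus the DNS or
        # load balancer update (hypothetical method name).
        self.active = self.replica


class ClusterBalancer:
    """Toy model of Clustering: requests rotate across the healthy nodes."""

    def __init__(self, nodes):
        self.healthy = {node: True for node in nodes}
        self._next = 0

    def route(self):
        # Failed nodes are skipped automatically on every request.
        candidates = [node for node, ok in self.healthy.items() if ok]
        if not candidates:
            return None
        node = candidates[self._next % len(candidates)]
        self._next += 1
        return node
```

In this model, marking the HA primary as unhealthy makes `route()` return `None` until `promote_replica()` is called, whereas the cluster balancer keeps returning the remaining healthy nodes without operator action.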

Backups and disaster recovery

Neither HA nor Clustering should be considered a replacement for regular backups. For more information, see "Configuring backups on your appliance."

Monitoring

Availability features, especially ones with automatic failover such as Clustering, can mask a failure since service is usually not disrupted when something fails. Whether you are using HA or Clustering, monitoring the health of each instance is important so that you are aware when a failure occurs. For more information on monitoring, see "About recommended alert thresholds" and "Monitoring Cluster Nodes."
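
Because a failure can be masked at the service level, it helps to probe each node individually rather than only the public hostname. The following is a minimal sketch of that idea using only the Python standard library. The function names, the example hostnames, and the `/status` path are assumptions made for illustration; substitute whatever health endpoint and alerting mechanism your deployment actually uses.

```python
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, TLS errors, and timeouts all derive from OSError.
        return False


def unhealthy_nodes(hostnames, probe=http_probe, path="/status"):
    """Probe every node on its own address and return the ones that failed.

    `path` is an assumed health-check path, not a confirmed endpoint.
    """
    return [host for host in hostnames if not probe(f"https://{host}{path}")]
```

A scheduled job could call `unhealthy_nodes(["ghe-primary.example.com", "ghe-replica.example.com"])` (hypothetical hostnames) and raise an alert whenever the returned list is not empty, even if users have not noticed any disruption.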

Further Reading