
Rack Awareness Configuration

To avoid data unavailability when a rack fails, configure rack awareness so that replicas of each partition are stored on multiple racks, identified by rack-id.

These examples illustrate how this feature currently operates:

  • When configuring 3 racks with replication factor of 3 (RF3), each rack receives one replica for a given partition.
    • The first replica, the master, is on one rack, the second replica on a different rack, and the third replica is on the third rack, with specific nodes from each rack depending on the succession list order.
  • If you lose a rack, the number of replicas is eventually restored to match your configured replication factor. For instance, if you go from 3 racks to 2 with RF3, one rack hosts the master, the other rack hosts one replica, and the third replica is placed on one of the two racks.
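The behavior described above can be sketched in Python. This is an illustrative model of the stated rule (one replica per distinct rack, with excess replicas reusing racks when there are fewer racks than the replication factor), not Aerospike's actual placement algorithm; the node names and succession lists are hypothetical:

```python
# Illustrative sketch (not Aerospike's actual algorithm) of the rack-aware
# placement rule described above: each replica goes to a distinct rack while
# unused racks remain, and any excess replicas land on already-used racks.

def place_replicas(succession_list, replication_factor):
    """succession_list: (node, rack_id) pairs in partition succession order.
    Node names are assumed unique. Returns the nodes chosen for the
    partition's replicas, master first."""
    chosen, used_racks = [], set()
    # First pass: one replica per distinct rack, in succession order.
    for node, rack in succession_list:
        if len(chosen) == replication_factor:
            break
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
    # Second pass: fewer racks than replicas, so reuse racks for the excess.
    for node, rack in succession_list:
        if len(chosen) == replication_factor:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen

# 3 racks, RF3: one replica per rack.
three_racks = [("A1", 1), ("B1", 2), ("C1", 3), ("A2", 1)]
print(place_replicas(three_racks, 3))  # ['A1', 'B1', 'C1']

# 2 racks, RF3: the third replica reuses one of the two racks.
two_racks = [("A1", 1), ("A2", 1), ("B1", 2)]
print(place_replicas(two_racks, 3))  # ['A1', 'B1', 'A2']
```

The second call shows the RF3-on-2-racks case from the example above: the master and one replica sit on different racks, and the third replica reuses rack 1.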

Partition distribution

Masters are always evenly distributed across all nodes in the cluster, regardless of the prefer-uniform-balance configuration, even when the number of racks does not match the replication-factor.
In these cases, not every rack holds a copy of every partition.

Unbalanced racks

In unbalanced racks - racks with different numbers of nodes - the master partition and replica partitions are distributed to distinct racks. When the replication-factor is configured higher than the current number of racks, the excess replicas are randomly distributed.

Unbalanced clusters

If a single node goes down, the cluster is temporarily unbalanced until the node is restored. This imbalance does not cause service interruptions. The cluster continues uninterrupted. Once the node restarts, the cluster automatically rebalances. The imbalance in the general load on the nodes across racks depends on the nature of the workload, for example, the ratio of the updated portion of a record to its total size, as replica writes carry the full record.

Configure a cluster for rack awareness

You can configure rack awareness at the namespace level. Specify the same rack-id for all nodes that belong to the same rack.

namespace {
...
rack-id 1
...
}
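For example, on a two-rack cluster, both nodes on the first rack would carry rack-id 1 in their namespace stanzas, while the node on the second rack would carry rack-id 2 (the namespace name test here is illustrative):

```
# aerospike.conf on each node of rack 1
namespace test {
    ...
    rack-id 1
    ...
}

# aerospike.conf on the node of rack 2
namespace test {
    ...
    rack-id 2
    ...
}
```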

Upgrade a cluster for rack awareness

You can enable or update rack awareness for an existing cluster dynamically.

note

Tools package 6.0.x or later is required to use asadm's manage config commands. manage config requires asadm to be in enable mode, entered by typing enable. Otherwise, use the equivalent asinfo set-config command.

  1. On each node, use asadm's manage config command to change rack-id to the desired value:
asadm -e "enable; manage config namespace <namespaceName> param rack-id to 1 with <host>"

or use asinfo's set-config command:

asinfo -v "set-config:context=namespace;id=<namespaceName>;rack-id=1"
  2. Add the rack-id configuration parameter to the namespace stanza in the config file so that the setting persists across future restarts.
  3. Trigger a rebalance of the cluster to start migrations with the new rack-id configuration:
asadm -e "enable; manage recluster"

or use asinfo's recluster command:

asinfo -v "recluster:"
note

Persist your changes in the config file to protect against the change being rolled back by a restart, and verify that there are no typos in the file. This is the best practice when making any dynamic configuration change.

Display rack group settings

Use the following commands to display the rack groupings.

note

Tools package 6.2.x or later is required to use asadm's show racks command. Otherwise, use the equivalent asinfo racks command.

asadm -e "show racks"
~~~~~~~~~~~~~~~~Racks (2021-10-21 20:33:28 UTC)~~~~~~~~~~~~~~~~~
Namespace|Rack|                                            Nodes
         |  ID|
bar      |   2|BB9040016AE4202, BB9020016AE4202, BB9010016AE4202
test     |   1|BB9040016AE4202, BB9010016AE4202
Number of rows: 2

In the example above, for the bar namespace, rack-id 2 includes nodes BB9040016AE4202, BB9020016AE4202, and BB9010016AE4202; for the test namespace, rack-id 1 includes nodes BB9040016AE4202 and BB9010016AE4202.

Configuring the Rack Awareness Protocol

Update the paxos-protocol parameter in the service stanza to be v4 as shown below:

service {
...
paxos-protocol v4
...
}

Configure the group (Rack)

In the non-rack-awareness case, each node is identified by a node-value -- a 64-bit unsigned integer which consists of: 16-bit port ID plus the 48-bit hardware MAC address. Each node-value in the cluster must be unique.

When using rack awareness, a node-value consists of: 16-bit port ID + 16-bit group ID + 32-bit node ID:

  • The 32-bit node ID can be automatically generated from the node's IP address OR you can specify it explicitly. Specifying the node ID explicitly might be used when you want more control over the node ID value for verification or cluster debugging.
  • The group ID must be specified explicitly.
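The packing described above can be sketched in Python. The bit ordering used here (port in the high 16 bits, then group ID, then node ID) is consistent with the node values shown in the dump-ra output later on this page, but the helper itself is illustrative, not an Aerospike API:

```python
# Illustrative sketch of the rack-aware node-value layout described above:
# a 16-bit port ID, a 16-bit group ID, and a 32-bit node ID packed into a
# single 64-bit unsigned integer. Not an Aerospike API.

def pack_node_value(port: int, group_id: int, node_id: int) -> int:
    """Pack port, group ID, and node ID into one 64-bit node-value."""
    assert 0 < port < 1 << 16
    assert 0 < group_id < 1 << 16   # group ID must be positive (non-zero)
    assert 0 < node_id < 1 << 32    # node ID must be positive (non-zero)
    return (port << 48) | (group_id << 32) | node_id

def unpack_node_value(value: int) -> tuple[int, int, int]:
    """Split a 64-bit node-value back into (port, group_id, node_id)."""
    return (value >> 48) & 0xFFFF, (value >> 32) & 0xFFFF, value & 0xFFFFFFFF

# Port 3001, group 201, node 101: the same values as the [Self] node in the
# dump-ra example output below.
value = pack_node_value(3001, 201, 101)
print(f"{value:x}")              # bb900c900000065
print(unpack_node_value(value))  # (3001, 201, 101)
```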

Explicit Node ID

Set each node's ID and group by inserting the following top level stanza into a node's configuration file:

cluster {
mode static
self-node-id [32-bit unsigned integer node ID]
self-group-id [16-bit unsigned integer group ID]
}

Auto Generate Node ID

Insert the following top-level stanza into each node's configuration file (the node ID is computed from the node's IP address):

cluster {
mode dynamic
self-group-id [16-bit unsigned integer group ID]
}

Node IDs

  • When you choose the dynamic option, the node's IP address forms part of its node ID, so avoid reusing IP addresses for nodes in a cluster; the IP address determines the uniqueness of the node ID.
  • The node IDs of different database nodes must never conflict; such a conflict would result in the cluster not forming correctly.
  • The node ID for a database node is computed once, at the start of the node's lifetime in a cluster instance, so even though that node's actual IP address may change over time, its node ID is fixed for either its lifetime or the lifetime of the cluster (whichever is shorter).
  • Node IP addresses that do not conflict at any single moment are not sufficient: to guarantee node IDs remain unique over the lifetime of the cluster, no active IP address may repeat the IP address of any previously started node.

Node values

  1. If the node-values (port+group+node) are not unique within a cluster then the nodes with duplicate node-values will not be able to join the cluster.
  2. The node ID and group ID must be positive (non-zero) integers.
  3. To turn off rack awareness support, users can comment out or delete the "mode" line, or they can use the value "none": e.g. mode none. This will restore default (non-rack-awareness) cluster behavior.
  4. A cluster in rack awareness mode can have a mixture of nodes with auto-generated and explicit node IDs – that is, you can mix and match how node IDs are specified.
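Because duplicate node-values keep nodes out of the cluster (point 1 above), a pre-flight check over a planned node list can catch collisions before startup. This is an illustrative script with hypothetical data, not an Aerospike tool:

```python
from collections import Counter

# Hypothetical planned cluster layout: (port, group_id, node_id) per node.
planned_nodes = [
    (3001, 201, 101),
    (3011, 202, 103),
    (3021, 203, 105),
    (3001, 201, 101),  # accidental duplicate node-value
]

def find_duplicate_node_values(nodes):
    """Return the (port, group_id, node_id) tuples that appear more than once."""
    counts = Counter(nodes)
    return [node for node, n in counts.items() if n > 1]

print(find_duplicate_node_values(planned_nodes))  # [(3001, 201, 101)]
```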

For example, to configure two nodes into a group, the first node might have the following in its configuration file:

service {
...
paxos-protocol v4
...
}

cluster {
mode static
self-node-id 101
self-group-id 201
}

and the second node might have the following in its configuration file:

service {
...
paxos-protocol v4
...
}

cluster {
mode dynamic
self-group-id 201
}

On startup, when a node joins the cluster, the existing nodes in the cluster will discover the group topology and assign replica partitions accordingly.

Upgrade a cluster to rack awareness

To upgrade your cluster to use rack awareness:

  1. Stop all cluster nodes.
  2. Upgrade their configuration files, similar to bringing up a new cluster.
  3. Restart the cluster. Since replica locations must be agreed upon by all cluster nodes, when you first enable rack awareness you must restart the entire cluster.

If you enable rack awareness, all nodes in the cluster must use rack awareness; that is, configure all nodes as described above. Rack awareness requires a minimum of two groups. If there is only one node or only one group, then the default (non-rack aware) behavior automatically takes over. This is also the case if multiple groups (racks) are present on cluster start, but failure causes only a single group (rack) to be up. The remaining nodes in the single remaining group (rack) automatically form a default (non-rack aware) cluster.

Scale a rack awareness cluster

Once rack awareness is enabled, you can add a node to a group or remove a node from a group without restarting the whole cluster. To add a node to the cluster, simply configure the group before starting the node. The cluster rebalances.

Node values (that is, the 64-bit identifier comprised of port+group+node) MUST be unique. When you first enable rack awareness, this uniqueness is easy to configure; however, over hardware upgrades and IP address changes, uniqueness may be harder to maintain, especially with a mix of static and dynamic node values. It is a best practice to create and carefully maintain a list of current node values. When a new machine enters the cluster and there is a node-value collision, the cluster may behave erratically because inter-node communication is faulty.

Display rack group settings

To see the nodes in a group:

asinfo -v dump-ra:verbose=true

In this 3-node/3-group cluster example, the output of a query using node ID 101 is:

May 28 2013 18:39:00 GMT: INFO (info): (base/cluster_config.c:267) Rack Aware is enabled.  Mode: static.
May 28 2013 18:39:00 GMT: INFO (paxos): (base/cluster_config.c:281) SuccessionList[0]: Node bcd00cb00000069 : Port 3021 ; GroupID 203 ; NodeID 105 [Master]
May 28 2013 18:39:00 GMT: INFO (paxos): (base/cluster_config.c:281) SuccessionList[1]: Node bc300ca00000067 : Port 3011 ; GroupID 202 ; NodeID 103
May 28 2013 18:39:00 GMT: INFO (paxos): (base/cluster_config.c:281) SuccessionList[2]: Node bb900c900000065 : Port 3001 ; GroupID 201 ; NodeID 101 [Self]
