To avoid data unavailability, configure rack awareness to use multiple racks, defined by the rack-id configuration parameter.
These examples illustrate how this feature currently operates:
- When configuring 3 racks with replication factor of 3 (RF3), each rack receives one replica for a given partition.
- The first replica, the master, is on one rack, the second replica on a different rack, and the third replica is on the third rack, with specific nodes from each rack depending on the succession list order.
- If you lose a rack, the number of replicas is eventually restored to the configured replication factor. For instance, if you go from 3 racks to 2 with RF3, one rack hosts the master, the other rack hosts one replica, and the third replica lands on one of the two remaining racks.
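The placement rule in these examples can be sketched as a simple selection over a partition's succession list. This is a simplified illustration under assumed names and structure, not Aerospike's actual partition algorithm:

```python
def place_replicas(succession, rf):
    """Choose rf replica nodes from a partition's succession list,
    preferring one replica per rack. Simplified sketch only -- not
    Aerospike's actual algorithm.

    succession: ordered list of (node, rack_id); the first chosen
    node is the master.
    """
    chosen, used_racks = [], set()
    # First pass: walk the succession list, taking at most one node per rack.
    for node, rack in succession:
        if len(chosen) == rf:
            break
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
    # Second pass: if there are fewer racks than rf, place the excess
    # replicas on already-used racks, still in succession-list order.
    for node, rack in succession:
        if len(chosen) == rf:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen

# 3 racks, RF3: each rack receives exactly one replica.
print(place_replicas([("A", 1), ("B", 2), ("C", 3), ("D", 1)], 3))  # ['A', 'B', 'C']

# 2 racks, RF3: the third replica lands on one of the two racks.
print(place_replicas([("A", 1), ("B", 1), ("C", 2), ("D", 2)], 3))  # ['A', 'C', 'B']
```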
Masters are always evenly distributed across the nodes in the cluster, regardless of the prefer-uniform-balance configuration, even when the number of racks does not match the replication factor. In these cases, not every rack has a copy of each partition.
In unbalanced racks - racks with different numbers of nodes - the master partition and replica partitions are distributed to distinct racks. When the replication-factor is configured higher than the current number of racks, the excess replicas are randomly distributed.
If a single node goes down, the cluster is temporarily unbalanced until the node is restored; this imbalance does not interrupt service. Once the node restarts, the cluster automatically rebalances. How unbalanced the general load on the nodes across racks becomes depends on the nature of the workload, for example, the ratio of the updated portion of a record to its total size, since replica writes carry the full record.
Configure a cluster for rack awareness
You can configure rack awareness at the namespace level. Assign the same rack-id to all nodes that reside on the same rack.
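For example, a namespace stanza on a node in rack 1 might include the following (the rack-id and replication-factor values here are illustrative):

```
namespace <namespaceName> {
    replication-factor 2
    rack-id 1
    # ... other namespace settings (storage engine, memory, etc.)
}
```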
Upgrade a cluster for rack awareness
You can enable or update rack awareness for an existing cluster dynamically.
- On each node, use asadm's manage config command to change rack-id to the desired value:
asadm -e "enable; manage config namespace <namespaceName> param rack-id to 1 with <host>"
or use asinfo's set-config command:
asinfo -v "set-config:context=namespace;id=<namespaceName>;rack-id=1"
- Add the rack-id configuration parameter to the namespace stanza in the configuration file to ensure the setting persists across future restarts.
- Trigger a rebalance of the cluster to engage migrations with the new rack-id configuration.
Persist your changes in the configuration file to protect against losing them on a restart, and verify that the file contains no typos. This is best practice for any dynamic configuration change.
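The rebalance in the last step can be triggered dynamically with the recluster info command (assuming a server version that supports it):

```
asinfo -v "recluster:"
```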
Display rack group settings
Use the following commands to get the grouping of the racks.
asadm -e "show racks"
~~~~~~~~~~~~~~~~Racks (2021-10-21 20:33:28 UTC)~~~~~~~~~~~~~~~~~
Namespace|Rack ID|Nodes
bar      |      2|BB9040016AE4202, BB9020016AE4202, BB9010016AE4202
test     |      1|BB9040016AE4202, BB9010016AE4202
Number of rows: 2
In the example above, for the test namespace, rack-id 1 includes nodes BB9040016AE4202 and BB9010016AE4202; for the bar namespace, rack-id 2 includes nodes BB9040016AE4202, BB9020016AE4202, and BB9010016AE4202.
Configuring the Rack Awareness Protocol
Update the paxos-protocol parameter in the service stanza to be v4 as shown below:
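A minimal service stanza with this setting might look like the following (other service parameters omitted):

```
service {
    paxos-protocol v4
    # ... other service settings
}
```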
Configure the group (Rack)
In the non-rack-awareness case, each node is identified by a node-value -- a 64-bit unsigned integer which consists of: 16-bit port ID plus the 48-bit hardware MAC address. Each node-value in the cluster must be unique.
When using rack awareness, a node-value consists of: 16-bit port ID + 16-bit group ID + 32-bit node ID:
- The 32-bit node ID can be generated automatically from the node's IP address, or you can specify it explicitly, which is useful when you want more control over the node ID value for verification or cluster debugging.
- The group ID must be specified explicitly.
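The 64-bit node-value layout described above can be sketched as a bit-packing function. This is a sketch assuming the field order port | group | node from most- to least-significant bits, which matches the succession-list log output later in this section:

```python
def make_node_value(port: int, group_id: int, node_id: int) -> int:
    """Pack a rack-aware node-value: 16-bit port, 16-bit group ID,
    32-bit node ID, from high to low bits.
    (Field order is inferred from the example log output.)"""
    assert 0 < port < (1 << 16)
    assert 0 < group_id < (1 << 16)
    assert 0 < node_id < (1 << 32)
    return (port << 48) | (group_id << 32) | node_id

# Port 3021, GroupID 203, NodeID 105 -> bcd00cb00000069,
# matching the example succession-list log output.
print(hex(make_node_value(3021, 203, 105)))  # 0xbcd00cb00000069
```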
Explicit Node ID
Set each node's ID and group by inserting the following top-level cluster stanza into the node's configuration file:

```
cluster {
    mode static
    self-node-id [32-bit unsigned integer node ID]
    self-group-id [16-bit unsigned integer group ID]
}
```
Auto Generate Node ID
Insert the following top-level cluster stanza into each node's configuration file; the node ID is computed automatically from the node's IP address:

```
cluster {
    mode dynamic
    self-group-id [16-bit unsigned integer group ID]
}
```
- When you choose the dynamic option, the node's IP address becomes part of its node ID, so avoid reusing IP addresses for nodes in a cluster; reuse compromises the uniqueness of the node IDs.
- The node IDs of different database nodes must never conflict; such a conflict would result in the cluster not forming correctly.
- The node ID for a database node is computed once, at the start of the node's lifetime in a cluster instance, so even though that node's actual IP address may change over time, its node ID is fixed for either its lifetime or the lifetime of the cluster (whichever is shorter).
- Although a cluster's active node IP addresses may not conflict at any single point in time, to guarantee node IDs remain unique over the lifetime of the cluster, no active IP address may overlap with the IP address of any previously started node.
- If the node-values (port+group+node) are not unique within a cluster then the nodes with duplicate node-values will not be able to join the cluster.
- The node ID and group ID must be positive (non-zero) integers.
- To turn off rack awareness support, users can comment out or delete the "mode" line, or they can use the value "none": e.g. mode none. This will restore default (non-rack-awareness) cluster behavior.
- A cluster in rack awareness mode can have a mixture of nodes with auto-generated and explicit node IDs – that is, you can mix and match how node IDs are specified.
For example, to configure two nodes into a group, the first node might have the following in its configuration file:
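(The group and node ID values below are illustrative assumptions; static mode matches the explicit-ID configuration described above.)

```
cluster {
    mode static
    self-group-id 201
    self-node-id 101
}
```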
and the second node might have the following in its configuration file:
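(Again, illustrative values; note that both nodes specify the same self-group-id but different node IDs.)

```
cluster {
    mode static
    self-group-id 201
    self-node-id 102
}
```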
On startup, when a node joins the cluster, the existing nodes in the cluster will discover the group topology and assign replica partitions accordingly.
Upgrade a cluster to rack awareness
To upgrade your cluster to use rack awareness:
- Stop all cluster nodes.
- Update their configuration files, similar to bringing up a new cluster.
- Restart the cluster. Since replica locations must be agreed upon by all cluster nodes, when you first enable rack awareness you must restart the entire cluster.
If you enable rack awareness, all nodes in the cluster must use rack awareness; that is, configure all nodes as described above. Rack awareness requires a minimum of two groups. If there is only one node or only one group, then the default (non-rack aware) behavior automatically takes over. This is also the case if multiple groups (racks) are present on cluster start, but failure causes only a single group (rack) to be up. The remaining nodes in the single remaining group (rack) automatically form a default (non-rack aware) cluster.
Scale a rack awareness cluster
Once rack awareness is enabled, you can add a node to a group or remove a node from a group without restarting the whole cluster. To add a node to the cluster, simply configure the group before starting the node. The cluster rebalances.
Node values (that is, the 64-bit identifier comprised of port+group+node) MUST be unique. When you first enable rack awareness, this uniqueness is easy to configure; however, over hardware upgrades and IP address changes, uniqueness may be harder to maintain, especially with a mix of static and dynamic node values. It is a best practice to create and carefully maintain a list of current node values. When a new machine enters the cluster with a conflicting node-value, odd cluster behavior may occur because node communication is faulty.
Display rack group settings
To see the nodes in a group:
asinfo -v dump-ra:verbose=true
In this 3-node, 3-group cluster example, the output from the node with node ID 101 is:
May 28 2013 18:39:00 GMT: INFO (info): (base/cluster_config.c:267) Rack Aware is enabled. Mode: static.
May 28 2013 18:39:00 GMT: INFO (paxos): (base/cluster_config.c:281) SuccessionList: Node bcd00cb00000069 : Port 3021 ; GroupID 203 ; NodeID 105 [Master]
May 28 2013 18:39:00 GMT: INFO (paxos): (base/cluster_config.c:281) SuccessionList: Node bc300ca00000067 : Port 3011 ; GroupID 202 ; NodeID 103
May 28 2013 18:39:00 GMT: INFO (paxos): (base/cluster_config.c:281) SuccessionList: Node bb900c900000065 : Port 3001 ; GroupID 201 ; NodeID 101 [Self]
Where to next?
- Configure the service, fabric, and info sub-stanzas, which define the interfaces used for application-to-node communication.
- Configure the heartbeat sub-stanza, which defines the interface used for intra-cluster communication.
- Learn more about Rack Aware Architecture.
- Or return to Configure Page.