Quiescing a node

Context

When the quiesce command (Enterprise Edition version 4.3.1.3 and above) is issued to a node, it causes the node to be "quiesced" in the next cluster rebalance.

During rebalance, a quiesced node behaves in some ways as if it has been removed from the cluster, but in other ways as if it is still in the cluster.

Rebalance puts the quiesced node at the end of every partition's "succession list". This can cause a quiesced node that was previously master to hand off master status to a normal (not quiesced) node. This handoff happens only if the normal node and the quiesced node have matching full partition versions. Such a handoff is the goal of using quiescence in one particular use case (see below).

The quiesced node at the end of the succession list is excluded from certain algorithms during rebalance: AP rack-aware, AP uniform balance, and SC "second phase" rack-aware -- for these purposes the quiesced node behaves as if it is not in the cluster.

Otherwise, the quiesced node behaves as if it is in the cluster -- it will accept transactions, schedule migrations if appropriate, and for SC it will count towards determining that a partition is available.

One last way a quiesced node behaves differently is that it will never drop its data, even if all migrations complete and leave it with a "superfluous" partition version. The assumption is that the quiesced node will be taken down and then return to its prior place as a replica -- keeping data means it will not take as long to re-sync, needing only a "delta" migration instead of a "fill" migration.

With Enterprise Edition 5.2.0 and above, a node can be placed in a perpetually quiesced state by using the stay-quiesced configuration parameter. This provides a lightweight mechanism for adding a tie-breaker node in the event of a network failure, which is particularly useful in rack-aware configurations where a loss of connectivity could result in a tie. Placing such a permanently quiesced node alone on a third rack breaks a potential tie by preserving a majority of nodes for a strong-consistency enabled namespace, providing full availability even if one of the two data racks is lost.
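As a sketch, a tie-breaker node of this kind could be configured with a fragment like the following in aerospike.conf (the placement of stay-quiesced in the service context reflects the configuration reference for your server version; verify before use):

```
service {
    # Enterprise Edition 5.2.0+: this node stays permanently quiesced
    # and acts only as a tie-breaker, never taking master ownership.
    stay-quiesced true
}
```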

Using Quiescence for Smooth Master Handoff

A common use for quiescence is to enable smooth master handoffs.

Normally, when a node is removed from a cluster, it takes a couple of seconds for the remaining nodes to re-cluster and determine a new master for the partitions that had been mastered by the missing node. During this time, transactions (with timeouts shorter than the "master gap") looking for a master node will not find one and will time out -- i.e. writes, and SC reads. AP reads will by default retry against a replica.

Quiescence can fill this gap -- if the node to be removed is first quiesced, and a rebalance triggered, master handoff will occur during the rebalance, yet the quiesced node will continue to receive transactions (and proxy them to the new master) until all the clients have discovered the new master and moved to it. Once this has happened, the quiesced node can be taken down. The re-clustering caused by this will not have a "master gap". Therefore, the burst of timeouts should instead become a burst of proxies (refer to the proxy related metrics such as client_proxy_complete).

note

The term 'master gap' refers to the time elapsed between the current master node in the client partition map becoming unreachable and a new master node assuming masterhood. Once a new master has been elected, client transactions can be proxied within the cluster to the new master node. When the client has tended the remaining nodes (after the tend interval has elapsed) transactions will be sent directly to the new master. The maximum master gap can be calculated and would be expressed as follows:

(timeout x interval) + (quantum interval x 0.2)

With default values this would resolve to:

(10 x 150) + (1500 x 0.2) = 1800ms or 1.8s
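The arithmetic above can be checked with a quick shell sketch. The values are the defaults quoted above (heartbeat interval 150 ms, heartbeat timeout 10, quantum interval 1500 ms); in practice they would come from your own configuration:

```shell
interval_ms=150     # heartbeat interval (default)
timeout=10          # heartbeat timeout (default)
quantum_ms=1500     # cluster quantum interval (default)

# (timeout x interval) + (quantum interval x 0.2), in integer arithmetic
gap_ms=$(( timeout * interval_ms + quantum_ms * 2 / 10 ))
echo "${gap_ms}ms"   # prints "1800ms"
```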

Here is the procedure to follow during a rolling upgrade:

Ensure the cluster is stable | asadm -e 'info network'

Using the info network asadm command, ensure there are no migrations, all the nodes are in the cluster, and all nodes show the same key.

Admin> info network
\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~Network Information (2021-05-27 01:21:44 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node| Node ID| IP| Build|Migrations|\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~Cluster\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~|Client| Uptime
| | | | |Size| Key|Integrity| Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3| 0.000 | 4|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
aero-cluster1_2:3000| BB912EA003E1600|10.0.3.224:3000|E-5.6.0.3| 0.000 | 4|39815062484B|True |BB9D9FC68290C00| 4|28:34:44
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3| 0.000 | 4|39815062484B|True |BB9D9FC68290C00| 4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3| 0.000 | 4|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
Number of rows: 4

Issue the quiesce command | manage quiesce

The quiesce command can be issued from asadm or directly through asinfo. If using asadm, it should be directed to the node to be quiesced using the with modifier, specifying the IP address or node ID of the node to be quiesced:

note

Tools package 6.0.x or later is required to use asadm's manage quiesce command. Otherwise, use the equivalent asinfo quiesce command.
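For illustration, the equivalent asinfo invocations would look like the following (the host address is illustrative; run against the node to be quiesced):

```
# Quiesce the target node:
asinfo -h 10.0.3.224 -v 'quiesce:'

# Revert if issued against the wrong node:
asinfo -h 10.0.3.224 -v 'quiesce-undo:'
```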

Admin+> manage quiesce with 10.0.3.224
\~\~\~Quiesce Nodes\~\~\~\~
Node|Response
ubuntu:3000|ok
Number of rows: 1

Verify the command has been successful by checking the pending_quiesce statistic:

Admin+> show statistics like pending_quiesce
\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~bar Namespace Statistics (2021-05-27 00:53:40 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
pending_quiesce|false |true |false |false
Number of rows: 2

\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~test Namespace Statistics (2021-05-27 00:53:40 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
pending_quiesce|false |true |false |false
Number of rows: 2
note

Delaying fill migrations is a good practice in many common situations (independent of quiescing). Refer to the migrate-fill-delay configuration parameter for details.
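As a sketch, migrate-fill-delay can be set dynamically in the service context (the 600-second value here is only an example):

```
# Delay fill migrations by 10 minutes (dynamic, service context):
asinfo -v 'set-config:context=service;migrate-fill-delay=600'
```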

info

If the manage quiesce command has been issued to the wrong node, the manage quiesce undo command can be used to revert it.

note

If the quiesce command is inadvertently issued against all the nodes in the cluster, the subsequent recluster command will be ignored:

WARNING (partition): (partition_balance_ee.c:435) {test} can't quiesce all nodes - ignoring

Issue a recluster command | manage recluster

This is the step where quiesced masters hand off to other nodes and migrations start.

Issue a recluster command:

note

Tools package 6.0.x or later is required to use asadm's manage recluster command. Otherwise, use the equivalent asinfo recluster command.

Admin+> manage recluster
Successfully started recluster

Verify the command was successful by checking the effective_is_quiesced and nodes_quiesced statistics:

Admin+> show statistics like quiesce
\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~bar Namespace Statistics (2021-05-27 01:08:57 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |true |false |false
nodes_quiesced |1 |1 |1 |1
pending_quiesce |false |true |false |false
Number of rows: 4

\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~test Namespace Statistics (2021-05-27 01:08:57 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |true |false |false
nodes_quiesced |1 |1 |1 |1
pending_quiesce |false |true |false |false
Number of rows: 4

Check that no more transactions are hitting the quiesced node and that proxy counters are as expected

1) The quiesced node should not be receiving any active traffic

A few seconds should be enough to be sure all clients have "moved" from the quiesced node to the new masters.

The asadm show latencies command can be used to check that read and write throughput on the quiesced node has dropped to zero. The usual metrics can be used to verify that other types of transactions (e.g. batch) have also stopped against the node to be quiesced.

Admin+> show latencies
\~\~\~\~\~\~\~\~\~\~\~Latency (2021-05-27 01:12:04 UTC)\~\~\~\~\~\~\~\~\~\~
Namespace|Histogram| Node|ops/sec|>1ms|>8ms|>64ms
test |read |aero-cluster1_1:3000| 9.3| 0.0| 0.0| 0.0
test |read |aero-cluster1_2:3000| 0.0| 0.0| 0.0| 0.0
test |read |aero-cluster1_3:3000| 7.5| 0.0| 0.0| 0.0
test |read |aero-cluster1_4:3000| 8.4| 0.0| 0.0| 0.0
| | | 25.2| 0.0| 0.0| 0.0
test |write |aero-cluster1_1:3000| 2.0| 0.0| 0.0| 0.0
test |write |aero-cluster1_2:3000| 0.0| 0.0| 0.0| 0.0
test |write |aero-cluster1_3:3000| 3.2| 0.0| 0.0| 0.0
test |write |aero-cluster1_4:3000| 4.5| 0.0| 0.0| 0.0
| | | 9.9| 0.0| 0.0| 0.0
Number of rows: 8

2) The quiesced node should no longer be proxying transactions

There will typically be a second or two of proxy transactions on the quiesced node as clients retrieve the updated partition map and start directing transactions to the new master nodes for the partitions previously owned by the quiesced node. It is good practice to also wait for proxy transactions to stop on the quiesced node prior to shutting it down.

On the quiesced node, confirm that the following statistics are not incrementing. For details regarding the metrics, refer to the Metrics Reference page.

Admin+> show statistics like client_proxy
\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~bar Namespace Statistics (2021-05-27 01:17:48 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
client_proxy_complete|0 |20 |0 |0
client_proxy_error |0 |0 |0 |0
client_proxy_timeout |0 |0 |0 |0
Number of rows: 4

\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~test Namespace Statistics (2021-05-27 01:17:48 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
client_proxy_complete|0 |10 |0 |0
client_proxy_error |0 |0 |0 |0
client_proxy_timeout |0 |0 |0 |0
Number of rows: 4
Admin+> show statistics like batch_sub_proxy
\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~bar Namespace Statistics (2021-05-27 01:17:48 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
batch_sub_proxy_complete|0 |20 |0 |0
batch_sub_proxy_error |0 |0 |0 |0
batch_sub_proxy_timeout |0 |0 |0 |0
Number of rows: 4

\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~test Namespace Statistics (2021-05-27 01:17:48 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
batch_sub_proxy_complete|0 |20 |0 |0
batch_sub_proxy_error |0 |0 |0 |0
batch_sub_proxy_timeout |0 |0 |0 |0
Number of rows: 4
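One way to confirm the counters are no longer incrementing is to sample a statistic twice, a few seconds apart, and compare. A minimal sketch follows; the asinfo pipeline in the comment is illustrative, and the function itself only compares two captured samples:

```shell
# Report whether a counter moved between two samples.
proxies_stopped() {
    # usage: proxies_stopped <first_sample> <second_sample>
    if [ "$1" -eq "$2" ]; then
        echo "quiet"      # counter did not move between samples
    else
        echo "active"     # proxies still in flight
    fi
}

# In practice the samples would come from something like:
#   asinfo -v 'namespace/test' | tr ';' '\n' | grep '^client_proxy_complete'
proxies_stopped 20 20     # prints "quiet"
```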

3) The non-quiesced nodes should not be a destination of proxy transactions

On the nodes that were not quiesced, check the statistics that track transactions received as proxies from other nodes. These statistics begin with from_proxy.

Admin > show statistics like from_proxy_read
Admin > show statistics like from_proxy_write
Admin > show statistics like from_proxy_batch_sub

Other ways to monitor proxies are to watch the proxy transactions on the client transaction metric log line or, alternatively, to dynamically enable the proxy histogram and monitor its throughput using the Log Latency Tool.
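As a sketch, the proxy histogram can be enabled dynamically per namespace (this assumes the enable-hist-proxy namespace parameter; verify the name against the configuration reference for your server version):

```
asinfo -v 'set-config:context=namespace;id=test;enable-hist-proxy=true'
```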

Take down the quiesced node and proceed with the upgrade or maintenance

$ sudo systemctl stop aerospike

Verify the node has stopped and the cluster is now showing one less node:

Admin> info network
\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~Network Information (2021-05-27 01:21:44 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node| Node ID| IP| Build|Migrations|\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~Cluster\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~|Client| Uptime
| | | | |Size| Key|Integrity| Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3| 0.000 | 3|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3| 0.000 | 3|39815062484B|True |BB9D9FC68290C00| 4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3| 0.000 | 3|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
Number of rows: 3

The nodes_quiesced statistic is now back to 0:

Admin+> show statistics like quiesce
\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~bar Namespace Statistics (2021-05-27 01:08:57 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |false |false
nodes_quiesced |0 |0 |0
pending_quiesce |false |false |false
Number of rows: 4

\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~test Namespace Statistics (2021-05-27 01:08:57 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node |aero-cluster1_1:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |false |false
nodes_quiesced |0 |0 |0
pending_quiesce |false |false |false
Number of rows: 4

Proceed with the upgrade or other maintenance needed.

Bring the quiesced node back up

Bring the quiesced node back up, and make sure it joins the cluster:

$ sudo systemctl start aerospike
Admin> info network
\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~Network Information (2021-05-27 01:21:44 UTC)\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~
Node| Node ID| IP| Build|Migrations|\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~Cluster\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~\~|Client| Uptime
| | | | |Size| Key|Integrity| Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3| 1.015 K| 4|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
aero-cluster1_2:3000| BB912EA003E1600|10.0.3.224:3000|E-5.6.0.3| 2.798 K| 4|39815062484B|True |BB9D9FC68290C00| 6|00:00:00
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3| 828.000| 4|39815062484B|True |BB9D9FC68290C00| 4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3| 953.000| 4|39815062484B|True |BB9D9FC68290C00| 6|28:34:44

Wait for migrations to complete | asinfo -v 'cluster-stable:...

Make sure migrations have completed prior to moving on to the next node. This is done using the cluster-stable command, which should be run on each node in the cluster and should return the same cluster key for every node. The cluster-stable command can be scripted and the results compared programmatically.

For most common cases, migrations at this point would only consist of lead migrations and the time required for completion would be proportional to how long the node has been quiesced.

For situations where the stored data was deleted, or where there is no persisted storage, migrations need to repopulate the data on the node, which usually takes longer. A cold restart could also affect how long migrations take, as the node takes longer to restart and could, in some cases, resurrect previously deleted records.

Admin+> asinfo -v 'cluster-stable:size=4;ignore-migrations=no'
aero-cluster1_4:3000 (10.0.3.41) returned:
5EDF7C44A664

aero-cluster1_2:3000 (10.0.3.224) returned:
5EDF7C44A664

aero-cluster1_3:3000 (10.0.3.149) returned:
5EDF7C44A664

aero-cluster1_1:3000 (10.0.3.196) returned:
5EDF7C44A664
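Comparing the returned keys can be scripted. A minimal sketch follows; in practice the keys would be gathered with asinfo -h <node> -v 'cluster-stable:size=4;ignore-migrations=no' per node, while the function here only checks that all of its arguments match:

```shell
# Succeeds only if every node returned the same cluster key.
same_cluster_key() {
    if [ "$(printf '%s\n' "$@" | sort -u | wc -l)" -eq 1 ]; then
        echo "stable"
    else
        echo "not stable"
    fi
}

same_cluster_key 5EDF7C44A664 5EDF7C44A664 5EDF7C44A664 5EDF7C44A664   # prints "stable"
```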
note

Waiting for migrations to complete in this step ensures that, when the next node is quiesced, the master ownership change happens as soon as the recluster is done. Quiescing the next node without waiting for migrations to complete is also possible, but migrations must then be allowed to complete before shutting the node down, to avoid abruptly cutting off client and fabric connections.

Move to the next node

Repeat those steps on the next node.

note

Using Quiescence for Extra Durability

Quiescence may also be used to provide extra durability in various scenarios.

For example, in an AP cluster (replication factor 2) in which a node must be taken down, if it is quiesced, a rebalance triggered, and migrations do complete before removing the quiesced node, two full copies will be present when the node is removed. The cluster will then still have all data available if another node accidentally goes down before the first node returns.

note

Quiescing multiple nodes

Quiescing multiple nodes at once can be useful in rack-aware clusters. In such cases, quiescing a whole rack at once can speed up maintenance procedures.

In strong-consistency enabled namespaces, quiescing replication-factor or more nodes will force masterhood handover (after migrations complete) but will result in unavailability when the nodes are eventually shut down (unless all the quiesced nodes are on the same rack).

note

Quiescing nodes for namespaces with replication factor 1

For namespaces configured with replication-factor 1, it is necessary to wait for migrations to complete before shutting down the server, so that the single copy of each partition owned by that node has migrated to another node.
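A simple guard before shutdown is to check that the node reports no remaining migrations. A sketch follows (this assumes the migrate_partitions_remaining service statistic; the asinfo pipeline in the comment is illustrative, and the function itself only inspects a captured value):

```shell
# Decide whether it is safe to stop the node.
safe_to_stop() {
    # usage: safe_to_stop <migrate_partitions_remaining>
    if [ "$1" -eq 0 ]; then
        echo "safe"
    else
        echo "wait"
    fi
}

# Value would come from something like:
#   asinfo -v 'statistics' | tr ';' '\n' | grep '^migrate_partitions_remaining'
safe_to_stop 0    # prints "safe"
```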