
Consistency Management

This section covers how to add and remove nodes in strong consistency namespaces, as well as how to detect and repair unavailability.

Adding Additional Nodes

This section will guide you through the process of adding a node to an existing namespace configured with "strong-consistency".

Install & Configure the Additional Nodes

Install and configure Aerospike on the new nodes as you had done in "Install & Configure for Strong Consistency".

Once the new nodes have joined the cluster, the service's cluster_size is greater than the namespace's ns_cluster_size, and the show roster command lists the newly observed nodes under Observed Nodes.

note

Tools package 6.2.x or later is required to use asadm's manage roster commands. Otherwise, use the equivalent asinfo - roster and asinfo - roster-set commands.


Admin> show stat -flip like cluster_size
~Service Statistics (2021-10-22 23:43:35 UTC)~
Node|cluster_size
node1.aerospike.com:3000| 6
node2.aerospike.com:3000| 6
node4.aerospike.com:3000| 6
node5.aerospike.com:3000| 6
node6.aerospike.com:3000| 6
node7.aerospike.com:3000| 6
Number of rows: 6

~test Namespace Statistics (2021-10-22 23:43:35 UTC)~
Node|ns_cluster_size
node1.aerospike.com:3000| 5
node2.aerospike.com:3000| 5
node4.aerospike.com:3000| 5
node5.aerospike.com:3000| 5
node6.aerospike.com:3000| 5
node7.aerospike.com:3000| 5
Number of rows: 6
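
The comparison between cluster_size and ns_cluster_size can also be scripted. The following is a minimal sketch that assumes the two statistics have already been collected into dictionaries keyed by node; the function name roster_gap is hypothetical and not part of any Aerospike tool.

```python
def roster_gap(cluster_size, ns_cluster_size):
    """Return, per node, how many cluster members are not yet in the namespace roster.

    Both arguments map a node name to the integer value that node reports.
    Nodes missing from ns_cluster_size are skipped.
    """
    gap = {}
    for node, size in cluster_size.items():
        if node in ns_cluster_size:
            gap[node] = size - ns_cluster_size[node]
    return gap


# Values copied from the sample output above.
nodes = [f"node{i}.aerospike.com:3000" for i in (1, 2, 4, 5, 6, 7)]
service = {node: 6 for node in nodes}
namespace = {node: 5 for node in nodes}

for node, missing in roster_gap(service, namespace).items():
    print(f"{node}: {missing} node(s) observed but not on the roster")
```

A gap of zero on every node suggests that the roster already covers the whole cluster.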

Adding the new nodes to the Roster

Copy the Observed Nodes list into the Pending Roster using asadm's manage roster stage observed command:

Admin> enable
Admin+> manage roster stage observed ns test
Pending roster now contains observed nodes.
Run "manage recluster" for your changes to take affect.

Activate the new roster with a manage recluster:

note

Tools package 6.0.x or later is required to use asadm's manage recluster command. Otherwise, use the equivalent asinfo - recluster command.

Admin+> manage recluster
Successfully started recluster

Run the show roster command and confirm that the roster has been updated on all nodes to the roster you had set. Also make certain the service's cluster_size matches the namespace's ns_cluster_size.
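
The roster check can be automated when using the asinfo roster command instead of asadm. The sketch below assumes the response has the form roster=...:pending_roster=...:observed_nodes=... with comma-separated node IDs, and that an unset list is reported as null; both are assumptions about the response format, and the function names are hypothetical.

```python
def parse_roster(response):
    """Split a 'key=a,b:key=c,d' style response into a dict of node-ID sets."""
    fields = {}
    for part in response.strip().split(":"):
        key, _, value = part.partition("=")
        # An unset list is assumed to be reported as "null".
        fields[key] = {n for n in value.split(",") if n and n != "null"}
    return fields


def roster_is_settled(response):
    """True when the current roster, pending roster and observed nodes all match."""
    fields = parse_roster(response)
    return fields["roster"] == fields["pending_roster"] == fields["observed_nodes"]


# Hypothetical response from one node, using shortened node IDs.
example = "roster=A1,B2:pending_roster=A1,B2,C3:observed_nodes=A1,B2,C3"
print(roster_is_settled(example))
```

Run the check against every node, since each node reports its own view of the roster.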

Congratulations, you have successfully added nodes to your cluster.

Removing Nodes

This section will guide you through the process of removing a node from an existing namespace configured with strong-consistency.

The general process is to remove a node from the cluster first. This begins the data migration process. After the data is safely replicated elsewhere, you can remove the node from the namespace's roster.

Important: When removing nodes, take care not to remove replication-factor or more nodes at a time. Removing too many nodes simultaneously will result in dead_partitions, which is your signal of potential data loss. If you observe dead partitions, you may wish to re-add the nodes and wait for the cluster to synchronize before proceeding again.

note

Strong consistency guarantees that, with a replication factor of N, N copies of the data are written to the cluster. A fully formed cluster must contain X nodes, where X >= N, to satisfy this. In a cluster where X = N, all partitions become unavailable when a single node shuts down.

note

Namespaces with replication-factor set to 1 will have some partitions unavailable whenever any node leaves the cluster, making it impractical to perform a rolling restart/upgrade.
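
The sizing rules above can be expressed as a small pre-flight check. This is a sketch only: the function name and the wording of the warnings are hypothetical, and it encodes just the two rules stated in this section.

```python
def check_removal(cluster_nodes, replication_factor, nodes_to_remove):
    """Return warnings for a planned removal; an empty list means no objection.

    cluster_nodes      -- number of nodes currently in the cluster
    replication_factor -- the namespace's replication-factor
    nodes_to_remove    -- number of nodes to take out in one step
    """
    warnings = []
    if nodes_to_remove >= replication_factor:
        warnings.append(
            "removing replication-factor or more nodes at once risks dead partitions"
        )
    if cluster_nodes - nodes_to_remove < replication_factor:
        warnings.append(
            "the remaining cluster would have fewer nodes than replication-factor"
        )
    return warnings


# Six-node cluster, replication-factor 2, removing one node at a time.
print(check_removal(6, 2, 1))
```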

Removing the Nodes from the Cluster

note

Tools package 6.2.x or later is required to use asadm's manage roster & show roster commands. Otherwise, use the equivalent asinfo - roster and asinfo - roster-set commands.

Execute a show roster command to make sure all roster nodes are present in the cluster.

Admin+> pager on
Admin+> show roster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2021-10-23 00:08:54 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node| Node ID|Namespace| Current Roster| Pending Roster| Observed Nodes
node1.aerospike.com:3000|BB9070016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node3.aerospike.com:3000|BB9060016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node4.aerospike.com:3000|BB9050016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node5.aerospike.com:3000|BB9040016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node6.aerospike.com:3000|BB9010016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node7.aerospike.com:3000|*BB9020016AE4202|test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
Number of rows: 6

Shut down the nodes to be removed. Remember to shut down fewer than replication-factor nodes at a time.

In this case we are shutting down node1.aerospike.com:3000 with node ID BB9070016AE4202.

systemctl stop aerospike

Execute a show stat unavailable command to make sure there are no unavailable_partitions.

If an unrelated failure happened during this process, restart the downed nodes, wait for migrations to complete, and restart this procedure.

Admin> show stat like unavailable for test -flip
~test Namespace Statistics (2021-10-23 00:15:26 UTC)~
Node|unavailable_partitions
node2.aerospike.com:3000| 0
node4.aerospike.com:3000| 0
node5.aerospike.com:3000| 0
node6.aerospike.com:3000| 0
node7.aerospike.com:3000| 0
Number of rows: 5

Wait for migrations to complete. Run show stat on partitions_remain until migrate_partitions_remaining reaches zero on all nodes.

Admin> show stat service like partitions_remain -flip
~~~~~Service Statistics (2021-10-23 00:25:10 UTC)~~~~
Node|migrate_partitions_remaining
node2.aerospike.com:3000| 0
node4.aerospike.com:3000| 0
node5.aerospike.com:3000| 0
node6.aerospike.com:3000| 0
node7.aerospike.com:3000| 0
Number of rows: 5
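
Waiting for migrations lends itself to a polling loop. The sketch below assumes a caller-supplied function that returns the current migrate_partitions_remaining value per node; how that function collects the statistic is left open, and the name wait_for_migrations is hypothetical.

```python
import time


def wait_for_migrations(fetch_remaining, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll until every node reports zero remaining partitions.

    fetch_remaining -- function returning {node: migrate_partitions_remaining}
    Returns True once all values are zero, False if the timeout elapses first.
    """
    waited = 0.0
    while True:
        remaining = fetch_remaining()
        if remaining and all(count == 0 for count in remaining.values()):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


# Simulated readings standing in for successive polls of a real cluster.
readings = iter([{"node2": 40, "node4": 12}, {"node2": 3, "node4": 0}, {"node2": 0, "node4": 0}])
print(wait_for_migrations(lambda: next(readings), sleep=lambda seconds: None))
```

An empty result is treated as "not done" so that a failed collection is never mistaken for completed migrations.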

We will now remove the node from the roster, which involves setting the roster to all the nodes except the node we are removing.

Execute a show roster command to confirm that BB9070016AE4202 no longer appears in Observed Nodes.

Notice that the Observed Nodes list is one node smaller than the Current Roster and Pending Roster.

Admin+> pager on
Admin+> show roster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2021-10-23 00:26:56 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node| Node ID|Namespace| Current Roster| Pending Roster| Observed Nodes
node1.aerospike.com:3000|BB9070016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node3.aerospike.com:3000|BB9060016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node4.aerospike.com:3000|BB9050016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node5.aerospike.com:3000|BB9040016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node6.aerospike.com:3000|BB9010016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node7.aerospike.com:3000|*BB9020016AE4202|test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
Number of rows: 6
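
Before staging the new roster, it can help to confirm exactly which node IDs will be dropped. The following sketch compares the two lists and also builds the roster-set info command that sets the roster to the observed nodes. The helper names are hypothetical, and the roster-set syntax shown (namespace=...;nodes=...) is an assumption to check against the asinfo reference for your server version.

```python
def nodes_to_drop(pending_roster, observed_nodes):
    """Return node IDs that are on the pending roster but no longer observed."""
    return sorted(set(pending_roster) - set(observed_nodes))


def roster_set_command(namespace, observed_nodes):
    """Build an info command that sets the roster to the given node IDs."""
    return f"roster-set:namespace={namespace};nodes={','.join(observed_nodes)}"


# Node IDs copied from the show roster output above.
pending = [
    "BB9070016AE4202", "BB9060016AE4202", "BB9050016AE4202",
    "BB9040016AE4202", "BB9020016AE4202", "BB9010016AE4202",
]
observed = [
    "BB9060016AE4202", "BB9050016AE4202", "BB9040016AE4202",
    "BB9020016AE4202", "BB9010016AE4202",
]

print(nodes_to_drop(pending, observed))
print(roster_set_command("test", observed))
```

If the list of nodes to drop contains anything other than the node you shut down, stop and investigate before changing the roster.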

Copy the Observed Nodes list into the Pending Roster using asadm's manage roster stage observed command.

Admin+> manage roster stage observed ns test
Pending roster now contains observed nodes.
Run "manage recluster" for your changes to take affect.

Issue a manage recluster to apply the change.

note

Tools package 6.0.x or later is required to use asadm's manage recluster command. Otherwise, use the equivalent asinfo - recluster command.

Admin+> manage recluster
Successfully started recluster

Server Shutdown

The safe way to shut down a server is to execute the standard process stop commands. For example:

systemctl stop aerospike

Any operational procedure that sends a SIGTERM is also safe. Once startup has completed, a SIGTERM flushes data to disk and properly signals the other servers.

A special case is shutting down during startup. Avoid stopping the process while the server is still starting, and see the section on Dead Partitions to learn more.

To check if startup is complete, run

asinfo -h [ip of host] -v 'status'

periodically until it returns "ok".
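
The periodic check can be wrapped in a small polling helper. This sketch runs the asinfo command shown above through subprocess; the function names, the retry limits, and the decision to treat any response other than "ok" as "not ready" are assumptions made for illustration.

```python
import subprocess
import time


def asinfo_status(host):
    """Run the status check shown above and return its trimmed output."""
    result = subprocess.run(
        ["asinfo", "-h", host, "-v", "status"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is treated as "not ready", not as an error
    )
    return result.stdout.strip()


def wait_until_started(host, query=asinfo_status, interval=2.0, attempts=300, sleep=time.sleep):
    """Poll the node until it answers "ok"; return False if it never does."""
    for _ in range(attempts):
        if query(host).lower() == "ok":
            return True
        sleep(interval)
    return False
```

Calling wait_until_started("10.0.0.1") before issuing a stop command would block until the node reports that startup is complete; the address is a placeholder.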

In the future, we plan to enable safe shutdowns during startup, which will render this check unnecessary.

Validating Partition Availability

In cases such as network partitions and server hardware failure, partitions can become unavailable or dead. A dead partition is a special kind of unavailable partition; for a full explanation, see the Dead Partitions section.

After an operational change, you will usually expect all partitions to be available. The command below is a good check to run after any such change.

The following command shows, for each node in the cluster, how many dead or unavailable partitions that node believes exist. Note that each node reports the global number of dead or unavailable partitions. For example, if the entire cluster has determined that 100 partitions are unavailable, every current node reports 100 unavailable partitions.

Note: please do NOT use the asadm 'show pmap' command. This command has not been updated to 4.0 partition states.

show stat namespace for [namespace name] like 'unavailable|dead' -flip
Admin> show stat namespace for test like 'unavailable|dead' -flip
~~~~~~test Namespace Statistics (2021-10-23 00:36:43 UTC)~~~~~~~
Node|dead_partitions|unavailable_partitions
node1.aerospike.com:30000| 0| 0
node2.aerospike.com:30000| 0| 0
node4.aerospike.com:30000| 0| 0
node5.aerospike.com:30000| 0| 0
node6.aerospike.com:30000| 0| 0
Number of rows: 5

In the output of this command, the dead_partitions and unavailable_partitions columns should be 0 for each node.
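
For scripted checks, the table can be parsed and scanned for non-zero values. The sketch below assumes the pipe-delimited layout shown in this document; real asadm output may be spaced or decorated differently, so treat the parser as illustrative. The function names are hypothetical.

```python
def parse_flip_table(text):
    """Parse pipe-delimited '-flip' output into {node: {column: int}}."""
    rows = {}
    header = None
    for line in text.splitlines():
        # Skip the title, the row count, and any echoed command line.
        if "|" not in line or line.startswith("Admin"):
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells[1:]  # the first cell is the "Node" label
            continue
        rows[cells[0]] = {name: int(value) for name, value in zip(header, cells[1:])}
    return rows


def unhealthy_nodes(rows):
    """Return only the nodes that report a non-zero value in any column."""
    return {node: stats for node, stats in rows.items() if any(v != 0 for v in stats.values())}


sample = """~~test Namespace Statistics~~
Node|dead_partitions|unavailable_partitions
node1.aerospike.com:3000| 0| 0
node2.aerospike.com:3000| 0| 0
Number of rows: 2"""

print(unhealthy_nodes(parse_flip_table(sample)) or "all partitions available")
```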

If you have unavailable partitions, data may be missing because nodes that are expected in the roster are not currently in the cluster. Validate that your rosters are as expected. If roster nodes are missing from the cluster, start by restoring them: fix the hardware or network issues, or change the roster.

If you have dead partitions, it means that all the roster nodes are in the cluster, but partitions are still unavailable due to storage failure or potential loss due to buffered writes. Please see the following discussion of dead partitions to determine the correct next operational step.
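
The two cases above suggest a simple triage order: deal with unavailable partitions first, then with dead ones. The following sketch encodes that order; the function name and the returned strings are hypothetical and only summarize the guidance in this section.

```python
def next_step(dead_partitions, unavailable_partitions):
    """Suggest the next operational step from the two partition counters."""
    if unavailable_partitions > 0:
        # Roster nodes are missing from the cluster.
        return "validate the roster and restore the missing roster nodes"
    if dead_partitions > 0:
        # All roster nodes are present, but data may have been lost.
        return "review the Dead Partitions section before reviving"
    return "no action needed"


print(next_step(dead_partitions=0, unavailable_partitions=100))
```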

Dead Partitions

Certain failures of hardware, combined with certain Aerospike configurations, may result in cases where data has been lost.

The cases that may result in data loss are:

  1. Drive loss, such as user actions to erase drives (clearing or wiping), or catastrophic hardware failure
  2. Without commit-to-device, an untrusted or unclean shutdown (crash, SIGKILL, SIGSEGV, and others)
  3. Clean shutdown during a Fast Restart (potentially removed in future versions, as data loss is not possible)

In these cases, you may or may not have lost data, depending on the number of nodes simultaneously affected, the replication factor, and whether data migration has completed for the partitions in question.

Aerospike detects whether you have multiple failures in these cases, and detects potential true data loss. Affected partitions are marked "dead", and require operator intervention to continue. To continue to allow reads would violate Strong Consistency.

The effect of unclean shutdown can be limited by setting the commit-to-device namespace option. With commit-to-device, simultaneous crashes are known not to lose data, and thus never generate dead partitions. However, enabling commit-to-device generates a flush on every write, so it comes with a performance penalty that should be measured. The penalty may be minimal for low write throughput cases, or on high performance storage such as some direct attach NVMe drives or some high performance enterprise SANs.

If you do not have commit-to-device enabled, you will often see server crashes generating "dead" partitions. User-generated restarts (upgrades) will not generate this effect, as a restart will flush to disk on shutdown even without commit-to-device.

Dead partitions are detected through the commands in the Validating Partition Availability section above.

When potential data loss is detected, the cluster keeps the dead partitions unavailable so that you can take corrective action. You may decide that no data has been lost (for example, after a shutdown during Fast Restart or on a test cluster), you may decide that availability matters more to your business than correctness in this case, or you may decide to restore data from an external source (if available). If you do have an external trusted source, consider disabling applications, reviving the potentially flawed namespace, restoring from the external trusted source, and then re-enabling applications. Because Aerospike alerts you to these conditions, consistent reads remain guaranteed, which allows time for correct operator intervention.

Reviving Dead Partitions

This process should be followed in the operational cases where you wish to use your namespace in the face of missing data. For example, you may have entered a maintenance state where you have disabled application use, and are preparing to reapply data from a reliable message queue or other source.

Notice that you have dead partitions

show stat namespace for test like dead -flip
Admin> show stat namespace for test like dead -flip
~test Namespace Statistics (2021-10-23 00:38:41 UTC)~
Node|dead_partitions
node1.aerospike.com:3000| 264
node2.aerospike.com:3000| 264
node4.aerospike.com:3000| 264
node5.aerospike.com:3000| 264
node6.aerospike.com:3000| 264
Number of rows: 5

Execute revive to acknowledge the potential data loss on each server.

note

Tools package 6.2.x or later is required to use asadm's manage revive command. Otherwise, use the equivalent asinfo - revive command.

Admin+> manage revive ns test
~~~Revive Namespace Partitions~~~
Node|Response
node1.aerospike.com:3000|ok
node2.aerospike.com:3000|ok
node4.aerospike.com:3000|ok
node5.aerospike.com:3000|ok
node6.aerospike.com:3000|ok
Number of rows: 5

Execute recluster to enliven the dead partitions.

note

Tools package 6.0.x or later is required to use asadm's manage recluster command. Otherwise, use the equivalent asinfo - recluster command.

Admin+> manage recluster
Successfully started recluster

Verify that there are no longer any dead partitions using the dead_partitions stat.

show stat namespace for test like dead -flip
Admin> show stat namespace for test like dead -flip
~test Namespace Statistics (2021-10-23 00:40:41 UTC)~
Node|dead_partitions
node1.aerospike.com:3000| 0
node2.aerospike.com:3000| 0
node4.aerospike.com:3000| 0
node5.aerospike.com:3000| 0
node6.aerospike.com:3000| 0
Number of rows: 5
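
The revive procedure is a fixed sequence: confirm dead partitions exist, revive, recluster, then verify. The sketch below captures that ordering with two caller-supplied callbacks, so it makes no claim about how commands reach the cluster. The function and callback names are hypothetical, and the info command strings (revive:namespace=... and recluster:) are assumptions to confirm against the asinfo reference for your server version.

```python
def revive_namespace(namespace, run_info, dead_count):
    """Revive, then recluster, and return the dead-partition count afterwards.

    run_info(command) -- sends one info command to the cluster nodes
    dead_count()      -- returns the highest dead_partitions value across nodes
    """
    if dead_count() == 0:
        return 0  # nothing to revive
    run_info(f"revive:namespace={namespace}")  # acknowledge the potential data loss
    run_info("recluster:")                     # bring the revived partitions back
    return dead_count()


# Simulated cluster standing in for real info calls.
state = {"dead": 264}
sent = []


def fake_run_info(command):
    sent.append(command)
    if command == "recluster:":
        state["dead"] = 0


print(revive_namespace("test", fake_run_info, lambda: state["dead"]))
```

A non-zero return value means dead partitions remain after the recluster and the cluster needs further investigation.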