
Managing Storage

Aerospike can generate large amounts of data in a short time. The following examples describe your options for configuring Aerospike to manage storage, without requiring continual intervention by administrators.

note

Refer to Configuring Namespace Data Retention for definitions of expiration, eviction, and stop-writes, and the configuration parameters for controlling those processes.

Defragmentation

Aerospike runs a continuous background defragmentation process to maximize the amount of available storage. When block usage drops below the defrag-lwm-pct limit, storage occupied by stale data is reclaimed for use.

To defragment the SSD efficiently while also performing a high volume of operations at low latency, Aerospike needs half of the disk to be free. When namespace disk usage reaches the high-water mark set by high-water-disk-pct (50% by default before server 4.9), the namespace supervisor (NSUP) starts evicting data. Setting high-water-disk-pct to zero disables disk-based eviction.

note

Evictions are disabled by default in server 4.9 and later.
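The disk-based eviction trigger described above amounts to a threshold check. The following is a simplified model, not Aerospike's implementation: the function name is hypothetical, and treating "reaches the high-water mark" as greater-than-or-equal is an assumption.

```python
def disk_eviction_triggered(disk_used_pct: float, high_water_disk_pct: float) -> bool:
    """Simplified model of when NSUP starts disk-based eviction.

    disk_used_pct: namespace disk usage as a percentage of its capacity.
    high_water_disk_pct: the configured high-water-disk-pct value.
    """
    # A high-water mark of zero disables disk-based eviction entirely.
    if high_water_disk_pct == 0:
        return False
    # Assumption: eviction starts once usage reaches the high-water mark.
    return disk_used_pct >= high_water_disk_pct


if __name__ == "__main__":
    print(disk_eviction_triggered(55.0, 50))  # usage above a 50% high-water mark
    print(disk_eviction_triggered(55.0, 0))   # eviction disabled
```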

When defragmentation cannot keep up with storage requirements, you may have to increase the defragmentation rate.

You can use asadm to check storage statistics. The following command shows the current device_available_pct for the test namespace:

asadm --enable -e "show statistics like device_available_pct for test"

Aerospike defragmentation

Aerospike writes data for storage-engine device namespaces in large blocks, whose size is set by the write-block-size parameter. Each block is filled with incoming write transactions and is written to the device in any of the following cases:

  • When the swb (streaming write buffer of size write-block-size) is full, or when the next record to be written doesn't fit.

  • When the swb has not been flushed for flush-max-ms milliseconds. The default is one second.

  • On every write transaction when configured through the commit-to-device parameter for strong-consistency enabled namespaces.
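The three flush conditions above can be sketched as a single decision function. This is an illustrative model only, assuming a hypothetical StreamingWriteBuffer type that tracks buffered bytes and time since the last flush; it is not how the server is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class StreamingWriteBuffer:
    """Hypothetical model of one swb; sizes in bytes, times in milliseconds."""
    write_block_size: int    # configured write-block-size
    used: int = 0            # bytes already buffered
    ms_since_flush: int = 0  # time since the buffer was last flushed


def should_flush(swb: StreamingWriteBuffer,
                 next_record_size: int,
                 flush_max_ms: int = 1000,
                 commit_to_device: bool = False) -> bool:
    """Return True if any of the flush conditions listed above applies."""
    # commit-to-device (strong-consistency namespaces): flush on every write.
    if commit_to_device:
        return True
    # The buffer is full, or the next record would not fit in it.
    if swb.used >= swb.write_block_size:
        return True
    if swb.used + next_record_size > swb.write_block_size:
        return True
    # The buffer has not been flushed for flush-max-ms milliseconds.
    return swb.ms_since_flush >= flush_max_ms
```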

Written blocks may remain in the page and hardware caches. The fsync-max-sec configuration parameter controls when data is pushed from these caches.

As records are updated or deleted, the amount of live record data in the affected blocks on disk decreases. When a block's usage falls below the value set by the defrag-lwm-pct parameter, the block becomes eligible for defragmentation and is queued in the device's defragmentation queue, reported by the storage-engine.device[ix].defrag_q statistic. The default value of defrag-lwm-pct is 50%.
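The eligibility rule can be illustrated with a short sketch. It assumes a hypothetical list of per-block usage percentages as input; Aerospike tracks this per wblock internally and does not expose it in this form.

```python
def defrag_candidates(block_usage_pct, defrag_lwm_pct=50):
    """Return indexes of blocks whose live-data usage is below defrag-lwm-pct.

    block_usage_pct: percentage of each wblock still occupied by live records
    (a hypothetical input used only for illustration).
    """
    return [i for i, used in enumerate(block_usage_pct) if used < defrag_lwm_pct]


if __name__ == "__main__":
    usage = [100, 72, 49, 50, 8]
    print(defrag_candidates(usage))      # default threshold of 50%
    print(defrag_candidates(usage, 60))  # a higher threshold queues more blocks
```

Raising the threshold from 50 to 60 makes the block at exactly 50% eligible as well, which is the "more blocks to be defragmented" effect described for defrag-lwm-pct.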

The following four configuration parameters can be tuned for the defragmentation sub-system. You can set them dynamically, or in the aerospike.conf server configuration file for a persistent configuration:

  • defrag-lwm-pct: The default is 50%. A higher percentage makes more blocks eligible for defragmentation and packs data more densely on the disk. The value of 50% provides a good balance between space usage and write amplification. For some use cases, such as read-heavy workloads where write amplification matters less, it may be desirable to increase defrag-lwm-pct to gain more usable space on the disk. Test such a change, particularly to observe its effect on defragmentation load during operations that generate many deletions, such as truncation or partitions being dropped during migration.

  • defrag-sleep: The number of microseconds to sleep after each wblock is defragmented. The default is 1000 microseconds.

  • defrag-startup-minimum: The default is 10%. If less than this percentage of the disk is writable at startup, the server will not join the cluster or open a service port.

note

The disk might appear full to Aerospike, because it writes all data in blocks. Use device_free_pct to see the total available writable space across all devices in the namespace.

  • defrag-queue-min: The default is 0, meaning defragmentation is not held back. Use a value greater than zero to start defragmentation only once the defrag queue holds at least that many eligible wblocks.
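The gating effect of defrag-queue-min can be sketched as follows. This is an illustrative model under stated assumptions: the function name is hypothetical, and treating "at least that many" as an inclusive comparison is an assumption rather than documented server behavior.

```python
def defrag_should_run(defrag_q_len: int, defrag_queue_min: int = 0) -> bool:
    """Return True if queued wblocks would be defragmented under defrag-queue-min.

    With the default of 0, nothing holds defragmentation back, so any queued
    wblock is processed; a positive value waits for that many eligible wblocks.
    """
    if defrag_q_len <= 0:
        return False  # nothing is queued
    # Assumption: the threshold is inclusive ("at least defrag_queue_min").
    return defrag_q_len >= defrag_queue_min
```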

The server log captures the defragmentation profile:

NAMESPACE-NAME /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)

The details for each parameter are described in the log reference manual. The following metrics capture device statistics:

In the example log line, the write rate (43.3 per second) is greater than the defrag-write rate (10.2 per second); note that the write rate includes defragmentation writes. This may not pose a problem initially, but over time device_available_pct may run low. Also monitor defrag-q, which should not be constantly increasing. If you determine that the node is falling behind while the logs show an empty defragmentation queue, consider raising defrag-lwm-pct slightly. Be aware that raising defrag-lwm-pct increases write amplification non-linearly.
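The non-linear effect can be seen with a simplified worst-case model. If every defragmented block still holds exactly defrag-lwm-pct of live data (fraction u), freeing one block's worth of space means rewriting u/(1-u) blocks of live data, so each block of new client data costs about 1/(1-u) block writes in total. This is a back-of-the-envelope model, not Aerospike's accounting; blocks are often emptier than the threshold when they are defragmented, so real amplification is usually lower. Under this model, 50% gives 2x, 60% gives 2.5x, and 90% gives 10x.

```python
def worst_case_write_amplification(defrag_lwm_pct: float) -> float:
    """Device block writes per block of new client data, in a worst-case model.

    Assumes every defragmented wblock holds exactly defrag_lwm_pct live data.
    """
    live = defrag_lwm_pct / 100
    if not 0 <= live < 1:
        raise ValueError("defrag_lwm_pct must be in [0, 100)")
    # Freeing one block rewrites live / (1 - live) blocks of live data;
    # adding the block of new client data gives 1 / (1 - live) in total.
    return 1 / (1 - live)


if __name__ == "__main__":
    for pct in (50, 60, 70, 80, 90):
        print(f"defrag-lwm-pct {pct}%: ~{worst_case_write_amplification(pct):.1f}x")
```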

Search for write and defrag-write in your server logs to see more useful information:

tail -f /var/log/aerospike/aerospike.log | grep -i -e write -e defrag-write
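Once you have captured a defragmentation profile line, a short script can extract its paired (total, rate) fields. This is a minimal sketch, assuming each such field has the form name (total,rate), as in the sample log line earlier in this section; the function names are hypothetical. Because the write rate includes defragmentation writes, subtracting the defrag-write rate approximates the rate of writes driven by client traffic.

```python
import re

# Matches fields such as "write (12659541,43.3)" or "defrag-write (3586533,10.2)".
_FIELD = re.compile(r"(?P<name>[a-z][a-z-]*) \((?P<total>\d+),(?P<rate>[\d.]+)\)")


def parse_defrag_profile(line: str) -> dict:
    """Map each 'name (total,rate)' field in a device log line to (total, rate)."""
    return {
        m["name"]: (int(m["total"]), float(m["rate"]))
        for m in _FIELD.finditer(line)
    }


def client_write_rate(fields: dict) -> float:
    """Write rate minus defrag-write rate, since 'write' includes defrag writes."""
    return fields["write"][1] - fields["defrag-write"][1]


if __name__ == "__main__":
    sample = ("NAMESPACE-NAME /dev/sda: used-bytes 296160983424 free-wblocks 885103 "
              "write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) "
              "defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)")
    fields = parse_defrag_profile(sample)
    print(fields["write"], fields["defrag-write"])
    print(round(client_write_rate(fields), 1))
```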

Increasing the defragmentation rate

You may need to temporarily decrease the defrag-sleep and increase the defrag-lwm-pct parameters.
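Lowering defrag-sleep raises the ceiling on how fast defragmentation can run, because the sleep is applied after every wblock. The following sketch computes that ceiling for a single defragmentation loop, counting only the sleep and ignoring the time spent reading and rewriting each block, so real rates are lower. The 1 MiB write_block_size default in the sketch is an assumed example value; use your configured write-block-size.

```python
def max_defrag_rate(defrag_sleep_us: int, write_block_size: int = 1024 * 1024):
    """Upper bound on wblocks and bytes defragmented per second by one loop.

    Only the defrag-sleep pause after each wblock is counted; the time to read
    and rewrite each block is ignored, so this is a ceiling, not an estimate.
    """
    if defrag_sleep_us <= 0:
        raise ValueError("this bound is only meaningful for a positive sleep")
    blocks_per_sec = 1_000_000 / defrag_sleep_us
    return blocks_per_sec, blocks_per_sec * write_block_size


if __name__ == "__main__":
    for sleep_us in (1000, 500):
        blocks, nbytes = max_defrag_rate(sleep_us)
        print(f"defrag-sleep {sleep_us}: <= {blocks:.0f} wblocks/s, <= {nbytes / 2**20:.0f} MiB/s")
```

Halving defrag-sleep from 1000 to 500 doubles this ceiling, which is why it is the first parameter to adjust when defragmentation falls behind.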

Use the asadm command-line interface to change defrag-sleep:

Admin> enable
Admin+> manage config namespace TEST storage-engine param defrag-sleep to 500 with 10.0.0.1:3000

Expected output:

~Set Namespace Param defrag-sleep to 500~
Node|Response
10.0.0.1:3000|ok
Number of rows: 1

Change defrag-lwm-pct:

Admin+> manage config namespace TEST storage-engine param defrag-lwm-pct to 60 with 10.0.0.1:3000

Expected output:

~Set Namespace Param defrag-lwm-pct to 60~
Node|Response
10.0.0.1:3000|ok
Number of rows: 1

The new values will not persist after a server restart. Add your desired values to aerospike.conf, in the namespace storage-engine section, to make them persistent:

defrag-sleep 500
defrag-lwm-pct 60

Stop-writes

The namespace configuration parameters stop-writes-pct and stop-writes-sys-memory-pct (each 90% by default) control the level of memory use at which a node stops accepting new client writes.

min-avail-pct (default 5%) and max-used-pct (default 70%) control disk-based stop-writes: a node stops accepting new client writes when its available disk percentage falls below min-avail-pct, or when its used disk percentage exceeds max-used-pct.

note

min-avail-pct measures free wblocks (write blocks), while max-used-pct measures namespace disk usage in bytes, compared to its total disk capacity.
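The four limits above can be summarized in one check that reports which of them would stop client writes. This is a sketch, not the server's logic: the NamespaceUsage type and function name are hypothetical, the defaults are the values stated above, and whether each comparison is strict or inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class NamespaceUsage:
    """Hypothetical snapshot of the usage figures compared against each limit."""
    memory_used_pct: float       # namespace memory use vs. stop-writes-pct
    sys_memory_used_pct: float   # system memory use vs. stop-writes-sys-memory-pct
    device_available_pct: float  # free wblocks vs. min-avail-pct
    device_used_pct: float       # bytes used vs. capacity, compared with max-used-pct


def stop_writes_reasons(u: NamespaceUsage,
                        stop_writes_pct: float = 90,
                        stop_writes_sys_memory_pct: float = 90,
                        min_avail_pct: float = 5,
                        max_used_pct: float = 70) -> list:
    """Return the names of the limits that would stop new client writes."""
    reasons = []
    if u.memory_used_pct > stop_writes_pct:
        reasons.append("stop-writes-pct")
    if u.sys_memory_used_pct > stop_writes_sys_memory_pct:
        reasons.append("stop-writes-sys-memory-pct")
    if u.device_available_pct < min_avail_pct:
        reasons.append("min-avail-pct")
    if u.device_used_pct > max_used_pct:
        reasons.append("max-used-pct")
    return reasons
```

Because min-avail-pct counts free wblocks while max-used-pct counts bytes, a fragmented device can trip min-avail-pct while its byte usage is still well under max-used-pct.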

You can dynamically modify any of these stop-writes configuration parameters with asadm:

asadm --enable -e "manage config namespace TEST param max-used-pct to 85 with 10.1.2.3"

Alternatively, use asinfo:

# asinfo only talks to one node at a time
asinfo -h 10.1.2.3 -v "set-config:context=namespace;id=TEST;max-used-pct=85"
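Because asinfo talks to one node at a time, applying a change across a cluster means repeating the command for every node. The following is a minimal sketch, assuming a hypothetical list of node addresses and that asinfo is on the PATH; by default it only prints the commands it would run (a dry run).

```python
import shlex
import subprocess


def set_config_argv(host: str, namespace: str, param: str, value) -> list:
    """Build the asinfo argument list for one node, as in the example above."""
    command = f"set-config:context=namespace;id={namespace};{param}={value}"
    return ["asinfo", "-h", host, "-v", command]


def set_config_on_all(hosts, namespace, param, value, dry_run=True):
    """Run (or, by default, just print) the set-config command on each node."""
    for host in hosts:
        argv = set_config_argv(host, namespace, param, value)
        if dry_run:
            print(shlex.join(argv))
            continue
        # Print whatever response asinfo returns for this node.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        print(host, result.stdout.strip())


if __name__ == "__main__":
    # Hypothetical node addresses; replace them with your cluster's nodes.
    set_config_on_all(["10.1.2.3", "10.1.2.4"], "TEST", "max-used-pct", 85)
```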

You can view your configured stop-writes parameters and their state with asadm's show stop-writes command.

Verifying evictions

The eviction counter is reset every time the server is restarted. Use the asadm info command to verify that evictions are working the way you want:

Admin> info

This prints the free disk and memory available for each namespace, along with the configured high-water mark thresholds for both memory and disk.

To check whether a namespace has breached its high-water mark, run:

asadm -e "show statistics namespace for TEST like hwm_breached"

Inspect the Aerospike log for messages that show you may be evicting data. Run the following command on individual nodes:

grep -e "hwm_breached" -e "stop_writes" /var/log/aerospike/aerospike.log

NSUP not keeping up

If NSUP is not able to keep up with expiring records, it might take the node a long time to restart, as the node will first remove expired records before rejoining the cluster. In server 6.3 and later, if the NSUP cycle takes longer than 2 hours and deletes more than 1% of the namespace, a warning line is written to the server log.

You can monitor the NSUP statistics nsup_cycle_duration and nsup_cycle_deleted_pct. These are the stats used by the Monitoring Stack to trigger alerts and visually warn users.
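The server 6.3+ warning condition can be mirrored in your own monitoring. This sketch assumes you have already converted nsup_cycle_duration to seconds; check the metrics reference for the unit your server version reports. The function name is hypothetical, and treating both thresholds as strict "more than" comparisons follows the wording above.

```python
TWO_HOURS_SEC = 2 * 60 * 60


def nsup_warning_expected(cycle_duration_sec: float, cycle_deleted_pct: float) -> bool:
    """True when an NSUP cycle ran longer than 2 hours and deleted more than 1%.

    cycle_duration_sec: nsup_cycle_duration, converted to seconds (assumption).
    cycle_deleted_pct: nsup_cycle_deleted_pct for the same cycle.
    """
    return cycle_duration_sec > TWO_HOURS_SEC and cycle_deleted_pct > 1.0
```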

You can control NSUP by dynamically configuring nsup-period and nsup-threads.

asadm --enable -e "manage config namespace TEST param nsup-threads to 3"

Nodes will not start if there is not enough storage

If the database does not have enough contiguous storage to start, and does not have enough space to defragment to get the space it needs, it will not start.

For in-memory databases that use persistence files, specify the size of the persistence file (in contrast to an SSD, where Aerospike uses the entire device). A persistence file can also run out of space, and the same rules apply as for SSDs.

When a namespace runs low on storage

When a namespace can no longer write data, you will see error messages in the log, like this example message:

Sep 05 2022 21:28:48 GMT: INFO (namespace): (base/namespace.c:458) {test} lwm breached true, hwm_breached true, stop_writes true, memory sz:22971755648 nobjects:358933683 nbytesmem:0 hwm:23192823808 sw:34789232640, disk sz:216122189312 hwm:216116854784 sw:341237137408

This shows that the namespace test on this node has breached the high-water mark for disk or memory, and has reached stop-writes. As a result, the namespace can no longer accept write requests. Messages like the following are the result of stop_writes being true, either on this node or on other nodes:

Sep 05 2022 21:28:48 GMT: INFO (rw): (base/thr_rw.c:2300) writing pickled failed 8 for digest 7318ad7422e51009
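To see which resource crossed its high-water mark, the flags and the sz and hwm values in the namespace status line shown earlier can be compared directly. This is a minimal sketch, assuming the field layout of that sample line and reading sz and hwm as byte counts; the function names are hypothetical. Read this way, the sample shows disk usage (216122189312) just above its hwm (216116854784), while memory usage is below its hwm.

```python
import re

_FLAG = re.compile(r"(lwm breached|hwm_breached|stop_writes) (true|false)")
# "memory sz:... <other fields> hwm:... sw:..." or "disk sz:... hwm:... sw:..."
_USAGE = re.compile(r"(memory|disk) sz:(\d+)(?: \S+)*? hwm:(\d+) sw:(\d+)")


def namespace_flags(line: str) -> dict:
    """Return the true/false flags reported in a namespace status line."""
    return {name: value == "true" for name, value in _FLAG.findall(line)}


def over_high_water(line: str) -> dict:
    """For memory and disk, report whether sz (bytes used) exceeds hwm."""
    return {kind: int(sz) > int(hwm) for kind, sz, hwm, _sw in _USAGE.findall(line)}
```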

Resolve this by adjusting configuration parameters:

  1. Speed up your current eviction rate by reducing the memory or disk high-water mark (high-water-memory-pct, high-water-disk-pct).
  2. Slow your migration speed, if migrations are active.
  3. Increase your defragmentation priority or rate.
  4. Increase the stop-writes thresholds: stop-writes-pct (or stop-writes-sys-memory-pct) for memory, or max-used-pct for disk. These set the usage level above which the node stops accepting new writes.

caution

Increasing the stop-writes parameters should not be done on a permanent basis. You need to find a permanent solution by reviewing your capacity and ensuring that there is sufficient storage.

All of these parameters can be changed dynamically with asadm or asinfo. To keep a change after a restart, also add it to the Aerospike configuration file (aerospike.conf) on each node.

Avoiding 0% available space

When storage runs low too frequently, you will see log entries similar to the following:

Apr 27 2022 02:53:12 GMT: WARNING (drv_ssd): (storage/drv_ssd.c:1844) could not allocate storage on device /dev/sdb

When device_available_pct (or pmem_available_pct for PMem storage) drops to zero, all subsequent writes fail. This should not happen if min-avail-pct is left at its default of 5%, because the node stops accepting client writes before available space runs out.
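The protection that min-avail-pct provides can be expressed as headroom: the number of free wblocks that can still be consumed before the node stops accepting client writes. This is a minimal sketch; the function name and inputs are hypothetical, and rounding the min-avail-pct reserve up to whole wblocks is an assumption.

```python
import math


def wblocks_before_stop_writes(total_wblocks: int, free_wblocks: int,
                               min_avail_pct: float = 5) -> int:
    """Free wblocks left before available space falls below min-avail-pct."""
    if total_wblocks <= 0:
        raise ValueError("total_wblocks must be positive")
    # Wblocks held back by min-avail-pct, rounded up to whole blocks.
    reserve = math.ceil(total_wblocks * min_avail_pct / 100)
    return max(free_wblocks - reserve, 0)
```

Lowering min-avail-pct shrinks this reserve, leaving less room for defragmentation to recover space before writes start failing.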

caution

Taking a server down increases traffic and data on the other nodes. Do not take any servers down if you are in a data overflow situation.

If only a single node is having problems because of a hardware fault, taking down that node may resolve the situation.

The solutions discussed here are short-term, temporary updates. In the longer term, you need to add capacity to resolve storage overflow problems.