Best practices for Aerospike and Linux

These steps outline stability and performance best practices for Aerospike and the Linux operating system.

Best practice checks at startup

When Aerospike Database server version 5.7 or later starts, it verifies certain best practices and, by default, logs a warning for each violation it finds. For production environments, it is recommended to set enforce-best-practices to true. When enforce-best-practices is set to true, the server shuts down if any best practice is found to be violated during startup.

If you choose to leave enforce-best-practices set to false, you can still monitor violations with the failed_best_practices Boolean stat or the best-practices info command. The failed_best_practices stat reports true if any best practice was violated during startup. The best-practices info command returns the list of best practices that failed.
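
As a rough sketch of how this monitoring might look from a shell, the snippet below reads the stat and the info command with the asinfo tool. It assumes asinfo is installed and can reach the local node with default settings; stat_value is a hypothetical helper, not part of Aerospike.

```shell
# Print the value of one key from a semicolon-separated "key=value" info
# response read on stdin. stat_value is a hypothetical helper name.
stat_value() {
    tr ';' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Only query the node when the asinfo tool is installed.
if command -v asinfo >/dev/null 2>&1; then
    # Boolean stat: true if any best practice was violated at startup.
    asinfo -v 'statistics' | stat_value failed_best_practices
    # Info command that lists the best practices that failed.
    asinfo -v 'best-practices' || echo "could not query the local node" >&2
fi
```

The helper only splits the response text, so it can be reused for any other stat name.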

The following is a list of best practices checked at startup:

Aerospike database best practices

service-threads

The recommended value for service-threads depends on the configuration of the namespaces in the aerospike.conf file:

The service-threads best practice is checked at server startup.

memory-size

We recommend that the sum of the memory-size values configured across all namespaces not exceed the total memory on the machine.

The memory-size best practice is checked at server startup.
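
The comparison can be sketched as follows. This is an illustration only, not the server's own check: memory_size_fits is a hypothetical helper, and the two namespace sizes (4 GiB and 8 GiB) are made-up example values that you would replace with the memory-size values from your own configuration, expressed in bytes.

```shell
# Succeed when the sum of the given memory-size values (in bytes) does not
# exceed total_bytes. memory_size_fits is a hypothetical helper name.
memory_size_fits() {
    total_bytes=$1
    shift
    sum=0
    for size in "$@"; do
        sum=$((sum + size))
    done
    [ "$sum" -le "$total_bytes" ]
}

# MemTotal in /proc/meminfo is reported in KiB; convert it to bytes.
if [ -r /proc/meminfo ]; then
    total_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    # Hypothetical example: two namespaces sized 4 GiB and 8 GiB.
    if memory_size_fits $((total_kib * 1024)) 4294967296 8589934592; then
        echo "memory-size total fits in physical memory"
    else
        echo "memory-size total exceeds physical memory"
    fi
fi
```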

Namespace device size

All the devices which a namespace uses for storage should be the same size, within an 8 MiB range of tolerance. This best practice is checked at server startup.
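
A minimal sketch of such a comparison is shown below. It assumes the tolerance means the largest and smallest device may differ by at most 8 MiB, which is one reading of the rule above; devices_same_size is a hypothetical helper and the sizes in the demonstration are invented.

```shell
# Succeed when the largest and smallest of the given sizes (in bytes)
# differ by at most 8 MiB. devices_same_size is a hypothetical helper name.
devices_same_size() {
    tolerance=$((8 * 1024 * 1024))
    min=$1
    max=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then min=$size; fi
        if [ "$size" -gt "$max" ]; then max=$size; fi
    done
    [ $((max - min)) -le "$tolerance" ]
}

# Demonstration with invented sizes that differ by 4 MiB.
if devices_same_size 1000000000000 1000004194304; then
    echo "device sizes are within tolerance"
fi

# On a real host (as root, example device names), sizes could be read with:
#   blockdev --getsize64 /dev/sdb
#   blockdev --getsize64 /dev/sdc
```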

Linux best practices

All-Flash deployment

In an All-Flash deployment, the following kernel parameters are required. enforce-best-practices verifies that these kernel parameters have the expected values.

/proc/sys/vm/dirty_bytes = 16777216
/proc/sys/vm/dirty_background_bytes = 1
/proc/sys/vm/dirty_expire_centisecs = 1
/proc/sys/vm/dirty_writeback_centisecs = 10
  • When running as non-root, you must set these values before running the Aerospike server.
  • When running as root, the server configures them automatically.

Either way, if these parameters can't be correctly set (manually or automatically by the server), the node will not start.
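
When running as non-root, it can help to confirm the values before starting the server. The following is a sketch of such a pre-flight check, not the server's own verification; check_all_flash_params is a hypothetical helper, and the optional directory argument exists only so the function can be pointed at a test directory.

```shell
# Compare each All-Flash kernel parameter under a directory (default
# /proc/sys/vm) with its expected value. Prints each mismatch and fails
# if there is any. check_all_flash_params is a hypothetical helper name.
check_all_flash_params() {
    dir=${1:-/proc/sys/vm}
    status=0
    while read -r name expected; do
        actual=$(cat "$dir/$name" 2>/dev/null) || actual=""
        if [ "$actual" != "$expected" ]; then
            echo "$name: expected $expected, found ${actual:-unreadable}"
            status=1
        fi
    done <<'PARAMS'
dirty_bytes 16777216
dirty_background_bytes 1
dirty_expire_centisecs 1
dirty_writeback_centisecs 10
PARAMS
    return "$status"
}

check_all_flash_params || echo "All-Flash kernel parameters need attention"
```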

RAM reserved for Linux operating system resources

To help prevent out-of-memory issues with host hardware, keep 10-15% of total physical memory reserved for Linux system resources.

The following may influence memory usage:

  • Overhead from the Linux OS and services.
  • Overhead caused by memory fragmentation.
  • Overhead from Aerospike indexes (primary & secondary).
  • Namespace data for in-memory namespaces. For more information, see Capacity Planning.
  • Overhead from cache and queue-related configurations, including max-write-cache (per device) and post-write-queue (per device). See Block size and cache size for more information.
  • Overhead from the Aerospike process.
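
To make the 10-15% guideline concrete, the arithmetic can be sketched as below. usable_kib is a hypothetical helper; it only computes what remains after the reservation and does not account for the individual overheads listed above.

```shell
# Print the memory (in KiB) left after reserving a percentage of total
# memory for Linux. usable_kib is a hypothetical helper name.
usable_kib() {
    total_kib=$1
    reserve_pct=$2
    echo $((total_kib * (100 - reserve_pct) / 100))
}

# MemTotal in /proc/meminfo is reported in KiB.
if [ -r /proc/meminfo ]; then
    mem_total=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "Budget with 10% reserved: $(usable_kib "$mem_total" 10) KiB"
    echo "Budget with 15% reserved: $(usable_kib "$mem_total" 15) KiB"
fi
```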

min_free_kbytes

The min_free_kbytes kernel parameter controls how much memory is kept free and not occupied by filesystem caches. Normally, the kernel fills almost all free RAM with filesystem caches and frees memory for allocation by processes as required. Because Aerospike performs large allocations in shared memory (1GB chunks), the default kernel value may result in an unexpected out-of-memory (OOM) kill. It is advisable to configure the parameter to at least 1.1GB, preferably 1.25GB if using cloud vendor drivers, because these can also make large allocations. This ensures that Linux always keeps enough memory free for large allocations.

tip

Setting min_free_kbytes too high is likely to cause an out-of-memory error in Aerospike.

Check the parameter value:

cat /proc/sys/vm/min_free_kbytes

If the value is lower than recommended, apply the new value to the running kernel and persist it across reboots:

echo 3 > /proc/sys/vm/drop_caches
echo 1310720 > /proc/sys/vm/min_free_kbytes
echo "vm.min_free_kbytes=1310720" >> /etc/sysctl.conf

The min_free_kbytes best practice is checked at server startup.
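
The value 1310720 used above is 1.25 GiB expressed in KiB (1.25 × 1024 × 1024). As a sketch, the conversion and a comparison against the running kernel could look like this; both function names are hypothetical helpers, and the input is given in hundredths of a GiB only to stay within integer shell arithmetic.

```shell
# Convert a size in hundredths of a GiB to KiB, for example 125 -> 1.25 GiB.
# gib_hundredths_to_kib is a hypothetical helper name.
gib_hundredths_to_kib() {
    echo $(( $1 * 1024 * 1024 / 100 ))
}

# Succeed when the current value (KiB) meets the wanted minimum (KiB).
# min_free_ok is a hypothetical helper name.
min_free_ok() {
    [ "$1" -ge "$2" ]
}

wanted=$(gib_hundredths_to_kib 125)
if [ -r /proc/sys/vm/min_free_kbytes ]; then
    current=$(cat /proc/sys/vm/min_free_kbytes)
    if min_free_ok "$current" "$wanted"; then
        echo "min_free_kbytes=$current is at least $wanted"
    else
        echo "min_free_kbytes=$current is below $wanted"
    fi
fi
```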

swappiness

For low-latency operations, using swap to any extent drastically slows down performance. It is advisable to disable swap with swapoff -a and remove the swap partition from /etc/fstab.

If that's not possible for operational reasons, set swappiness to 0:

echo 0 > /proc/sys/vm/swappiness
echo "vm.swappiness=0" >> /etc/sysctl.conf

The swappiness best practice is checked at server startup.
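
To confirm the result, the current state can be read back from /proc. This is a sketch; active_swap_count is a hypothetical helper that counts the entries in /proc/swaps, which lists one active swap area per line after a header line.

```shell
# Count active swap areas listed in a /proc/swaps-style file (the first
# line is a header). active_swap_count is a hypothetical helper name.
active_swap_count() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "${1:-/proc/swaps}"
}

if [ -r /proc/swaps ] && [ -r /proc/sys/vm/swappiness ]; then
    echo "active swap areas: $(active_swap_count)"
    echo "vm.swappiness: $(cat /proc/sys/vm/swappiness)"
fi
```

A count of 0 means swap is fully disabled.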

THP - transparent huge pages

To improve overall system responsiveness and allocation speed, the Linux kernel has a feature called Transparent Huge Pages (THP). Unfortunately, for high-throughput and low-latency databases, which perform multiple small allocations, THP can be counterproductive. Having THP enabled can cause the system to run out of RAM, with symptoms similar to a memory leak. Another issue is latency caused by page locking during THP defragmentation.

THP must be disabled before the asd daemon (Aerospike process) starts. If asd is already running, perform the setup described below, and then restart the operating system.

Create an init.d file:

cat << 'EOF' >/etc/init.d/disable-transparent-hugepages
#!/bin/bash
### BEGIN INIT INFO
# Provides: disable-transparent-hugepages
# Required-Start: $local_fs
# Required-Stop:
# X-Start-Before: aerospike
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Disable Linux transparent huge pages
# Description: Disable Linux transparent huge pages, to improve
# database performance.
### END INIT INFO

case $1 in
start)
    # Locate the THP control directory (the path differs on older RHEL kernels).
    if [ -d /sys/kernel/mm/transparent_hugepage ]; then
        thp_path=/sys/kernel/mm/transparent_hugepage
    elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
        thp_path=/sys/kernel/mm/redhat_transparent_hugepage
    else
        # THP is not available on this kernel; nothing to do.
        exit 0
    fi

    echo 'never' > ${thp_path}/enabled
    echo 'never' > ${thp_path}/defrag

    # khugepaged/defrag takes 0/1 on some kernels and yes/no on others.
    re='^[0-1]+$'
    if [[ $(cat ${thp_path}/khugepaged/defrag) =~ $re ]]
    then
        echo 0 > ${thp_path}/khugepaged/defrag
    else
        echo 'no' > ${thp_path}/khugepaged/defrag
    fi

    unset re
    unset thp_path
    ;;
esac
EOF

Make the file executable:

chmod +x /etc/init.d/disable-transparent-hugepages

Enable the script (non-systemd system):

# on debian/ubuntu
update-rc.d disable-transparent-hugepages defaults
# on RHEL/centos
chkconfig --add disable-transparent-hugepages

If using systemd, create a systemd unit file:

cat << 'EOF' > /etc/systemd/system/disable-transparent-huge-pages.service
[Unit]
Description=Disable Transparent Huge Pages
Before=aerospike.service

[Service]
Type=oneshot
ExecStart=/bin/bash /etc/init.d/disable-transparent-hugepages start

[Install]
WantedBy=multi-user.target
EOF

Enable the new systemd unit file:

systemctl daemon-reload
systemctl enable disable-transparent-huge-pages.service

The thp-enabled and thp-defrag best practices are checked at server startup. The best practices startup check permits these to be set to either madvise or never.
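
To check the current settings by hand, note that the kernel lists all choices in each THP file and marks the active one in square brackets, for example "always madvise [never]". The sketch below extracts the active choice and compares it with the two values the startup check accepts; both function names are hypothetical helpers.

```shell
# Extract the active (bracketed) choice from a THP setting line read on
# stdin, such as "always madvise [never]". thp_active is a hypothetical
# helper name.
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Succeed when the value is one the startup check permits.
# thp_value_ok is a hypothetical helper name.
thp_value_ok() {
    [ "$1" = "never" ] || [ "$1" = "madvise" ]
}

for name in enabled defrag; do
    file=/sys/kernel/mm/transparent_hugepage/$name
    if [ -r "$file" ]; then
        value=$(thp_active < "$file")
        if thp_value_ok "$value"; then
            echo "$name=$value (accepted)"
        else
            echo "$name=$value (change to never or madvise)"
        fi
    fi
done
```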

Zone reclaim mode

For NUMA architectures, zone_reclaim_mode allows for more or less aggressive approaches to reclaim memory when the system runs out of memory. When enabled, it causes aggressive reclaims and memory scans which can negatively affect performance.

It is recommended that zone_reclaim_mode be disabled by setting /proc/sys/vm/zone_reclaim_mode to 0.

The zone_reclaim_mode best practice is checked at server startup.
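
One way to apply and persist this setting is sketched below, following the same pattern as the swappiness commands. persist_sysctl is a hypothetical helper that replaces an existing entry rather than appending a duplicate line; the commands that change the system are shown as comments because they require root.

```shell
# Set "key=value" in a sysctl configuration file, replacing any existing
# entry for the key instead of appending a duplicate.
# persist_sysctl is a hypothetical helper name.
persist_sysctl() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    touch "$file"
    tmp=$(mktemp)
    # Copy every line except existing settings for this key.
    awk -v key="$key" '
        {
            name = $0
            sub(/^[ \t]+/, "", name)      # strip leading whitespace
            sub(/[ \t]*=.*$/, "", name)   # keep only the text before "="
            if (name != key) print
        }' "$file" > "$tmp"
    echo "$key=$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (as root):
#   echo 0 > /proc/sys/vm/zone_reclaim_mode
#   persist_sysctl vm.zone_reclaim_mode 0
```

Because the helper rewrites the entry in place, running it repeatedly leaves a single line for the key.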

NVMe partitioning

NVMe devices are normally capable of 4 simultaneous I/O operations because of their connection design, which occupies 4 PCIe I/O lanes. If using raw devices for Aerospike storage, Aerospike suggests that you partition each NVMe device into at least 4 partitions. This allows 4 write threads to operate in Aerospike and greatly improves disk speed. If a single partition is used as a raw device, iostat may show 100% disk utilization (%util) while the await statistic shows no queueing (await <1 means no queueing is happening). This indicates that the disk itself can do more, but the PCIe lanes in use are already saturated.

Refer to the Partition Your Flash Devices paragraph for further details on device partitioning.
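
As an illustration of splitting one device into 4 equal partitions, the sketch below prints parted commands without running them, since creating a new partition table destroys existing data. partition_plan is a hypothetical helper and /dev/nvme0n1 is an example device name; review the printed commands before running any of them as root.

```shell
# Print (without running) parted commands that split a device into
# equally sized partitions. "mklabel gpt" wipes the existing partition
# table. partition_plan is a hypothetical helper name.
partition_plan() {
    device=$1
    count=$2
    echo "parted -s $device mklabel gpt"
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$((i * 100 / count))
        end=$(((i + 1) * 100 / count))
        echo "parted -s -a optimal $device mkpart primary ${start}% ${end}%"
        i=$((i + 1))
    done
}

# Example device name only.
partition_plan /dev/nvme0n1 4
```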

vm.max_map_count

If using Kubernetes or Docker, it is advisable to raise the max_map_count parameter. This parameter controls the maximum number of memory map areas a process can have. The default can be too low and may result in memory allocation issues during normal operation.

To change this parameter:

echo "vm.max_map_count=262144" >> /etc/sysctl.conf
echo 262144 > /proc/sys/vm/max_map_count

note

You may need to restart the Docker daemon and all its containers after making this change in order for the changes to take effect.

Containers - networks

When using Kubernetes or Docker, the default behavior is to use the EXPOSE and PUBLISH features to publish ports from a container through the host to the outside world. This causes the Docker process to listen on a given port on the host and forward all packets to the container. This is highly inefficient and, under heavy load, may cause latency, packet drops, and crashes within the containers.

If using containers, it is advisable to configure those containers to either:

  1. Use bridged networking, rather than Docker-only NAT.
  2. Use iptables to forward packets to the Aerospike containers on the NAT network, rather than the Docker EXPOSE port feature.

Both solutions presented above result in better network latencies and a more stable network.

Refer to the Docker configuration manuals for further details.
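
For the iptables option, a forwarding rule might look like the one printed by the sketch below. This is an assumption about one workable rule, not a complete firewall configuration: dnat_rule is a hypothetical helper, 172.17.0.2 is an example container address, and 3000 is the Aerospike default service port. The rule is printed rather than applied so that it can be reviewed first.

```shell
# Print (without running) an iptables DNAT rule that forwards a host TCP
# port to the same port on a container. dnat_rule is a hypothetical
# helper name.
dnat_rule() {
    container_ip=$1
    port=$2
    echo "iptables -t nat -A PREROUTING -p tcp --dport $port -j DNAT --to-destination $container_ip:$port"
}

# Example container address; 3000 is the default Aerospike service port.
dnat_rule 172.17.0.2 3000
```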

Maximum open file limits

Aerospike clients perform dynamic connections to the database nodes as required. This may result in many active connections. These connections, on a Linux system, hold a file descriptor and are treated as open files. Aerospike has a configuration parameter proto-fd-max to control the maximum number of allowed client connections. The Aerospike server will not start if proto-fd-max is higher than the Linux system's maximum open files configuration for the process.

After installing Aerospike, ensure that the maximum open files limit for the asd process is higher than proto-fd-max, to allow for fabric and heartbeat connections as well as any open files.

Non-systemd: Edit /etc/init.d/aerospike.conf and change the value of the following line.

ulimit -n 100000

For systemd, create an override.conf file to control this:

mkdir -p /etc/systemd/system/aerospike.service.d
cat <<EOF > /etc/systemd/system/aerospike.service.d/override.conf
[Service]
LimitNOFILE=<MAX NUMBER OF FILE DESCRIPTORS>
EOF

Then reload the systemd daemon:

systemctl daemon-reload

This change requires restarting the Aerospike server for the new value to be applied.

For versions 5.0 and later, you may also apply this change dynamically to the asd process if prlimit is available:

prlimit --pid $(pgrep asd) --nofile=200000
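
To verify the limit that is actually in effect, the running process can be inspected through /proc/<pid>/limits. The sketch below assumes a single asd process; soft_nofile and nofile_ok are hypothetical helpers, and 15000 is only an example proto-fd-max value that should be replaced with the one from your configuration.

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits file.
# soft_nofile is a hypothetical helper name.
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Succeed when the limit is strictly greater than proto-fd-max.
# nofile_ok is a hypothetical helper name.
nofile_ok() {
    [ "$1" -gt "$2" ]
}

pid=$(pgrep -o -x asd 2>/dev/null) || pid=""
if [ -n "$pid" ] && [ -r "/proc/$pid/limits" ]; then
    limit=$(soft_nofile "/proc/$pid/limits")
    # 15000 is an example proto-fd-max value.
    if nofile_ok "$limit" 15000; then
        echo "open files limit $limit is above proto-fd-max"
    else
        echo "open files limit $limit is not above proto-fd-max"
    fi
fi
```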