Best practices for Aerospike and Linux

These steps outline the basic Linux and Aerospike tuning and configuration parameters required for best Aerospike stability and performance.

Best Practice Checks at Startup

When the Aerospike server starts (5.7 and later), it verifies certain best practices and, by default, logs a warning for each violation found. For production environments, it is recommended to set enforce-best-practices to true. When enforce-best-practices is set to true, the server shuts down if any best practice is found to be violated during startup.

If you choose to leave enforce-best-practices set to false, you can still monitor violations with the failed_best_practices Boolean stat or the best-practices info command. The failed_best_practices stat reports true if any best practice was violated during startup; otherwise it reports false. The best-practices info command returns the list of best practices that failed.

The following is a list of best practices checked at startup:

Aerospike Database Best Practices

service-threads

The recommended value for service-threads depends on the configuration of the namespaces in the aerospike.conf file:

The service-threads best-practice is checked at server startup.

memory-size

We recommend that the cumulative sum of the memory-size configuration not exceed the total memory on the machine.

The memory-size best-practice is checked at server startup.
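
As a rough illustration of this check, the sketch below sums a set of per-namespace memory-size values and compares the total against MemTotal from /proc/meminfo. The namespace sizes (4 and 8 GiB) and the function names are hypothetical; substitute the values from your own aerospike.conf. This is not the server's own check, only an approximation of the same arithmetic.

```shell
#!/bin/sh
# Sketch: does the sum of per-namespace memory-size values fit in RAM?
# All sizes are whole GiB; the namespace values used below are hypothetical.

# Total RAM in GiB (rounded down), taken from MemTotal (reported in kB).
total_ram_gib() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo 2>/dev/null || true
}

# sum_gib SIZE... : prints the sum of its integer arguments.
sum_gib() {
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

# fits_in_ram RAM_GIB SIZE... : succeeds if the sizes sum to at most RAM_GIB.
fits_in_ram() {
    ram=$1
    shift
    [ "$(sum_gib "$@")" -le "$ram" ]
}

ram_gib=$(total_ram_gib)
ram_gib=${ram_gib:-0}

# Example with two hypothetical namespaces of 4 GiB and 8 GiB.
if fits_in_ram "$ram_gib" 4 8; then
    echo "memory-size total fits in ${ram_gib} GiB of RAM"
else
    echo "memory-size total exceeds ${ram_gib} GiB of RAM"
fi
```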

Linux Best Practices

min_free_kbytes

The min_free_kbytes kernel parameter controls how much memory the kernel keeps free and unoccupied by filesystem caches. Normally, the kernel fills almost all free RAM with filesystem caches and frees memory for allocation by processes as required. Because Aerospike performs large allocations in shared memory (1GB chunks), the default kernel value may result in an unexpected OOM (out-of-memory) kill. It is advisable to set the parameter to at least 1.1GB, preferably 1.25GB if using cloud vendor drivers, as these can also make large allocations. This ensures that Linux always keeps enough memory free for large allocations.

note

Setting this value too high can cause the machine to run out of memory (OOM) immediately.

Check the parameter value:

$ cat /proc/sys/vm/min_free_kbytes

If the value is lower than recommended, apply the new value to the running kernel and persist it across reboots:

$ echo 3 > /proc/sys/vm/drop_caches
$ echo 1310720 > /proc/sys/vm/min_free_kbytes
$ echo "vm.min_free_kbytes=1310720" >> /etc/sysctl.conf

The min_free_kbytes best-practice is checked at server startup.
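
The value 1310720 used above is 1.25GB expressed in kilobytes (1280 x 1024). A minimal read-only sketch of the comparison follows; the function name below_floor is hypothetical, and the script only reports, it does not change any setting.

```shell
#!/bin/sh
# Sketch: compare the running min_free_kbytes with the recommended floor.

# 1.25GB expressed in kB: 1280 MB * 1024 kB/MB = 1310720.
recommended_kb=$((1280 * 1024))

# below_floor CURRENT_KB FLOOR_KB : succeeds if CURRENT_KB is under the floor.
below_floor() {
    [ "$1" -lt "$2" ]
}

# Fall back to 0 if the file cannot be read (for example, on non-Linux hosts).
current_kb=$(cat /proc/sys/vm/min_free_kbytes 2>/dev/null || true)
current_kb=${current_kb:-0}

if below_floor "$current_kb" "$recommended_kb"; then
    echo "min_free_kbytes is $current_kb, below $recommended_kb"
else
    echo "min_free_kbytes is $current_kb, at or above $recommended_kb"
fi
```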

swappiness

For low-latency operations, using swap to any extent drastically slows down performance. It is advisable to disable swap with swapoff -a and remove the swap partition from /etc/fstab.

If that's not possible for operational reasons, at the very least set the swappiness to 0, as per below:

$ echo 0 > /proc/sys/vm/swappiness
$ echo "vm.swappiness=0" >> /etc/sysctl.conf

The swappiness best-practice is checked at server startup.
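
To confirm the result, one option is to look at /proc/swaps, which lists active swap devices under a one-line header, and at the current swappiness value. The sketch below assumes that layout; the function name active_swap_count is hypothetical.

```shell
#!/bin/sh
# Sketch: report active swap devices and the current swappiness value.

# active_swap_count FILE : counts entries in a /proc/swaps-style file,
# skipping the one-line header and any blank lines.
active_swap_count() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

if [ -r /proc/swaps ]; then
    echo "active swap devices: $(active_swap_count /proc/swaps)"
fi
if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness: $(cat /proc/sys/vm/swappiness)"
fi
```

A count of 0 indicates that swapoff -a took effect.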

THP - transparent huge pages

To improve overall system responsiveness and allocation speed, the Linux kernel has a feature called Transparent Huge Pages (THP). Unfortunately, for high-throughput and low-latency databases, which perform many small allocations, THP can be counterproductive. With THP enabled, the system can run out of RAM, with symptoms similar to a memory leak. Another issue is latency caused by page locking during THP defragmentation.

THP must be disabled before the asd daemon (Aerospike process) starts. If asd was already running, first apply the setup below, and then restart the operating system.

Create an init.d file:

cat << 'EOF' >/etc/init.d/disable-transparent-hugepages
#!/bin/bash
### BEGIN INIT INFO
# Provides: disable-transparent-hugepages
# Required-Start: $local_fs
# Required-Stop:
# X-Start-Before: aerospike
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Disable Linux transparent huge pages
# Description: Disable Linux transparent huge pages, to improve
# database performance.
### END INIT INFO

case $1 in
  start)
    if [ -d /sys/kernel/mm/transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/transparent_hugepage
    elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/redhat_transparent_hugepage
    else
      # Neither THP directory exists, so there is nothing to disable.
      exit 0
    fi

    echo 'never' > ${thp_path}/enabled
    echo 'never' > ${thp_path}/defrag

    # khugepaged/defrag takes 0/1 on some kernels and yes/no on others.
    re='^[0-1]+$'
    if [[ $(cat ${thp_path}/khugepaged/defrag) =~ $re ]]
    then
      echo 0 > ${thp_path}/khugepaged/defrag
    else
      echo 'no' > ${thp_path}/khugepaged/defrag
    fi

    unset re
    unset thp_path
    ;;
esac
EOF

Make the file executable:

chmod +x /etc/init.d/disable-transparent-hugepages

Enable the script (non-systemd system):

# on debian/ubuntu
update-rc.d disable-transparent-hugepages defaults
# on RHEL/centos
chkconfig --add disable-transparent-hugepages

If using systemd, create a systemd unit file:

cat << 'EOF' > /etc/systemd/system/disable-transparent-huge-pages.service
[Unit]
Description=Disable Transparent Huge Pages

[Service]
Type=oneshot
ExecStart=/bin/bash /etc/init.d/disable-transparent-hugepages start

[Install]
WantedBy=multi-user.target
EOF

Enable the systemd unit:

systemctl daemon-reload
systemctl enable disable-transparent-huge-pages.service

The thp-enabled and thp-defrag best-practices are checked at server startup. The startup check permits these to be set to either madvise or never.
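
After a reboot, the active THP setting can be read back from sysfs, where the kernel marks the value in effect with square brackets (for example, "always madvise [never]"). The following is a minimal read-only sketch; the function names thp_active and thp_acceptable are hypothetical, and only the standard transparent_hugepage path is checked.

```shell
#!/bin/sh
# Sketch: read the active THP setting, which the kernel marks with brackets,
# e.g. "always madvise [never]".

# thp_active "LINE" : prints the bracketed word from a THP sysfs line.
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# thp_acceptable VALUE : succeeds for the values the startup check permits.
thp_acceptable() {
    [ "$1" = "never" ] || [ "$1" = "madvise" ]
}

for name in enabled defrag; do
    f=/sys/kernel/mm/transparent_hugepage/$name
    [ -r "$f" ] || continue
    value=$(thp_active "$(cat "$f")")
    if thp_acceptable "$value"; then
        echo "thp $name: $value (ok)"
    else
        echo "thp $name: $value (change needed)"
    fi
done
```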

Zone reclaim mode

For NUMA architectures, zone_reclaim_mode allows for more or less aggressive approaches to reclaim memory when the system runs out of memory. When enabled, it causes aggressive reclaims and memory scans which can negatively affect performance.

It is recommended to disable zone_reclaim_mode by setting /proc/sys/vm/zone_reclaim_mode to 0.

The zone_reclaim_mode best-practice is checked at server startup.
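
Following the same pattern used for the other vm settings on this page, the sketch below reads the current value and, if it is not 0, prints the commands that would disable it on the running kernel and persist the change. It only prints the commands and does not run them; the function name zone_reclaim_fix is hypothetical.

```shell
#!/bin/sh
# Sketch: report zone_reclaim_mode and print (not run) the commands that
# would disable it.

# zone_reclaim_fix VALUE : prints remediation commands when VALUE is not 0.
zone_reclaim_fix() {
    if [ "$1" != "0" ]; then
        echo 'echo 0 > /proc/sys/vm/zone_reclaim_mode'
        echo 'echo "vm.zone_reclaim_mode=0" >> /etc/sysctl.conf'
    fi
}

if [ -r /proc/sys/vm/zone_reclaim_mode ]; then
    zone_reclaim_fix "$(cat /proc/sys/vm/zone_reclaim_mode)"
fi
```

No output means the setting is already 0.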

NVMe partitioning

NVMe devices are normally capable of 4 simultaneous I/O operations because their connection design occupies 4 PCIe I/O lanes. If using raw devices for Aerospike storage, Aerospike suggests partitioning each NVMe device into at least 4 partitions. This allows 4 write threads to operate in Aerospike and greatly improves disk speed. With a single partition used as a raw device, iostat may show 100% disk utilization (%util) while the await queuing statistic shows no queuing (await <1 means no queuing is happening). This indicates that the disk itself can do more, while the PCIe lanes in use are already saturated.

Refer to the Partition Your Flash Devices paragraph for further details on device partitioning.
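
As one possible way to produce 4 equal partitions, the sketch below prints parted commands for a given device without running them. The device name /dev/nvme0n1 and the function name partition_plan are hypothetical, and the use of parted with a GPT label is an assumption rather than an Aerospike requirement. Review the output carefully before running it as root, because creating a new label destroys existing data on the device.

```shell
#!/bin/sh
# Sketch: print (not run) parted commands that split a device into COUNT
# equal partitions, using percentage boundaries.

# partition_plan DEVICE COUNT
partition_plan() {
    dev=$1
    count=$2
    echo "parted -s -a optimal $dev mklabel gpt"
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(( (i - 1) * 100 / count ))
        end=$(( i * 100 / count ))
        echo "parted -s -a optimal $dev mkpart primary ${start}% ${end}%"
        i=$((i + 1))
    done
}

# /dev/nvme0n1 is a hypothetical device name.
partition_plan /dev/nvme0n1 4
```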

vm.max_map_count

If using k8s or docker, it is advisable to raise the max_map_count parameter. This parameter controls the maximum number of memory map areas a process can have. The default can be too low and may result in memory allocation issues during normal operation.

To change this parameter:

$ echo "vm.max_map_count=262144" >> /etc/sysctl.conf
$ echo 262144 > /proc/sys/vm/max_map_count

note

You may need to restart the docker daemon and all its containers after making this change for it to take effect.
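
Appending to /etc/sysctl.conf with >> adds a new line each time the command is run, so repeated runs leave duplicate or conflicting entries. The sketch below shows one way to keep exactly one line per key; the helper name set_sysctl_line is hypothetical, and the demonstration runs against a scratch file rather than the real /etc/sysctl.conf.

```shell
#!/bin/sh
# set_sysctl_line FILE KEY VALUE : ensure FILE has exactly one KEY=VALUE line.
set_sysctl_line() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp)
    [ -f "$file" ] || : > "$file"
    # Keep every line whose key (text before '=', whitespace removed) differs.
    awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstrate on a scratch file rather than the real /etc/sysctl.conf.
scratch=$(mktemp)
printf 'vm.swappiness=0\nvm.max_map_count=65530\n' > "$scratch"
set_sysctl_line "$scratch" vm.max_map_count 262144
set_sysctl_line "$scratch" vm.max_map_count 262144
cat "$scratch"
rm -f "$scratch"
```

Running the helper twice leaves a single vm.max_map_count line, and unrelated keys are preserved.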

Containers - Networks

When using k8s or docker, the default behavior is to use the EXPOSE and PUBLISH features to publish ports from a container through the host to the outside world. This causes the docker process to listen on a given port on the host and forward all packets to the container itself. This is highly inefficient and, under heavy load, may cause latency, packet drops, and crashes within the containers.

If using containers, it is advisable to configure said containers to either:

  1. use bridged networking as opposed to docker-only NAT
  2. use iptables to forward packets to the NAT network Aerospike containers as opposed to the docker EXPOSE port feature.

Both solutions presented above result in better network latencies and a more stable network.

Refer to the docker configuration manuals for further details.
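
For the iptables option, a forwarding rule generally takes the form of a DNAT entry in the nat table. The sketch below only prints such a rule. The container address 172.17.0.2 and the function name dnat_rule are hypothetical, port 3000 is the default Aerospike service port, and a real deployment may need additional rules (for example in the FORWARD chain) depending on the host firewall policy.

```shell
#!/bin/sh
# dnat_rule HOST_PORT CONTAINER_IP CONTAINER_PORT : prints (does not apply)
# one iptables DNAT rule that forwards a host TCP port to a container.
dnat_rule() {
    echo "iptables -t nat -A PREROUTING -p tcp --dport $1 -j DNAT --to-destination $2:$3"
}

# 172.17.0.2 is a hypothetical container address.
dnat_rule 3000 172.17.0.2 3000
```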

Max Open File limits

Aerospike clients perform dynamic connections to the database nodes as and when required. This may result in many active connections. These connections, on a Linux system, hold a file descriptor and are treated as open files. Aerospike has a configuration parameter proto-fd-max to control the maximum number of allowed client connections. The Aerospike server will not start if proto-fd-max is higher than the Linux system's maximum open files configuration for the process.

After installing Aerospike, ensure that the max open files limit for the asd process is higher than proto-fd-max, to allow for fabric and heartbeat connections as well as any open files.

Non-systemd: Edit /etc/init.d/aerospike.conf and change the value of the following line.

$ ulimit -n 100000

For systemd, create an override.conf file to control this (create the /etc/systemd/system/aerospike.service.d directory first if it does not exist):

$ cat <<EOF > /etc/systemd/system/aerospike.service.d/override.conf
[Service]
LimitNOFILE=<MAX NUMBER OF FILE DESCRIPTORS>
EOF

Then reload the systemd daemon:

$ systemctl daemon-reload

Note that this change requires restarting the Aerospike server for the new value to be applied.

For versions 5.0 and later, you may also apply this change dynamically to the asd process if prlimit is available:

$ prlimit --pid $(pgrep asd) --nofile=200000
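
To verify the limit actually in effect for a running asd process, the soft "Max open files" value can be read from /proc/<pid>/limits and compared with proto-fd-max. The following is a minimal sketch; the proto-fd-max value of 15000 and the function names soft_nofile and limit_covers are hypothetical, so use the value from your own aerospike.conf.

```shell
#!/bin/sh
# soft_nofile LIMITS_FILE : prints the soft "Max open files" limit from a
# /proc/<pid>/limits-style file (fourth whitespace-separated field).
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# limit_covers LIMIT PROTO_FD_MAX : succeeds if LIMIT is above PROTO_FD_MAX.
limit_covers() {
    [ "$1" -gt "$2" ]
}

# 15000 is a hypothetical proto-fd-max; use the value from aerospike.conf.
proto_fd_max=15000

# Oldest process named exactly "asd"; empty if Aerospike is not running.
pid=$(pgrep -o -x asd 2>/dev/null || true)
if [ -n "$pid" ] && [ -r "/proc/$pid/limits" ]; then
    limit=$(soft_nofile "/proc/$pid/limits")
    if limit_covers "$limit" "$proto_fd_max"; then
        echo "open file limit $limit covers proto-fd-max $proto_fd_max"
    else
        echo "open file limit $limit is too low for proto-fd-max $proto_fd_max"
    fi
fi
```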