Namespace Storage Configuration

Aerospike determines where to store data based on the storage engine configured for the namespace. The storage engine determines whether data is persisted to disk, resides in memory, or both, a choice that affects the durability, cost, and performance of your cluster.

Comparing Storage Engines

Storage engine               | Index In Memory | Fast Restarts* | Survive a Power Outage
-----------------------------|-----------------|----------------|-----------------------
Device (data not in memory)  | ✓               | ✓              | ✓
All Flash                    | ✗               | ✓              | ✓
Device (data in memory)      | ✓               | ✓**            | ✓
Memory only (no persistence) | ✓               | ✗              | ✗
Persistent Memory***         | ✓               | ✓              | ✓

*Available on Aerospike Enterprise Edition.
**Primary index persisted upon restart for data-in-memory with persistence as of Enterprise Edition version 3.15.1.3. Fast Restarts are also supported for the data-in-index configuration.
***Available as of Enterprise Edition 4.8 with a feature enabled in the feature-key-file.

Storage Engine Configuration Recipes

The following recipes require modifying the Aerospike Server configuration file, which is located at /etc/aerospike/aerospike.conf. Each recipe describes the minimal configuration necessary to enable a particular storage engine, as well as the storage sizing parameters used by that engine. To get started, open the configuration file in your preferred editor and make the appropriate changes.

sudo $EDITOR /etc/aerospike/aerospike.conf
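
Configuration changes take effect only after the Aerospike service is restarted; on systemd-based distributions, for example:

sudo systemctl restart aerospike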

Recipe for an SSD Storage Engine

The minimal configuration for an SSD namespace requires setting storage-engine to device and adding a device parameter for each SSD to be used by this namespace. A device must be properly initialized as an Aerospike device (including zeroizing the 8 MiB header). The maximum size of a device is 2 TiB; larger devices should be partitioned into multiple, equally sized partitions of less than 2 TiB each. In addition, memory-size may need to be changed from the default of 4 GiB to a size appropriate for the expected primary index size. For assistance in sizing the primary index, please refer to the Sizing Guide. For performance, we generally recommend reducing write-block-size from the default of 1 MiB to 128 KiB on SSD-backed namespaces. The optimal value varies with the specific workload and average record size, so the best way to find the right setting is to run benchmarks with different values.

namespace <namespace-name> {
memory-size <SIZE>G # Maximum memory allocation for primary
# and secondary indexes.
storage-engine device { # Configure the storage-engine to use persistence
device /dev/<device> # raw device. Maximum size is 2 TiB
# device /dev/<device> # (optional) another raw device.
write-block-size 128K # adjust block size to make it efficient for SSDs.
}
}
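
To initialize a raw device before first use (or to wipe one previously used), its header can be zeroed; a minimal sketch, assuming <device> is replaced with the actual device name and the device is not currently in use:

sudo dd if=/dev/zero of=/dev/<device> bs=1M count=8 # Zero the 8 MiB device header.
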
note

A device partition must only be associated with a single namespace at any one time. This article discusses how to add and remove device partitions from namespaces.

Recipe for a Persistent Memory Storage Engine

The minimal configuration for a persistent memory namespace requires setting two parameters in your configuration file for each pmem storage file to be used by this namespace:

  • storage-engine
  • file (documented in the configuration reference under the namespace context, subcontext storage-engine device or pmem)

Also, filesize needs to be large enough to support the size of the data (with a maximum allowed value of 2 TiB). In addition, memory-size may need to be changed from the default of 4 GiB to a size appropriate for the expected primary index size. For assistance in sizing the primary index, see the Sizing Guide.

As of version 5.1, persistent memory namespaces are treated the same as data-in-memory namespaces for the purpose of computing the default number of service-threads, which defaults to the number of CPUs (unless there is at least one SSD namespace).
On systems with hyperthreading, only physical cores are counted, and on multi-socket systems with Non-Uniform Memory Access (NUMA) pinning enabled, each Aerospike instance counts only the CPU cores on the socket it is servicing.
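
If the computed default does not suit your workload, service-threads can also be set explicitly in the service context; a minimal sketch, with <N> standing in for your chosen thread count:

service {
service-threads <N> # Override the automatically computed default.
}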

namespace <namespace-name> {
memory-size <SIZE>G # Maximum memory allocation for secondary indexes (if any).
storage-engine pmem { # Configure the storage-engine to use
# persistence. Maximum size is 2 TiB.
file /mnt/pmem/<filename> # Location of pmem data file on server, where /mnt/pmem is the
# mount point of an EXT4 or XFS file system that resides in pmem
# and has been mounted with the DAX option.
# file /mnt/pmem/<another> # (optional) Location of pmem data file on server.
filesize <SIZE>G # Max size of each file in GiB.
}
}
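
As the comments above note, the pmem file must reside on a DAX-mounted file system. A minimal sketch of preparing one, assuming the persistent memory is exposed as /dev/pmem0:

sudo mkfs.xfs /dev/pmem0 # Create an XFS (or EXT4) file system on the pmem device.
sudo mkdir -p /mnt/pmem
sudo mount -o dax /dev/pmem0 /mnt/pmem # Mount with DAX for direct access.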

Recipe for an HDD Storage Engine with Data in Memory

The minimal configuration for an HDD with data-in-memory namespace involves setting storage-engine to device, setting data-in-memory to true, and finally providing a list of file parameters to indicate where data will be persisted. Also, filesize needs to be large enough to support the size of the data on disk (with a maximum allowed value of 2 TiB); for common use cases this is roughly 4 times the memory-size. Lastly, memory-size may need to be adjusted from the default of 4 GiB to a size appropriate for the expected primary index size plus the expected size of the data in memory. For assistance sizing filesize or memory-size, please refer to our Sizing Guide.

namespace <namespace-name> {
memory-size <SIZE>G # Maximum memory allocation for secondary indexes (if any).
storage-engine device { # Configure the storage-engine to use
# persistence. Maximum size is 2 TiB
file /opt/aerospike/<filename> # Location of data file on server.
# file /opt/aerospike/<another> # (optional) Location of data file on server.
filesize <SIZE>G # Max size of each file in GiB.
data-in-memory true # Indicates that all data should also be
# in memory.
}
}
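
As a purely illustrative sketch (the namespace name, file path, and sizes below are hypothetical, not recommendations), a namespace planned for 16 GiB of indexes and in-memory data would, by the rough 4x rule above, persist to a file of about 64 GiB:

namespace example {
memory-size 16G # Primary index, secondary indexes, and in-memory data.
storage-engine device {
file /opt/aerospike/example.dat # Hypothetical data file location.
filesize 64G # Roughly 4x memory-size.
data-in-memory true
}
}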

Recipe for an HDD Storage Engine with Data in Index

A data-in-index configuration is a highly specialized namespace for a very niche use case. If your data is single-bin, fits in 8 bytes, and you need the performance of an in-memory namespace without losing the fast restart capability provided by Aerospike Enterprise Edition, then data-in-index is the right choice.

The minimal configuration for a data-in-index namespace involves setting single-bin to true, data-in-index to true, and data-in-memory to true. In addition, storage-engine must be device, and file or device parameters need to be configured to map to the persisted storage to be used by this namespace. Finally, memory-size needs to be adjusted from its default of 4 GiB to a size that can accommodate the primary index, and filesize from its default of 16 GiB to the size of the data on disk (with a maximum allowed value of 2 TiB). For assistance sizing filesize or memory-size, please refer to our Sizing Guide.

namespace <namespace-name> {
memory-size <N>G # Maximum memory allocation for data and
# primary and secondary indexes.
single-bin true # Must be true for data-in-index.
data-in-index true # Enables the in-index integer store.
storage-engine device { # Configure the storage-engine to use
# persistence.
file /opt/aerospike/<filename> # Location of data file on server.
# file /opt/aerospike/<another> # (optional) Location of data file on server.
# device /dev/<device> # Optional alternative to using files.

filesize <SIZE>G # Max size of each file in GiB. Maximum size is 2 TiB.
data-in-memory true # Must be true for data-in-index.
}
}

Recipe for Data in Memory Without Persistence

The minimal configuration for a namespace without persistence is to set storage-engine to memory. If your namespace requires more than the default 4 GiB memory-size allocation for the primary index and data in memory, adjust memory-size accordingly. For assistance sizing memory-size, please refer to our Sizing Guide.

namespace <namespace-name> {
memory-size <SIZE>G # Maximum memory allocation for data and primary and
# secondary indexes.
storage-engine memory # Configure the storage-engine to not use persistence.
}
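
Once the namespace is populated, its actual memory consumption can be checked at runtime, for example with the asadm tool (the exact statistic names vary somewhat across server versions):

asadm -e 'show statistics namespace' # Inspect the per-namespace memory usage statistics.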

Recipe for Shadow Device

The shadow device storage model, introduced in 3.5.12, is tailored for cloud environments that offer extremely high-performance SSDs which are ephemeral (not persistent), while the available persistent devices are not as performant as desired.

All writes are duplicated to another (shadow) device, which acts as the persisted store. The primary device still receives all operations as normal. This yields a persisted data volume with lower IOPS requirements while still gaining the IOPS benefit of the non-persisted volume, all without using large amounts of RAM. The shadow device only needs to satisfy the write IOPS requirements of your workload, not reads.

note

This is an extension of the SSD Storage Engine.

note

When using network-attached shadow devices (for example, EBS on AWS), or when re-assigning shadow devices to a different instance, on version 3.16.0.1 or later it is recommended to have initially configured node-id across the nodes in the cluster. This preserves the node ID on a new instance to which an existing shadow device is re-attached, and avoids a redistribution of partitions across the cluster.

To utilize shadow devices, add the persisted volume immediately after the declaration of the non-persisted volume, on the same line.

info

Each shadow device must be at least as large as its primary device.

namespace <namespace-name> {
...
storage-engine device {
device /dev/sdb /dev/sdf # sdb is the fast ephemeral volume, while sdf is the slower persisted volume
...
}
}

In the above example, /dev/sdb is the fast, non-persisted device and /dev/sdf is the persisted device. Order is important: both devices must be listed on the same line for shadow device configuration.

Shadow Device configuration can be combined with multiple devices. Note the 1-to-1 mapping:

storage-engine device {
device /dev/sdb /dev/sdf
device /dev/sdc /dev/sdg
...
}
note

When configuring a namespace to use persistence of any form, take care that a given file or device partition is associated with only a single namespace at any one time. Two namespaces cannot share the same file or partition. Configuring the same file or partition for multiple namespaces can prevent the node from starting and/or damage existing data in that file or partition.
