The primary key index is a blend of distributed hash table technology with a distributed tree structure in each server. The entire keyspace in the namespace is separated via a robust hash function into partitions. A total of 4096 partitions are equally distributed across cluster nodes. See data-distribution for details on hashing and partitioning.
Within each server, Aerospike stores index entries in in-memory red-black trees called sprigs. Each partition can have a configurable number of sprigs. Configuring the right number of sprigs is a trade-off between memory overhead and optimized parallel access.
The primary index is keyed on the digest, a 20-byte hash of the specified primary key. Although this expands the key size for some records (for example, an integer key is only 8 bytes), it is beneficial because the code behaves predictably regardless of input key size or distribution.
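The path from a primary key to a partition and a sprig can be sketched in a few lines of Python. This is an illustration only, not Aerospike's implementation: SHA-1 is used purely as a stand-in because it happens to produce 20 bytes, and the choice of which digest bytes select the partition and the sprig is an assumption. The function names and the `sprigs_per_partition` parameter are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096   # fixed partition count per namespace
DIGEST_LEN = 20       # digest size in bytes


def digest(set_name: str, user_key: bytes) -> bytes:
    """Hash a key of any size down to a fixed 20-byte digest.

    Assumption: SHA-1 stands in for the real hash function only
    because its output is also 20 bytes long.
    """
    return hashlib.sha1(set_name.encode("utf-8") + user_key).digest()


def partition_id(d: bytes) -> int:
    """Map a digest to one of the 4096 partitions.

    Assumption: the first two digest bytes are used; the exact bit
    layout is illustrative.
    """
    return int.from_bytes(d[:2], "little") % N_PARTITIONS


def sprig_id(d: bytes, sprigs_per_partition: int) -> int:
    """Pick a sprig (tree) inside the partition from other digest bytes.

    Assumption: bytes 2..5 are used; this is illustrative only.
    """
    return int.from_bytes(d[2:6], "little") % sprigs_per_partition


if __name__ == "__main__":
    d = digest("users", (42).to_bytes(8, "big"))  # an 8-byte integer key
    print(len(d), partition_id(d), sprig_id(d, 256))
```

Whatever the size of the input key, the index always works with a fixed 20-byte value, which is what keeps lookup cost independent of key size and distribution.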
When a server fails, the indexes on another server are immediately available. If the failed server remains down, data starts rebalancing, and replicated indexes are built on new nodes.
Currently, each index entry requires 64 bytes. In addition to the 20-byte digest, the following metadata is also stored in the index:
Generation count: Tracks all writes to the record; used for resolving conflicting updates.
Expiration time or TTL: Tracks the time at which a key expires. The eviction subsystem uses this metadata.
Last update time: Tracks the time of the last write to the key (Citrusleaf epoch). Used for conflict resolution during cold restart, conflict resolution during migration (depending on your configuration settings), filter expressions, incremental backup scans, and the truncate and truncate-namespace commands.
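Because every index entry has a fixed 64-byte size, primary index memory can be estimated with simple arithmetic. The sketch below is a rough estimate under stated assumptions: it ignores sprig overhead and allocation granularity, and the `replication_factor` parameter is a hypothetical addition to account for replica copies across the cluster.

```python
INDEX_ENTRY_BYTES = 64  # per-record index entry size stated above


def primary_index_bytes(records: int, replication_factor: int = 1) -> int:
    """Estimate cluster-wide primary index memory in bytes.

    Assumption: each replica of a record carries its own 64-byte
    index entry; sprig and allocator overhead are not included.
    """
    return records * replication_factor * INDEX_ENTRY_BYTES


if __name__ == "__main__":
    total = primary_index_bytes(1_000_000_000)
    # One billion records: 64,000,000,000 bytes, about 59.6 GiB.
    print(total, round(total / 2**30, 1))
```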
The primary index is derived from the data itself and can be rebuilt from that data, depending on the configuration setting for fast restart.
Fast Restart Feature
For fast cluster upgrades with minimal downtime, Aerospike has a fast restart feature. Fast restart allocates index memory from a Linux shared memory segment. After a planned shutdown (for example, for an upgrade), the restarting server re-attaches to the shared memory segment and activates the primary indexes without scanning the data in storage.
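The attach-or-rebuild idea behind fast restart can be illustrated with Python's `multiprocessing.shared_memory` module. This is a conceptual sketch, not how the server is implemented: the segment name, the `open_index` helper, and the sizing are all hypothetical, and the rebuild path is only marked by a flag instead of performing a real storage scan.

```python
from multiprocessing import shared_memory

ENTRY_SIZE = 64  # bytes per index entry


def open_index(name: str, n_entries: int):
    """Attach to an existing index segment, or create a fresh one.

    Returns (segment, rebuilt). rebuilt is False when a segment left
    behind by a previous run was found (the fast path) and True when
    a new segment had to be created, in which case a real system
    would repopulate it by scanning storage.
    """
    try:
        # Fast path: the segment survived the previous process.
        seg = shared_memory.SharedMemory(name=name, create=False)
        return seg, False
    except FileNotFoundError:
        # Cold path: nothing to attach to, so start from scratch.
        seg = shared_memory.SharedMemory(
            name=name, create=True, size=n_entries * ENTRY_SIZE
        )
        return seg, True
```

A process that calls `close()` on the segment without calling `unlink()` leaves the memory in place, so the next call to `open_index` with the same name takes the fast path.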