Skip to main content
Loading

Secondary Index Capacity Planning

To plan for the cost of using secondary indexes (SI), you can either calculate an estimate of the space required, or extrapolate from a proof-of-concept. Both approaches are described below.

note

Not all records are indexed. Only records in which the specified bin exists and has the correct data type are indexed.

Capacity planning for server 6.0 and later

info

In this section the meaning of the word space depends on the sindex-type, as configured for the namespace.

  • It refers to memory when a secondary index is in memory (whether sindex-type shmem, or process memory in Aerospike Database Community Edition).
  • It refers to persistent memory (PMem) when configured as sindex-type pmem.
  • It refers to disk storage when using secondary index in flash (sindex-type flash).

Secondary indexes have a B-Tree structure for each partition, 4096 of them collocated with the primary index partitions on the node. Each B-Tree node takes 4KiB of space.

Secondary index spatial cost

  1. Each secondary index entry costs 14 bytes. 8 bytes for the relevant bin value (or hash thereof), plus 6 bytes to reference the record in the primary index.
    • Scalar data (integer, string) is indexed as a single entry in the secondary index.
    • When the bin being indexed is a list, each list element matching the defined data type is indexed as a separate entry in the secondary index.
    • For map data types, each of the map keys or values being indexed has a separate entry in the secondary index.
    • A GeoJSON bin containing a Point is stored as a single secondary index entry.
    • A GeoJSON bin containing a Polygon or AeroCircle uses max-cells number of entries in the secondary index.
  2. Due to B-Tree characteristics, the average-case sparseness factor is 1.5 times the space used for all entries. The worst-case sparseness factor is twice the total space used for all entries. For example, 10 million entries in a secondary index use on average 10^7 x 14 bytes x 1.5 = 200MiB of space, and at worst 10^7 x 14 bytes x 2 = 267MiB of space.
  3. Multiply the total from the previous step by the replication factor of the namespace to get a cluster total.

Secondary index space allocation

So far, we've discussed the spatial cost of the entries in a secondary index. However, memory allocations don't occur in 14 byte increments.

  1. Each secondary index requires an initial memory allocation of 16MiB on each cluster node. For example, when defining four secondary indexes in a namespace, they require an initial allocation of 4 x 16MiB = 64 MiB of space on each node.
  2. From server 6.1 onward, secondary index space allocation is controlled by the namespace sindex-stage-size. This can be set as low as 128MiB, with a default space allocation of 1GiB. So when the first of the four secondary indexes from the example above is defined, the server immediately allocates the first sindex-stage-size allocation. The secondary indexes grow into this allocation, and when it is full the next stage is allocated. This means you should roll up to the sindex-stage-size increments when performing capacity planning.

Max secondary index size

If the size of the secondary index exceeds 2TiB, you must change sindex-stage-size from the default value of 1GiB. Secondary index space is allocated in slices (arenas), the size of which are defined by the sindex-stage-size configuration parameter.

The maximum number of arenas is 2048, so if the secondary index needs to be bigger than 2TiB the sindex-stage-size must be increased.

Capacity planning for server 5.7 and earlier

Extrapolating from server statistics

If you are running a proof of concept on Aerospike, you can use the server statistics for SIs as a way to extrapolate your expected memory consumption.

The server statistics method gives better capacity planning estimates. Extrapolate from these values, scaling them up linearly to your projected dataset.

Calculating an estimate

A SI has a key for each unique value encountered during indexing. Associated with the SI key, is metadata about the records that have this value in their bin, stored in one of two in-memory structures.

  • K - number of SI keys in the SI.
  • R - number of records indexed in the SI.

The formula to calculate the estimate changes based on whether the number of records indexed per unique value is lower or higher than a threshold. Below the threshold, we use arrays for compact storage and cache friendliness. Above the threshold, we use tree for optimal lookup.

  • THRESHOLD (referred in the rest of this article)
    • 64 - if server_version >= 5.7
    • 32 - Otherwise

Less than THRESHOLD

If there are less than THRESHOLD records indexed per unique value (SI key) the following formula is used.

For server version 5.7 and later

TypeMin. mem usageMax. mem usageAvg. case mem usage
Integer(16.25 x 1 x K) + (8 x 1 x R)(16.25 x 2 x K) + (8 x 3.5 x R)(16.25 x 1.5 x K) + (8 x 2 x R)
String(28.44 x 1 x K) + (8 x 1 x R)(28.44 x 2 x K) + (8 x 3.5 x R)(28.44 x 1.5 x K) + (8 x 2 x R)

For server version 5.6 and below

TypeMin. mem usageMax. mem usageAvg. case mem usage
Integer(16.25 x 1 x K) + (20 x 1 x R)(16.25 x 2 x K) + (20 x 3.5 x R)(16.25 x 1.5 x K) + (20 x 2 x R)
String(28.44 x 1 x K) + (20 x 1 x R)(28.44 x 2 x K) + (20 x 3.5 x R)(28.44 x 1.5 x K) + (20 x 2 x R)

THRESHOLD or more

When there are THRESHOLD records or more, the following calculation is used instead.

For server version 5.7 and later

TypeMin. mem usageMax. mem usageAvg. case mem usage
Integer(16.25 x 1 x K) + (8.25 x 1 x R)(16.25 x 2 x K) + (8.25 x 2 x R) + (16.25 x K)(16.25 x 1.5 x K) + (8.25 x 1.5 x R)
String(28.44 x 1 x K) + (8.25 x 1 x R)(28.44 x 2 x K) + (8.25 x 2 x R) + (28.44 x K)(28.44 x 1.5 x K) + (8.25 x 1.5 x R)

For server version 5.6 and earlier

TypeMin. mem usageMax. mem usageAvg. case mem usage
Integer(16.25 x 1 x K) + (20.25 x 1 x R)(16.25 x 2 x K) + (20.25 x 2 x R) + (16.25 x K)(16.25 x 1.5 x K) + (20.25 x 1.5 x R)
String(28.44 x 1 x K) + (20.25 x 1 x R)(28.44 x 2 x K) + (20.25 x 2 x R) + (28.44 x K)(28.44 x 1.5 x K) + (20.25 x 1.5 x R)
  • Cost of an index built over list values depends on the type of the elements (Integer or String, see above), multiplied by the average number of unique values in the list.

  • Cost of an index built over map keys depends on the type of the map keys (Integer or String, see above), multiplied by the average number of keys in the map.

  • Cost of an index built over map values depends on the type of the map values (Integer or String, see above), multiplied by the average number of unique values in the map.

  • Extreme values may vary by a few hundred KBs.

  • SI garbage collection may affect this value by a few hundred MBs.

  • Each formula can be divided as:

      Sum of (m * a * n * s) for n = keys and records + constant
    n - number of keys/records in SI.
    m - memory overhead per key/record in a B-tree or array.
    a - memory allocated per memory used by SI.
    s - size of the data type in the bin. For scalar values, s=1;
    for lists it's the number of unique values in the list;
    for maps it's the number of keys or unique map values, depending on what is being indexed

    Following tables explains the different values of m and a.
Keys/RecordsTypeRecords per SI KeyValue of m (5.7 and later)Value of m (5.6 and earlier)
KeysIntegerAny16.2516.25
KeysStringAny28.4428.44
RecordsAny< THRESHOLD820
RecordsAny> THRESHOLD8.2520.25
ScenarioKeys/RecordsRecords per SI KeyValue of aNote
Best CaseAnyAny1In the best case scenario every memory which is allocated is being used.
Worst caseKeysAny2Keys are always stored in B-tree. In the worst case every node of B-tree is half full.
Worst caseRecords< THRESHOLD3.5For < THRESHOLD records per SI key, arrays are used to store record metadata. In the worst case 3.5 times memory is allocated extra.
Worst caseRecords> THRESHOLD2For > THRESHOLD records per SI key, record metadata is stored similarly as keys.
Avg caseKeysAny1.5Generally, memory efficiency of B-tree lies between 0.5 and 1.
Avg caseRecords< THRESHOLD2For < THRESHOLD records per SI key, record metadata is stored in arrays. Generally they are allocated twice more than needed.
Avg caseRecords> THRESHOLD1.5Same as average case for keys.
Note - In the worst case, each B-tree can also have an extra allocated unused B-tree node.
Thus (28.44 * K) is added into the calculations.
  • GeoJSON Data types: Bins containing Points use the same sizing as one integer bin. Bins containing Polygons or AeroCircles use max-cells number of integer bin value entries. max-cells is defined in the server configuration as any integer between 1 - 256, and defaults to 12. Each cell entry is sized the same as an integer. Refer to the max-cells configuration reference for more information.