Plan your use of secondary indexes carefully to avoid possible performance impact.
For guidance, Enterprise Licensees can contact Aerospike Support.
What is a secondary index?
A secondary index is a data structure used to quickly locate all the records in a namespace, or a set within it, based on a bin value in the record. When a value is updated in an indexed record, the secondary index updates automatically. Using a secondary index query, you can retrieve records whose indexed bin value matches specified criteria.
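Conceptually, a secondary index maps a bin value to the set of records containing that value. The following is a minimal, illustrative sketch of that idea in Python (class and method names are hypothetical, not Aerospike internals):

```python
from collections import defaultdict

# Hypothetical in-memory model: bin value -> set of primary record keys.
class SecondaryIndex:
    def __init__(self, bin_name):
        self.bin_name = bin_name
        self.entries = defaultdict(set)  # value -> {record keys}

    def on_write(self, key, record, old_record=None):
        # Keep the index in step with record updates automatically.
        if old_record and self.bin_name in old_record:
            self.entries[old_record[self.bin_name]].discard(key)
        if self.bin_name in record:
            self.entries[record[self.bin_name]].add(key)

    def query(self, value):
        # Locate all records whose indexed bin matches the value.
        return self.entries.get(value, set())

idx = SecondaryIndex("age")
idx.on_write("user1", {"age": 30})
idx.on_write("user2", {"age": 30})
idx.on_write("user1", {"age": 31}, old_record={"age": 30})
print(sorted(idx.query(30)))  # ['user2']
```

Note how updating `user1` both removes the old index entry and adds the new one, which is the behavior the text describes for indexed record updates.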
Enhancements in the Aerospike server version 5.7 result in:
- Improved latency, especially noticeable when you have a large number of objects.
- Up to 40% more queries in the same amount of time.
- Up to 2.5 times memory savings, reducing memory consumption for references by 60%.
- Highly efficient garbage collection.
Applications that benefit from secondary indexes include rich, interactive business applications and user-facing analytic applications. Secondary indexes also enable Business Intelligence tools and ad-hoc queries on operational datasets.
Why would you use a secondary index?
Scanning large amounts of data is time consuming, and queries that must examine every record in a set degrade performance. Aerospike secondary indexes provide faster response times through efficient access to a wider range of data by fields other than the primary key.
Aerospike secondary indexes:
- Are stored in DRAM for fast look-up.
- Are built on every node in the cluster and are co-located with the primary index. Each secondary index entry contains only references to records local to the node.
- Contain pointers to both master records and replicated records on the node.
- Are on a bin value, which allows you to model one-to-many relationships.
- Are specified bin-by-bin (as with RDBMS columns) for efficient updates and minimal resource consumption when storing indexes.
You can use Aerospike tools or the API to dynamically create and remove indexes based on bins and data types you want to index. For an indexed bin, updating the record to include the bin updates the index.
Index entries are type checked. For example, if a bin stores user age, and one application writes the age as a string while another writes it as an integer, an integer index excludes records whose indexed bin holds a string value, and a string index excludes records whose indexed bin holds an integer value.
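The type-check behavior can be sketched as follows (an illustrative simulation, not Aerospike source): an integer index simply skips records whose indexed bin holds a non-integer value.

```python
# Sketch of type-checked index entries: an integer index only admits
# integer bin values; strings in the same bin are excluded.
def build_integer_index(records, bin_name):
    index = {}
    for key, rec in records.items():
        value = rec.get(bin_name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            index.setdefault(value, set()).add(key)
    return index

records = {
    "u1": {"age": 25},    # integer: indexed
    "u2": {"age": "25"},  # string: excluded from an integer index
    "u3": {"age": 30},
}
idx = build_integer_index(records, "age")
print(sorted(idx))  # [25, 30]
```

A string index would apply the mirror-image check, excluding `u1` and `u3` and admitting only `u2`.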
Secondary Index Metadata
Aerospike tracks which indexes are created in a globally maintained data structure, the System Metadata (SMD) system. The SMD module coordinates the secondary index modules across the cluster nodes. Changes to secondary index metadata are always driven through the SMD.
SMD workflow is:
- A client request triggers a create, delete, or update related to the secondary index metadata. The request passes through the secondary index module to the SMD.
- The SMD sends the request to the Paxos master.
- The Paxos master requests the relevant metadata from all cluster nodes.
- Once all the metadata is received, the Paxos master calls the secondary index merge callback function, which resolves the winning metadata version for the secondary index.
- SMD sends a request to all cluster nodes to accept the new metadata.
- Each node performs a secondary index create or delete DDL function.
- A scan is triggered, and the result is returned to the client.
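The merge step above can be sketched as follows. This is a hedged simplification (function and field names are hypothetical) of how a coordinator might resolve the winning metadata version from the views gathered from each node:

```python
# Sketch of the SMD merge callback: the Paxos master gathers each node's
# view of index metadata and resolves a winner per index.
def merge_metadata(node_views):
    # Simple resolution rule for illustration: highest version wins.
    winners = {}
    for view in node_views:
        for name, entry in view.items():
            if name not in winners or entry["version"] > winners[name]["version"]:
                winners[name] = entry
    return winners

node_views = [
    {"idx_age": {"version": 2, "state": "active"}},
    {"idx_age": {"version": 1, "state": "active"},
     "idx_city": {"version": 1, "state": "active"}},
]
merged = merge_metadata(node_views)
# Every node is then asked to accept the merged metadata.
```

After the merge, the SMD distributes `merged` to all cluster nodes, which apply the corresponding create or delete DDL locally.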
Secondary Index Creation
Aerospike supports dynamic creation of secondary indexes. Tools (such as asadm) are available to read the current indexes and allow creation and destruction of indexes.
To build a secondary index, you specify a namespace, set, bin, container type (none, list, map), and data type (integer, string, Geospatial, list, map, and so on). On confirmation by the SMD, each node creates the secondary index in write-only (WO) mode, and starts a background scan to scan all data and insert entries in the secondary index.
- Secondary index entries are only created for records that match all of the index specifications.
- The scan populates the secondary index and interacts with read/write transactions exactly as a normal scan does, except that index creation has no network component. During index creation, all new writes that affect the indexed bin also update the index. Each node independently decides when to mark its secondary index as readable: when the index build finishes on a node, the index is marked read-active on that node. When the index creation scan completes and all index entries exist on all cluster nodes, the index is marked read-write (RW) and is ready for use by queries.
- If a node with data joins the cluster but has missing index definitions in its SMD file, indexes are created and populated based on the latest SMD information after it joins the cluster. During index population, queries are not allowed to ensure that data on the incoming node is clean before it is available.
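The build lifecycle described above can be modeled as a small state machine. This is an illustrative sketch using the states named in the text (write-only while populating, read-active per node, read-write cluster-wide); the class and method names are hypothetical:

```python
# Illustrative state machine for secondary index creation.
class IndexBuild:
    def __init__(self, nodes):
        self.state = "WO"         # write-only during the populating scan
        self.read_active = set()  # nodes whose local build has finished
        self.nodes = set(nodes)

    def node_scan_done(self, node):
        # Each node independently marks its index read-active.
        self.read_active.add(node)
        if self.read_active == self.nodes:
            self.state = "RW"     # queries may now use the index

build = IndexBuild(["A", "B", "C"])
build.node_scan_done("A")
build.node_scan_done("B")
assert build.state == "WO"        # still building on node C
build.node_scan_done("C")
assert build.state == "RW"        # ready for queries cluster-wide
```

The key point the sketch captures is that read-active is a per-node decision, while the RW transition requires every node to have completed its build.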
Avoid creating or dropping indexes while the cluster is not well formed or is experiencing integrity problems. Building a secondary index is a heavy I/O subsystem operation, so perform it only during periods of low load.
Priority of Secondary Index Creation
The index creation scan reads only records already committed by transactions (that is, no dirty reads are allowed). This means the scan can execute at full speed, provided no pending record updates block reads.
The default settings normally prevent the index creation scan from adversely affecting the latencies of ongoing read and write transactions, because they are tuned to balance long-running tasks (such as data rebalancing and backup) against low-latency read/write transactions.
If necessary, you can control resource utilization for the index creation scan. For server versions 5.7 and later, modify the sindex-builder-threads configuration parameter at the service level. For earlier server versions, use the job prioritization settings.
Writing Data with Secondary Indexes
On data writes, the SMD specification of current indexes is checked, and a secondary index insert, update, or delete operation is performed for every indexed bin. Note that Aerospike is a flex-schema system: if a record has no value in an indexed bin, or the bin value is not of a supported index type, the corresponding secondary index action is skipped.
All changes to the secondary index are performed atomically with the record changes under single-lock synchronization. Because indexes are not persisted, the difficult problem of committing the index and the data together is avoided, which increases access speed.
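The single-lock write path can be sketched as follows. This is an illustrative model (the class and structure are hypothetical, not Aerospike source): the record write and all affected index entries change under one lock, so readers never observe a record and its index entries out of step.

```python
import threading

# Sketch of the atomic write path: record and index updated under one lock.
class Namespace:
    def __init__(self, indexed_bins):
        self.lock = threading.Lock()
        self.records = {}
        self.indexes = {b: {} for b in indexed_bins}  # bin -> value -> keys

    def put(self, key, record):
        with self.lock:
            old = self.records.get(key, {})
            self.records[key] = record
            for bin_name, index in self.indexes.items():
                # Only bins present in the record update the index;
                # absent bins skip the secondary action (flex schema).
                if bin_name in old:
                    index.get(old[bin_name], set()).discard(key)
                if bin_name in record:
                    index.setdefault(record[bin_name], set()).add(key)

ns = Namespace(["age"])
ns.put("u1", {"age": 21, "name": "a"})
ns.put("u1", {"age": 22, "name": "a"})
print(ns.indexes["age"])  # {21: set(), 22: {'u1'}}
```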
On data deletes (for example, during delete, expiry, eviction, or migration operations), the data is not read from storage to delete the entry from the secondary index, which avoids unnecessary load on the I/O subsystem. The remaining stale entries in the secondary index are deleted by a background thread, which wakes at regular intervals and performs cleanup.
Garbage collection happens in namespaces without data in memory during deletes, expiry, eviction and migrations. For namespaces with data in memory, garbage collection happens only during migrations.
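A single sweep of that cleanup can be sketched as follows (an illustrative simulation with hypothetical names): deletes leave stale entries behind, and a periodic collector removes every index entry whose record no longer exists.

```python
# Sketch of deferred index garbage collection: remove index entries
# whose referenced record has been deleted, expired, or evicted.
def gc_sweep(index, live_keys):
    removed = 0
    for value in list(index):          # copy keys; we may delete entries
        stale = index[value] - live_keys
        removed += len(stale)
        index[value] -= stale
        if not index[value]:
            del index[value]           # drop empty index entries
    return removed

index = {30: {"u1", "u2"}, 40: {"u3"}}
live_keys = {"u2"}   # u1 and u3 were deleted without touching the index
removed = gc_sweep(index, live_keys)
print(removed)  # 2
print(index)    # {30: {'u2'}}
```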
The following describes the index architecture with distributed queries.
Every cluster node receives the query to retrieve results from the secondary index. When the query executes:
- Requests “scatter” to all nodes.
- Secondary indexes are placed in DRAM for fast mapping of secondary-to-primary keys.
- Secondary indexes are co-located on each node with data on SSDs to provide fast update performance.
- Records are read in parallel from all SSDs and DRAM.
- Results are aggregated on each node.
- Client “gathers” results from all nodes and returns them to the application.
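The scatter-gather steps above can be sketched as follows (an illustrative simulation, not the client library API): the query fans out to every node, each node answers from its local secondary index, and the client merges the partial results.

```python
# Scatter-gather sketch: fan out to all nodes, merge partial results.
def query_node(node_index, value):
    return node_index.get(value, set())  # local secondary index lookup

def scatter_gather(cluster, value):
    results = set()
    for node_index in cluster:                      # "scatter" to all nodes
        results |= query_node(node_index, value)    # "gather" partial results
    return results

cluster = [
    {30: {"u1"}},               # node A's local index entries
    {30: {"u5"}, 40: {"u2"}},   # node B's
    {},                         # node C holds no matching records
]
print(sorted(scatter_gather(cluster, 30)))  # ['u1', 'u5']
```

Because each node's index references only records local to that node, the union of the per-node results is the complete answer with no duplicates.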
A secondary index query can evaluate a long list of primary key records. This is why Aerospike performs secondary index queries in small batches. Batching also occurs on client responses, so that if a memory threshold is reached, the response is immediately flushed to the network, much like return values in an Aerospike batch request. This keeps memory usage of an individual secondary query to a constant size, regardless of selectivity.
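The flush-on-threshold behavior can be sketched as follows (illustrative code with hypothetical names): results accumulate in a buffer, and whenever the buffer reaches a memory threshold it is flushed to the network, keeping per-query memory constant regardless of selectivity.

```python
# Sketch of response batching: flush buffered results whenever the
# memory threshold is reached, so per-query memory stays bounded.
def stream_results(records, max_batch_bytes, send):
    batch, batch_bytes = [], 0
    for rec in records:
        batch.append(rec)
        batch_bytes += len(rec)
        if batch_bytes >= max_batch_bytes:
            send(batch)                  # flush to the network
            batch, batch_bytes = [], 0
    if batch:
        send(batch)                      # final partial batch

sent = []
stream_results([b"x" * 40, b"y" * 40, b"z" * 40],
               max_batch_bytes=64, send=sent.append)
print(len(sent))  # 2 flushes: [x, y] then [z]
```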
The query process ensures that results are consistent with the actual data at the time each record is scanned during query execution. Uncommitted data is never part of the query results.
Query Execution During Migrations
Getting accurate query results is complicated during data migrations. When a cluster node is added or removed, the Data Migration Module is invoked to transition data to and from nodes as appropriate for the new configuration. During the migration operation, a partition may be available in different versions on many nodes. For a query to locate a partition with the requested data, Aerospike query processing uses additional partition states shared among cluster nodes, and selects a node for each partition where the query can execute. The node can be the master node of the partition, the old master, or a replica node that is migrating data to the new master node. Duplicate resolution is not performed, even if there are multiple versions of the data.
Query records can feed into the aggregation framework to perform filtering, aggregation, and so on. Each node sends the query result to the User-Defined Function (UDF) sub-system to start results processing as a stream of records. Stream UDFs are invoked and the sequence of operations defined by the user are applied to the query results. Results from each node are collected by the client application, which can then perform additional operations on the data.
To ensure that aggregation does not affect overall database performance, Aerospike uses these techniques:
- Global queues manage records fed through the different processing stages, and thread pools effectively utilize CPU parallelism.
- The query state is shared across the entire thread pool so that the system can manage the Stream UDF pipeline.
- Except for the initial data fetch, every stage in aggregation is a CPU-bound operation, so it is important that each stage finishes quickly. To facilitate this, Aerospike batches records and caches UDF states to minimize system overhead.
- For namespace operations where data is stored in-memory and no storage fetch is required, Aerospike implements stream processing in a single thread context. Even with this optimization, the system can parallelize operations across data partitions because Aerospike natively divides data as a fixed number of partitions.
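The stream-processing stages described above can be sketched as a simple map/filter/reduce pipeline over query results. This is an illustrative analogue in plain Python (the function names are hypothetical), not the Stream UDF API itself:

```python
from functools import reduce

# Sketch of stream-style aggregation over query results:
# map each record, filter, then reduce to a single value.
def run_stream(records, map_fn, filter_fn, reduce_fn, init):
    mapped = (map_fn(r) for r in records)        # map stage
    kept = (v for v in mapped if filter_fn(v))   # filter stage
    return reduce(reduce_fn, kept, init)         # reduce stage

records = [{"age": 25}, {"age": 40}, {"age": 31}]
total_over_30 = run_stream(
    records,
    map_fn=lambda r: r["age"],
    filter_fn=lambda age: age > 30,
    reduce_fn=lambda acc, age: acc + age,
    init=0,
)
print(total_over_30)  # 71
```

In the real system each node runs such a pipeline over its local query results, and the client application collects and combines the per-node reductions.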