Skip to main content

Secondary Index

info

Prior to Server 6.0, primary index (PI) queries were called scans and secondary index (SI) queries were called queries.

tip

Plan your use of secondary indexes carefully to avoid possible performance impact.

For guidance, Enterprise Licensees can contact Aerospike Support.

What is a secondary index?

A secondary index (SI) is a data structure that locates all the records in a namespace, or a set within it, based on a bin value in the record. When a value is updated in the indexed record, the secondary index automatically updates. In the Aerospike Database server 6.0, SI query results are unaffected by the state of the cluster. Querying by partition returns correct results when the cluster is stable, and during data migration after a cluster size change. In server 6.0, compatible clients such as Java client 6.0, are needed to support rebalance-tolerant queries. Applications that benefit from secondary indexes include rich, interactive business applications and user-facing analytic applications. Secondary indexes also enable Business Intelligence tools and ad-hoc queries on operational datasets.

Why Use a Secondary Index?

Scanning large amounts of data can take large amounts of time, and can negatively affect performance when SI queries scan every document or entry in a table.

Querying large amounts of data by reading every document or entry in a table is expensive in time and performance. Aerospike secondary indexes guarantee faster response times because they provide efficient access to a wider range of data through fields other than the primary key.

Aerospike secondary indexes:

  • Are stored in the Dynamic Random Access Memory (DRAM) for fast look-up.
  • Are built on every node in the cluster and are co-located with the primary index. Each secondary index entry contains only references to records local to the node.
  • Contain pointers to both master records and replicated records in the node.
  • Are on a bin value, which allows to model one-to-many relationships.
  • Are specified bin-by-bin (such as, with Rational Database Management System (RDBMS) columns) for efficient updates and a minimal amount of resources required to store indexes. Use Aerospike tools or the API to dynamically create and remove indexes based on bins and data types. Index entries are type checked. For instance, a bin can store a user age as a string by one application, and as an integer by another. An integer index excludes records stored as a string, while a string index excludes records stored as integers.

Secondary Index Metadata

Secondary indexes triggered from SMD

Aerospike tracks which indexes are created in a globally-maintained data structure: the System Metadata (SMD) system. The SMD module resides in the middle of multiple secondary index modules on multiple nodes. Changes made to the secondary index are always triggered from the SMD.

SMD workflow:

  1. An info request send from a client to any node forwards an SMD action, and then to the SMD principal node.
  2. From the SMD principal node, it forwards the action to every node, where the SMD module notifies the sindex module.
  3. All nodes apply the action, that is, it triggers the building of a new sindex.

Secondary Index Creation

You can dynamically create, read, and delete secondary indexes with the Aerospike asadm tool. To build a secondary index, specify a namespace, set, bin, container type (none, list, map-keys, and map-values), and data type (integer, string, Geospatial). On confirmation by the SMD, each node creates the secondary index in write-only (WO) mode, and starts a background scan to scan all data and insert entries in the secondary index.

  • Secondary index entries are only created for records matching all of index specifications.
  • The scan populates the secondary index and interacts with read/write transactions exactly as a normal scan, except that there is no network component to the index creation. During index creation, all new writes affecting the indexed bins updates the corresponding secondary indexes.
  • Each node independently marks every secondary index as readable. When building the secondary index finishes on a node, it is marked as read active on that node.
  • If a node with data joins the cluster, but is missing index definitions in its SMD file, indexes are created and populated based on the latest SMD information. During index population, queries are not allowed to ensure that data on the incoming node is clean before it is available.

Priority of Secondary Index Creation

The index creation scan only reads records already committed by transactions (that is, no dirty reads are allowed). This means that scans can execute at full speed, provided there are no record updates to block reads. To ensure that the index creation scan does not adversely affect the latencies of ongoing read and write transactions, the default settings suffice because they balance long running tasks (such as data rebalancing and backup) against low-latency read/write transactions. If necessary, you can control resource utilization for the index creation scan. For server 5.7 and later, modify sindex-builder-threads configuration at the service level. Prior to server 5.6, use the job prioritization settings.

  • For Server 5.7 and later, modify sindex-builder-threads config at the service level.
  • For Server prior to 5.6, use the job prioritization settings.

Writing Data with Secondary Indexes

On data writes, the current indexes are checked. For all bins with indexes, a secondary index update-insert-delete operation is performed. Aerospike is a flex schema system: If no index value exists on a particular bin or if the bin value does not support an index type, then the corresponding secondary action is not performed. All changes to the secondary index are performed atomically with the record changes under single-lock synchronization. As indexes live in memory in a highly distributed structure these modifications are performed with minimal impact on write latency.

Garbage Collection

On record deletes triggered by a client delete, expiry, eviction, or migration operations, the data is not read from disk when deleting the entry from the secondary index. This avoids unnecessary burden on the I/O subsystem. The remaining entries in the secondary index are deleted by a background thread, which wakes up at the regular intervals and performs cleanup. In namespaces without data in memory, garbage collection happens during client deletes, expiry, eviction, and migrations. For namespaces with data in memory, garbage collection happens only during migrations.

Distributed SI Queries

Every cluster node receives the query to retrieve results from the secondary index. When the query executes:

  1. Requests “scatter” to all nodes.
  2. Records are read in parallel from all SSDs and DRAM.
  3. Results are aggregated on each node.
  4. Client “gathers” results from all nodes and returns them to the application.

An SI query can evaluate a long list of primary key records. This is why Aerospike performs SI queries in small batches. Batching also occurs on client responses, so that if a memory threshold is reached, the response is immediately flushed to the network, such as, return values in an Aerospike batch request. This keeps memory usage of an individual SI query to a constant size, regardless of selectivity.

Batching also occurs on client responses, so if a memory threshold is reached, the response is immediately flushed to the network, much like return values in an Aerospike batch request. This keeps memory usage of an individual SI query to a constant size, regardless of selectivity.

The SI query process ensures that results sync with actual data every time an SI query executes and the record is scanned. Uncommitted data reads are never part of the SI query results.

Range queries might return a matching record more than once, if multiple values in the record fall within the value range of the query predicate.

SI Query Execution During Migrations

Getting accurate SI query results is complicated during data migrations. When a cluster node is added or removed, it invokes the Data Migration Module to transition data to and from nodes as appropriate for the new configuration. During the migration operation, a partition may be available in different versions on many nodes.

When a node is added or removed from an Aerospike server version 6.0 cluster, the smart client works with the server to ensure that records aren’t repeated or missed due to a partition migration. The client tracks the cluster state, and will reschedule accessing partitions that are migrating, returning to query them when they’ve fully migrated.

Aggregation

SI query records can feed into the aggregation framework to perform filtering, aggregation, and so on. Each node sends the SI query result to the User-Defined Function (UDF) sub-system to start results processing as a stream of records. Stream UDFs are invoked and the sequence of operations defined by the user are applied to the SI query results. Results from each node are collected by the client application, which can then perform additional operations on the data.

Performance Techniques

To ensure aggregation does not affect overall database performance, Aerospike uses these techniques:

  • Global queues manage records fed through the different processing stages, and thread pools effectively utilize CPU parallelism.

  • The SI query state is shared across the entire thread pool so that the system can manage the Stream UDF pipeline.

Except for the initial data fetch portion, every stage in aggregation is a CPU-bound operation. It is important that processes finish quickly and optimally. To facilitate this, Aerospike batches records and caches UDF states to minimize system overhead.

For namespace operations where data is stored in-memory and no storage fetch is required, Aerospike implements stream processing in a single thread context. Even with this optimization, the system can parallelize operations across data partitions because Aerospike natively divides data as a fixed number of partitions.