Starting with Aerospike Database version 5.6, you can optionally create a set index, which indexes the records that belong to a given set. A set index can improve the performance of set-level operations, such as scanning all the records in the set.
Set indexes are compatible with warm restarts.
Set indexes consume approximately 16 bytes per record in the set. See the capacity planning guide for more information.
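As a rough illustration of the memory cost, the following sketch multiplies the number of records in the set by the approximate 16-byte figure quoted above. The function name and the sample record count are hypothetical, and the result is only a back-of-the-envelope estimate; the capacity planning guide remains the authoritative source.

```python
# Approximate per-record cost of a set index, as quoted in the text above.
SET_INDEX_BYTES_PER_RECORD = 16


def estimate_set_index_bytes(records_in_set: int,
                             bytes_per_record: int = SET_INDEX_BYTES_PER_RECORD) -> int:
    """Return a rough memory estimate, in bytes, for indexing one set.

    This is a hypothetical helper for illustration, not an Aerospike API.
    """
    if records_in_set < 0:
        raise ValueError("records_in_set must be non-negative")
    return records_in_set * bytes_per_record


if __name__ == "__main__":
    # Hypothetical example: a set holding 5 million records.
    size = estimate_set_index_bytes(5_000_000)
    print(f"{size} bytes, about {size / 1024**2:.1f} MiB")
```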
If a set is very small compared to the namespace that contains it, and no set index is created, the entire primary index needs to be traversed to find the records that belong to the set. A set index may improve the efficiency of getting these records.
The smaller the set is relative to the namespace, the bigger the advantage a set index will provide. When the set gets big enough relative to the namespace, this advantage will disappear, and the memory cost of the set index will not be worthwhile.
As a broad guideline, if the set is bigger than 1% of the namespace, a set index is less likely to provide a significant advantage. Where the cutoff lies depends on the use case and configuration. For example, larger records that are not stored in memory will likely mean a lower cutoff, since device I/O will dominate compared to the index lookup. The best way to assess the effectiveness of a set index is therefore simply to try one.
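The 1% guideline can be expressed as a simple ratio check. The sketch below is a heuristic only: the function names are hypothetical, the default cutoff is the broad guideline from the text rather than a hard limit, and the actual cutoff should be found by measuring scans in your own environment.

```python
def set_fraction(set_records: int, namespace_records: int) -> float:
    """Return the share of the namespace's records that belong to the set."""
    if namespace_records <= 0:
        raise ValueError("namespace_records must be positive")
    if not 0 <= set_records <= namespace_records:
        raise ValueError("set_records must be between 0 and namespace_records")
    return set_records / namespace_records


def likely_to_benefit(set_records: int, namespace_records: int,
                      cutoff: float = 0.01) -> bool:
    """Heuristic: small sets (at or below the cutoff) are good candidates.

    The 1% default is a broad guideline, not a guarantee; record size and
    storage configuration can move the real cutoff.
    """
    return set_fraction(set_records, namespace_records) <= cutoff


if __name__ == "__main__":
    # Hypothetical counts: 50,000 records in the set, 100 million in the namespace.
    print(likely_to_benefit(50_000, 100_000_000))
    # A set holding 5% of the namespace is a weaker candidate.
    print(likely_to_benefit(5_000_000, 100_000_000))
```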
The set index can be dynamically configured, even on a single node. Then, scans with and without a set index can be compared to each other. If the set index does not provide a significant advantage, it can be discarded. Set indexes may be enabled and disabled even while scans are in progress.
A scan uses the set index if one exists; otherwise, it uses the primary index.
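The following toy model illustrates why the choice of index matters for a small set. It is not Aerospike's implementation: the class, its methods, and the dictionary-based "indexes" are all assumptions made for illustration. It only counts how many index entries a set scan has to visit with and without a set index.

```python
from typing import Dict, List, Set, Tuple


class ToyNamespace:
    """Toy model of a namespace: a primary index plus optional set indexes."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}           # key -> set name
        self.set_indexes: Dict[str, Set[str]] = {}  # set name -> keys in that set

    def put(self, key: str, set_name: str) -> None:
        # If the key moves between sets, drop it from the old set's index.
        old_set = self.primary.get(key)
        if old_set is not None and old_set in self.set_indexes:
            self.set_indexes[old_set].discard(key)
        self.primary[key] = set_name
        if set_name in self.set_indexes:
            self.set_indexes[set_name].add(key)

    def enable_set_index(self, set_name: str) -> None:
        # Build the set index from the records already in the namespace.
        self.set_indexes[set_name] = {
            k for k, s in self.primary.items() if s == set_name
        }

    def disable_set_index(self, set_name: str) -> None:
        self.set_indexes.pop(set_name, None)

    def scan_set(self, set_name: str) -> Tuple[List[str], int]:
        """Return (sorted keys in the set, number of index entries visited)."""
        index = self.set_indexes.get(set_name)
        if index is not None:
            # Set index exists: visit only the entries for this set.
            return sorted(index), len(index)
        # No set index: traverse the whole primary index and filter.
        keys = [k for k, s in self.primary.items() if s == set_name]
        return sorted(keys), len(self.primary)


if __name__ == "__main__":
    ns = ToyNamespace()
    for i in range(10_000):
        ns.put(f"big-{i}", "events")   # large set
    for i in range(50):
        ns.put(f"small-{i}", "config")  # small set in the same namespace

    _, visited_without = ns.scan_set("config")
    ns.enable_set_index("config")
    _, visited_with = ns.scan_set("config")
    print(visited_without, visited_with)
```

In this model the scan returns the same keys either way; only the number of entries visited changes, which is the effect the set index is meant to achieve for small sets.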
Set indexes make the secondary index hack obsolete
A common workaround for finding all the records in a small set within a large namespace was to create a secondary index on a bin that contained the set name. Users employing this technique should consider switching to set indexes, which offer the following advantages:
- You don't need to modify your records with an extra bin.
- Startup time is far less affected than with the secondary index workaround.
- Set indexes are simpler to manage: each one is a single dynamic configuration.
- You can scan all the records in the indexed set without adding a filter.
- Scans are always correct and support pagination.
Managing set indexes
For more information on managing set indexes, see Managing Sets in a Namespace.