Aerospike Expressions is a strongly typed, functional, domain-specific language designed for manipulating and comparing bins and record metadata. Expressions are used to filter records, to determine whether a record operation will occur, to filter data from being shipped cross-datacenter, and to extend the functionality of transactions. More uses for Expressions are planned.
Types of Expressions
Aerospike supports three types of Expressions:
- Filter Expressions (introduced in Aerospike Database 5.2)
- XDR Filter Expressions (introduced in Aerospike Database 5.3)
- Operation Expressions (introduced in Aerospike Database 5.6)
Filtering is commonly used to select records that satisfy a boolean expression. Filters can be used with all single record operations (reads, writes, transactions, and record UDFs), batch operations (read, write, UDF, and delete), Primary Index (PI) queries (formerly known as scans), and secondary index (SI) queries. Filter Expressions were introduced in Aerospike Database 5.2 as the successor to the Predicate Expression (PredExp) system and language. Filter Expressions support a variety of metadata functions and all data type operations: the full List and Map APIs (including at a nested element context), bitwise operations on Bytes (Blobs), geospatial queries on GeoJSON, and HyperLogLog operations. Filter Expressions are only executed when the record exists, meaning that they do not execute when a read does not find the record or when a write creates the record.
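The rule that a filter only runs against an existing record can be illustrated with a small in-memory model. This is a sketch of the semantics only, not the Aerospike client API: the functions `filtered_read` and `filtered_write`, the status strings, and the dictionary-backed store are all hypothetical names chosen for illustration.

```python
# Hypothetical in-memory model; not the Aerospike client API.

def filtered_read(store, key, filter_fn):
    """Read a record, applying the filter only if the record exists."""
    record = store.get(key)
    if record is None:
        # Record not found: the filter is never evaluated.
        return "NOT_FOUND", None
    if not filter_fn(record):
        return "FILTERED_OUT", None
    return "OK", dict(record)


def filtered_write(store, key, bins, filter_fn):
    """Write bins, applying the filter only if the record already exists."""
    record = store.get(key)
    if record is not None and not filter_fn(record):
        return "FILTERED_OUT"
    # Either the record passed the filter, or the write creates the
    # record and the filter was skipped entirely.
    store.setdefault(key, {}).update(bins)
    return "OK"


def is_active(record):
    """Example boolean filter over a record's bins."""
    return record.get("status") == "active"
```

In this model, a write to a missing key succeeds even if the new bins would not satisfy the filter, because the filter has no existing record to evaluate.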
XDR Filter Expressions
Aerospike allows records shipped with XDR to remote destinations to be filtered with expressions. XDR filters are dynamic and you define them per namespace per destination datacenter (DC). You can set Filter Expressions by using the info command xdr-set-filter. You can also set them programmatically via a client API.
XDR filtering lets you reduce the volume of data that you replicate. When you reduce the volume of replicated data, you also:
- Reduce network traffic
- Reduce storage and processing requirements at destination datacenters, which avoids the costs of overprovisioning, most significantly in hub-and-spoke XDR topologies
- Reduce the cost of moving data across or from public clouds
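The per-namespace, per-destination scoping of XDR filters can be pictured as a lookup table keyed by (DC, namespace). The following is a conceptual sketch only: the class `XdrFilterTable` and its methods are hypothetical and do not correspond to the server implementation or to the xdr-set-filter command syntax.

```python
# Hypothetical model of per-DC, per-namespace XDR filters.

class XdrFilterTable:
    def __init__(self):
        # Maps (dc, namespace) to a boolean predicate over a record.
        self._filters = {}

    def set_filter(self, dc, namespace, predicate):
        """Install or replace a filter at runtime (filters are dynamic)."""
        self._filters[(dc, namespace)] = predicate

    def remove_filter(self, dc, namespace):
        """Remove a filter; subsequent records ship unfiltered."""
        self._filters.pop((dc, namespace), None)

    def records_to_ship(self, dc, namespace, records):
        """Return only the records that should be shipped to this DC."""
        predicate = self._filters.get((dc, namespace))
        if predicate is None:
            return list(records)
        return [r for r in records if predicate(r)]
```

Because the key includes both the DC and the namespace, the same namespace can ship everything to one destination and only a subset to another.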
Operation Expressions (read and write expressions) are bin operations that can atomically compute a value from information within the record or provided by the expression. The resulting value is either returned to the client (as is the case with read expressions) or written to a specified bin (as is the case with write expressions). Operation Expressions enable atomic, cross-bin operations, which were previously only available through UDFs.
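The difference between read and write expressions, and the all-or-nothing behavior of a multi-operation call, can be sketched with a small model. This is an illustration under stated assumptions, not the client API: the `operate` function, the `("read" | "write", name, fn)` tuple shape, and the use of Python callables in place of compiled expressions are all hypothetical.

```python
import copy

# Hypothetical model of read and write expressions on a single record.

def operate(record, ops):
    """Apply a list of (kind, name, expr) operations atomically.

    kind is "read" (value returned to the caller under `name`) or
    "write" (value stored in the bin called `name`). `expr` is a callable
    that computes a value from the record's bins.
    """
    working = copy.deepcopy(record)  # changes are staged, not applied yet
    results = {}
    for kind, name, expr in ops:
        value = expr(working)
        if kind == "read":
            results[name] = value
        elif kind == "write":
            working[name] = value
        else:
            raise ValueError(f"unknown operation kind: {kind}")
    # Commit only after every operation succeeded.
    record.clear()
    record.update(working)
    return results
```

For example, a write expression can compute a `total` bin from `price` and `qty` in the same call in which a read expression returns a derived boolean; if any expression raises, the record is left unchanged.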
Aerospike Expressions has a Polish Notation (PN) syntax with strict typing that expands the scope of what can be used to select records. Within an Expression, all data is immutable. This means that bin modifications occurring within an expression operate on an ephemeral copy and are not saved to the bin when the expression terminates.
Aerospike Expressions does not include syntax for iteration or recursion.
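A toy evaluator can show what prefix (Polish) notation and immutability mean in practice. This is a minimal sketch, assuming expressions are represented as nested Python tuples of the form `(operator, arg1, arg2, ...)`; the operator names (`bin`, `list_append`, `list_size`, `gt`) are hypothetical and are not the names used by any Aerospike client library.

```python
import copy

# Hypothetical prefix-notation evaluator: (operator, arg1, arg2, ...).

def evaluate(expr, bins):
    """Evaluate a nested prefix expression against a record's bins."""
    if not isinstance(expr, tuple):
        return expr  # a literal value
    op, *args = expr
    if op == "bin":
        # Hand out an ephemeral copy so the stored bin cannot be mutated.
        return copy.deepcopy(bins[args[0]])
    vals = [evaluate(a, bins) for a in args]
    if op == "list_append":
        lst, item = vals
        lst.append(item)  # modifies only the ephemeral copy
        return lst
    if op == "list_size":
        return len(vals[0])
    if op == "gt":
        return vals[0] > vals[1]
    raise ValueError(f"unknown operator: {op}")


record = {"tags": ["a", "b"]}
# "Would the list have more than 2 items after appending 'c'?"
expr = ("gt", ("list_size", ("list_append", ("bin", "tags"), "c")), 2)
```

Evaluating `expr` against `record` answers the question using the appended copy, while `record["tags"]` still holds only its original two items afterwards. Note that the evaluator itself is recursive, but the expression language it models offers no loop or recursion construct.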
The type system is split into two type classes: value and bin. All expressions return a subtype of one of these two classes, and all parameters to expressions use these types. Parameters that accept only values are described herein as 't_value' or 'library_specific'. Parameters that accept only bin expressions are described herein as 't_bin_expr' or 'bin_expr'. Parameters that accept either bin or value are described herein as 't_expr' or 'expr'. Here, 'library_specific' means a type specific to the language library in use, and 't' is one of the following:
- nil: value for nil.
- boolean: value-only type which may be true or false.
- integer: 64-bit signed integer.
- float: 64-bit floating point.
- blob: Binary data.
- string: UTF-8 encoded string.
- geojson: GeoJSON.
- list: CDT List.
- map: CDT Map.
- hll: HyperLogLog.
- AUTO: Some libraries may implement type inference for certain single-result CDT read expressions when the expr_type can be deduced by the result_type.
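Strict typing means that a mismatched comparison can be rejected when the expression is built, before it ever reaches a record. The following is a minimal sketch of that idea, not a client library API: the `Expr` dataclass and the builder functions `int_val`, `str_val`, `int_bin`, and `gt` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Hypothetical typed expression nodes; not a client library API.

@dataclass(frozen=True)
class Expr:
    kind: str   # type class: "value" or "bin"
    type: str   # e.g. "integer", "string", "boolean"
    op: str
    args: tuple = ()


def int_val(v):
    if isinstance(v, bool) or not isinstance(v, int):
        raise TypeError("int_val expects an integer")
    return Expr("value", "integer", "val", (v,))


def str_val(v):
    if not isinstance(v, str):
        raise TypeError("str_val expects a string")
    return Expr("value", "string", "val", (v,))


def int_bin(name):
    """Reference to a bin that is expected to hold an integer."""
    return Expr("bin", "integer", "bin", (name,))


def gt(left, right):
    """Greater-than: both sides must have the same type."""
    if left.type != right.type:
        raise TypeError(f"cannot compare {left.type} with {right.type}")
    # Comparisons produce a boolean, which is a value-only type.
    return Expr("value", "boolean", "gt", (left, right))
```

With these builders, `gt(int_bin("age"), int_val(21))` yields a boolean expression, while `gt(int_bin("age"), str_val("21"))` raises a `TypeError` at build time.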
Metadata resolution is a performance-critical component of Aerospike Expressions. Metadata resides in the primary index and does not require a disk load (for namespaces with data on disk). Therefore, expressions that can be fully resolved using metadata can forgo disk access, thereby gaining an order of magnitude in performance. Aerospike Expressions achieves this using a two-phase execution model: a metadata phase followed, only when needed, by a storage-data phase. When the logic required for an operation can be expressed with metadata operations alone, doing so yields large performance gains.
The expressions system starts with the metadata phase, where storage-data is treated as unknown. Expressions with unknown as input generally also return unknown, with the exception of logical expressions that evaluate using trilean logic. If the result is unknown, then evaluation proceeds to the storage-data phase. If the result is false, then the record is filtered out without accessing storage. If the result is true, then the operation proceeds, and storage will only be accessed if required by the operation.
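The two-phase model and its trilean (three-valued) logic can be sketched in a few lines. This is an illustrative model under stated assumptions, not the server implementation: expressions are nested tuples in prefix form, `UNKNOWN` is a sentinel object, and the names `evaluate`, `run_filter`, and `load_bins` are hypothetical.

```python
# Hypothetical model of two-phase evaluation with trilean logic.

UNKNOWN = object()  # third truth value: cannot be decided from metadata


def tri_and(vals):
    if any(v is False for v in vals):
        return False  # one false operand decides the result
    if any(v is UNKNOWN for v in vals):
        return UNKNOWN
    return True


def tri_or(vals):
    if any(v is True for v in vals):
        return True  # one true operand decides the result
    if any(v is UNKNOWN for v in vals):
        return UNKNOWN
    return False


def evaluate(expr, meta, bins=None):
    """Evaluate a prefix expression; bins=None means metadata phase."""
    if not isinstance(expr, tuple):
        return expr  # literal
    op, *args = expr
    if op == "meta":
        return meta[args[0]]
    if op == "bin":
        return UNKNOWN if bins is None else bins[args[0]]
    vals = [evaluate(a, meta, bins) for a in args]
    if op == "and":
        return tri_and(vals)
    if op == "or":
        return tri_or(vals)
    if op == "not":
        return UNKNOWN if vals[0] is UNKNOWN else (not vals[0])
    if any(v is UNKNOWN for v in vals):
        return UNKNOWN  # unknown input gives unknown output
    if op == "eq":
        return vals[0] == vals[1]
    if op == "gt":
        return vals[0] > vals[1]
    raise ValueError(f"unknown operator: {op}")


def run_filter(expr, meta, load_bins):
    """Return (result, storage_was_read)."""
    result = evaluate(expr, meta)  # metadata phase: no storage access
    if result is not UNKNOWN:
        return result, False
    # Storage-data phase: load the record and evaluate a second time.
    return evaluate(expr, meta, load_bins()), True
```

With the expression `("and", ("gt", ("meta", "ttl"), 100), ("eq", ("bin", "status"), "active"))`, a record whose `ttl` metadata is 50 is rejected in the metadata phase because `false AND unknown` is `false`, so `load_bins` is never called. A record with a `ttl` of 500 yields `true AND unknown`, which is `unknown`, so the model falls through to the storage-data phase.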
The storage-data phase loads the record and executes the expression a second time. If the record
resides on disk, physical IO will be incurred. This phase always resolves to a