
Aerospike Expressions

Aerospike Expressions is a strongly typed, functional, domain-specific language designed for manipulating and comparing bins and record metadata. Expressions are used to filter records, to determine whether a record operation will occur, to filter the data shipped cross-datacenter (XDR), and to extend the functionality of transactions. More uses for Expressions are planned.

Types of Expressions

Aerospike supports three types of Expressions:

  • Filter Expressions (introduced in Aerospike Database 5.2)
  • XDR Filter Expressions (introduced in Aerospike Database 5.3)
  • Operation Expressions (introduced in Aerospike Database 5.6)

Filter Expressions

Filtering is commonly used to select records that satisfy a boolean expression. Filters can be used with all single-record operations (reads, writes, transactions, and record UDFs), batch operations (read, write, UDF, and delete), primary index (PI) queries (formerly known as scans), and secondary index (SI) queries. Filter Expressions were introduced in Aerospike Database 5.2 as the successor to the Predicate Expression (PredExp) system and language. Filter Expressions support a variety of metadata functions and all data type operations: the full List and Map APIs (including operations on nested elements through a context), bitwise operations on Bytes (Blobs), geospatial queries on GeoJSON, and HyperLogLog operations. Filter Expressions are executed only when the record exists: they do not execute when a read does not find the record or when a write creates the record.
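The existence rule can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Aerospike's implementation: the dictionary store, the write_with_filter helper, and the FILTERED_OUT marker are hypothetical stand-ins, not client APIs.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
FILTERED_OUT = "FILTERED_OUT"  # hypothetical marker for a write rejected by the filter


def write_with_filter(store: Dict[str, Record], key: str, bins: Record,
                      filt: Callable[[Record], bool]) -> str:
    """Apply a write guarded by a filter; the filter only runs on existing records."""
    record = store.get(key)
    if record is None:
        # The write creates the record, so the filter is never evaluated.
        store[key] = dict(bins)
        return "CREATED"
    if not filt(record):
        # The existing record fails the filter: the write does not happen.
        return FILTERED_OUT
    record.update(bins)
    return "UPDATED"


def is_adult(record: Record) -> bool:
    # Hypothetical filter: the record's "age" bin is at least 18.
    return record.get("age", 0) >= 18


store: Dict[str, Record] = {}
print(write_with_filter(store, "k1", {"age": 10}, is_adult))  # CREATED
print(write_with_filter(store, "k1", {"age": 11}, is_adult))  # FILTERED_OUT
```

The first write succeeds even though the new record would fail the filter, because no record existed yet to evaluate it against.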

XDR Filter Expressions

Aerospike allows records shipped with XDR to remote destinations to be filtered with expressions. XDR filters are dynamic, and you define them per namespace per destination datacenter (DC). You can set an XDR filter expression with the info command xdr-set-filter, or programmatically through a client API.

XDR filtering lets you reduce the volume of data that you replicate. When you reduce the volume of replicated data, you also:

  • Reduce network traffic
  • Reduce storage and processing requirements at destination datacenters, which avoids the costs of overprovisioning, most significantly in hub-and-spoke XDR topologies
  • Reduce the cost of moving data across or from public clouds
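To make the per-namespace, per-DC scoping concrete, here is a hypothetical model of an XDR filter table. XdrFilterTable, records_to_ship, and the namespace and DC names are illustrative assumptions; the sketch does not reproduce the xdr-set-filter info command or any client API, and it assumes that a namespace/DC pair without a filter ships every record.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


class XdrFilterTable:
    """Hypothetical model: one filter per (namespace, destination DC) pair."""

    def __init__(self) -> None:
        self._filters: Dict[Tuple[str, str], Predicate] = {}

    def set_filter(self, namespace: str, dc: str, pred: Predicate) -> None:
        # Replacing a filter affects every later should_ship call (dynamic).
        self._filters[(namespace, dc)] = pred

    def clear_filter(self, namespace: str, dc: str) -> None:
        self._filters.pop((namespace, dc), None)

    def should_ship(self, namespace: str, dc: str, record: Record) -> bool:
        pred = self._filters.get((namespace, dc))
        # Assumption: a pair with no filter ships every record.
        return True if pred is None else pred(record)


def records_to_ship(table: XdrFilterTable, namespace: str, dc: str,
                    records: List[Record]) -> List[Record]:
    return [r for r in records if table.should_ship(namespace, dc, r)]


table = XdrFilterTable()
table.set_filter("test", "DC_EU", lambda r: r.get("region") == "eu")
recs = [{"region": "eu"}, {"region": "us"}]
print(len(records_to_ship(table, "test", "DC_EU", recs)))  # 1
print(len(records_to_ship(table, "test", "DC_US", recs)))  # 2
```

Because each (namespace, DC) pair has its own filter, a hub can ship a subset of records to one destination while shipping everything to another.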

Operation Expressions

Operation Expressions (read and write expressions) are bin operations that can atomically compute a value from information within the record or provided by the expression. The resulting value is either returned to the client (as is the case with read expressions) or written to a specified bin (as is the case with write expressions). Operation Expressions enable atomic, cross-bin operations, which were previously available only through UDFs.
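The read/write distinction can be sketched as follows. The RecordCell class is hypothetical: a Python lock stands in for the server's per-record atomicity, and plain Python callables stand in for real expressions. A read expression returns its computed value and leaves the record unchanged; a write expression stores the computed value in a target bin.

```python
import threading
from typing import Any, Callable, Dict

Bins = Dict[str, Any]
Expr = Callable[[Bins], Any]


class RecordCell:
    """Hypothetical record whose lock stands in for server-side per-record atomicity."""

    def __init__(self, bins: Bins) -> None:
        self.bins: Bins = dict(bins)
        self._lock = threading.Lock()

    def read_exp(self, expr: Expr) -> Any:
        # Read expression: compute from a shallow copy of the bins and return the value.
        with self._lock:
            return expr(dict(self.bins))

    def write_exp(self, target_bin: str, expr: Expr) -> None:
        # Write expression: compute from a shallow copy, then store the result in target_bin.
        with self._lock:
            self.bins[target_bin] = expr(dict(self.bins))


rec = RecordCell({"price": 40, "qty": 3})
print(rec.read_exp(lambda b: b["price"] * b["qty"]))  # 120
rec.write_exp("total", lambda b: b["price"] * b["qty"])
print(rec.bins["total"])  # 120
```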

Language

Aerospike Expressions has a Polish Notation (PN) syntax with strict typing that expands the scope of what can be used to select records. Within an Expression, all data is immutable. This means that bin modifications occurring within an expression operate on an ephemeral copy and are not saved to the bin when the expression terminates.

Aerospike Expressions does not include syntax for iteration or recursion.
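A minimal sketch of prefix (Polish Notation) evaluation, assuming expressions are written as nested Python tuples whose first element is the operator. The operator names (bin, eq, and, add, list_append) are hypothetical and do not reflect Aerospike's wire format. The interpreter recurses over the expression tree, but the expression language itself offers no loop or recursion construct. Note how list_append returns a new list, mirroring the rule that modifications inside an expression act on an ephemeral copy.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def evaluate(expr: Any, record: Record) -> Any:
    """Evaluate a prefix-notation expression given as nested tuples.

    Hypothetical operator set; not Aerospike's wire format.
    """
    if not isinstance(expr, tuple):
        return expr                      # literal value
    op, *args = expr
    if op == "bin":
        return record[args[0]]           # read a bin by name
    vals = [evaluate(a, record) for a in args]  # evaluate operands first
    if op == "eq":
        return vals[0] == vals[1]
    if op == "and":
        return all(vals)
    if op == "add":
        return sum(vals)
    if op == "list_append":
        # Build a new list: the stored bin value is never modified.
        return list(vals[0]) + [vals[1]]
    raise ValueError(f"unknown operator {op!r}")


rec = {"a": 11, "tags": ["x"]}
print(evaluate(("eq", ("bin", "a"), 11), rec))               # True
print(evaluate(("list_append", ("bin", "tags"), "y"), rec))  # ['x', 'y']
print(rec["tags"])                                           # ['x']
```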

Types

The type system is split into two type classes: value and bin. All expressions return a sub-type of these two types, and all parameters to expressions use these types. Parameters that accept only values are described herein as 't_value' or 'library_specific'. Parameters that accept only bin expressions are described herein as 't_bin_expr' or 'bin_expr'. Parameters that accept either a bin or a value are described herein as 't_expr' or 'expr'. Here, 'library_specific' denotes a type specific to the client library in use, and 't' is one of the following:

  • nil: value for null.
  • boolean: value-only type, which may be true or false.
  • integer: 64-bit signed integer.
  • float: 64-bit floating point.
  • blob: Binary data.
  • string: UTF-8 encoded string.
  • geojson: GeoJSON.
  • list: CDT List.
  • map: CDT Map.
  • hll: HyperLogLog.
  • AUTO: Some libraries may implement type inference for certain single-result CDT read expressions when the expr_type can be deduced from the result_type.
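Strict typing means type mismatches can be detected from an expression's structure, before any record is read. The type_of function below is a hypothetical model, not a client library's checker. It assumes each bin read declares its expected type and, as a simplifying assumption, requires both operands of a comparison and all operands of add to share one type.

```python
from typing import Any


def type_of(expr: Any) -> str:
    """Return the result type of a prefix expression or raise TypeError.

    Hypothetical model of strict typing, not a client library's checker.
    """
    if expr is None:
        return "nil"
    if isinstance(expr, bool):           # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(expr, int):
        return "integer"
    if isinstance(expr, float):
        return "float"
    if isinstance(expr, str):
        return "string"
    op, *args = expr
    if op == "bin":
        return args[1]                   # ("bin", name, declared_type)
    arg_types = [type_of(a) for a in args]
    if op in ("eq", "lt"):
        if arg_types[0] != arg_types[1]:
            raise TypeError(f"{op}: cannot compare {arg_types[0]} with {arg_types[1]}")
        return "boolean"
    if op == "add":
        if len(set(arg_types)) != 1 or arg_types[0] not in ("integer", "float"):
            raise TypeError(f"add: operands must share one numeric type, got {arg_types}")
        return arg_types[0]
    raise ValueError(f"unknown operator {op!r}")


print(type_of(("lt", ("bin", "age", "integer"), 21)))    # boolean
print(type_of(("add", ("bin", "score", "float"), 0.5)))  # float
```

Under these assumptions, type_of(("add", 1, 2.5)) raises TypeError instead of silently mixing integer and float.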

Execution Model

Metadata resolution is a performance-critical component of Aerospike Expressions. Metadata resides in the primary index and does not require a disk load (for namespaces with data on disk). Therefore, expressions that can be fully resolved using metadata can skip disk access entirely, thereby gaining an order of magnitude in performance. Aerospike Expressions achieves this with a two-phase execution model. If an expression can satisfy the logic required by a given operation using only metadata operations, doing so yields large performance gains.

Metadata Phase

The expressions system starts with the metadata phase, in which storage data evaluates to unknown. Expressions with unknown as input generally also output unknown, with the exception of logical expressions, which evaluate using trilean (three-valued) logic; for example, an AND with a false operand is false even if its other operands are unknown. If the result is unknown, evaluation proceeds to the storage-data phase. If the result is false, the record is filtered out without accessing storage. If the result is true, the operation proceeds, and storage is accessed only if the operation requires it.
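The trilean logic of the metadata phase can be sketched with a three-valued enum. This sketch assumes Kleene's strong three-valued logic, the standard reading of trilean AND, OR, and NOT; the Tri enum and tri_* functions are illustrative names, not part of any Aerospike API.

```python
from enum import Enum


class Tri(Enum):
    FALSE = 0
    UNKNOWN = 1
    TRUE = 2


def tri_and(*xs: Tri) -> Tri:
    # A single FALSE decides the result, even if other operands are UNKNOWN.
    if Tri.FALSE in xs:
        return Tri.FALSE
    return Tri.UNKNOWN if Tri.UNKNOWN in xs else Tri.TRUE


def tri_or(*xs: Tri) -> Tri:
    # A single TRUE decides the result, even if other operands are UNKNOWN.
    if Tri.TRUE in xs:
        return Tri.TRUE
    return Tri.UNKNOWN if Tri.UNKNOWN in xs else Tri.FALSE


def tri_not(x: Tri) -> Tri:
    # NOT swaps TRUE and FALSE and leaves UNKNOWN unknown.
    return {Tri.TRUE: Tri.FALSE, Tri.FALSE: Tri.TRUE, Tri.UNKNOWN: Tri.UNKNOWN}[x]


metadata_check = Tri.FALSE   # e.g. a condition on record metadata that already failed
bin_check = Tri.UNKNOWN      # a bin comparison; storage data is unknown in this phase
print(tri_and(metadata_check, bin_check))  # Tri.FALSE
```

Because tri_and returns Tri.FALSE here, such a record could be filtered out without loading it from storage.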

Storage-data Phase

The storage-data phase loads the record and evaluates the expression a second time. If the record resides on disk, this incurs physical I/O. This phase always resolves to a definite true or false answer.
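Putting both phases together, the following hypothetical driver shows why expressions that resolve from metadata avoid storage. The ttl metadata field, the UNKNOWN sentinel, the ttl_and_age filter, and the loader callback are assumptions for illustration; the loader records each simulated storage read so the number of loads can be checked.

```python
from typing import Any, Callable, Dict, List, Optional

UNKNOWN = object()  # sentinel: value cannot be determined in the current phase

Bins = Dict[str, Any]
Predicate = Callable[[Dict[str, Any], Optional[Bins]], Any]


def two_phase_filter(metadata: Dict[str, Any], load_bins: Callable[[], Bins],
                     predicate: Predicate) -> bool:
    """Evaluate predicate on metadata alone, then on loaded bins only if needed."""
    # Metadata phase: bins are passed as None, so bin-dependent logic yields UNKNOWN.
    result = predicate(metadata, None)
    if result is not UNKNOWN:
        return bool(result)  # decided without touching storage
    # Storage-data phase: load the record and evaluate again for a definite answer.
    return bool(predicate(metadata, load_bins()))


def ttl_and_age(meta: Dict[str, Any], bins: Optional[Bins]) -> Any:
    # Hypothetical filter: ttl > 100 AND age >= 18.
    if meta["ttl"] <= 100:
        return False      # false AND anything is false: resolved from metadata
    if bins is None:
        return UNKNOWN    # the age check needs bin data
    return bins["age"] >= 18


loads: List[int] = []


def load_adult() -> Bins:
    loads.append(1)       # record a simulated storage read
    return {"age": 30}


print(two_phase_filter({"ttl": 50}, load_adult, ttl_and_age), len(loads))   # False 0
print(two_phase_filter({"ttl": 500}, load_adult, ttl_and_age), len(loads))  # True 1
```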