There are two types of UDFs in Aerospike: Record UDFs and Stream UDFs. A Record UDF operates on a single record. A Stream UDF operates on a stream of records, and it can comprise multiple stream operators to perform very complex queries.
UDFs range in complexity from a single function only a few lines long to a multi-thousand-line module that contains many internal functions and multiple external functions.
When contemplating the construction of a new UDF, it is important to consider the application data model and the interaction between record bins. The general pattern (or life-cycle) for UDF development is:
- Design the application data model
- Design the UDFs to perform desired functions on the data model
- Create/Test the UDFs
- Register UDFs with the Aerospike database
- Iterate function test/development cycle
- System Test
- System Deployment
UDF design and development is usually iterative: the first version is simple and may evolve over time into something more complex.
Developing UDFs in Aerospike
Probably the best way to get familiar with the UDF mechanism is to create the "Hello World" Record UDF. Following the instructions below, you will see how to build, register and execute your "Hello World" UDF.
return "Hello World!!"
This "Hello World" Record UDF is invoked when a specific record is referenced, and it simply returns the "Hello World!!" string to the caller.
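The `return` statement above is only the body of the UDF. As a complete module it would sit inside a Lua function. The following is a minimal sketch, assuming a hypothetical module file named `hello.lua` and a function named `hello_world` (neither name is given in the text above). A Record UDF receives the target record as its first argument, which this function does not use.

```lua
-- hello.lua (hypothetical module name)

-- Record UDF: ignores the record it is invoked on and
-- returns a fixed string to the caller.
function hello_world(rec)
    return "Hello World!!"
end
```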
Developing Record-Based UDFs
A Record UDF is invoked once for each record in the Aerospike Key-Value operation. Typically, excluding batch operations, only a single record is the target of a KV operation. In general, a Record UDF can do the following:
- Create/delete bins in the record.
- Read any/all bins of the record.
- Modify any/all bins of the record.
- Delete/Create the specified record.
- Access parameters constructed and passed in from the client.
- Construct results as any server data type (e.g. string, number, list, map) to be returned over the wire to the client.
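Several of these capabilities can be seen together in a small read-only function. The sketch below reads one bin, takes its parameters from the client, and returns a result. The function name `get_or_default` and its parameters are hypothetical, chosen only for illustration.

```lua
-- Hypothetical Record UDF: return the value of one bin, or a
-- caller-supplied default when the bin is not set.
--   rec      : the record the operation targets
--   bin_name : name of the bin to read (passed in by the client)
--   default  : value returned when the bin is missing
function get_or_default(rec, bin_name, default)
    local value = rec[bin_name]
    if value == nil then
        return default
    end
    return value
end
```

Because the function never writes to the record, it does not need any of the `aerospike` module calls.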
Example: Record Create or Update
In this simple example (Annotated Record UDF), we show how the UDF can create or update an Aerospike record.
In addition, UDF developers can use the following Aerospike commands (Lua: aerospike Module) to perform record operations:
status = aerospike:create( rec )
status = aerospike:exists( rec )
status = aerospike:update( rec )
status = aerospike:remove( rec )
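A create-or-update UDF can be built from these calls. The following is a rough sketch, not the annotated example referenced above. The function name `upsert_bin` and its parameters are hypothetical, and it assumes `aerospike:exists` returns a truthy value when the record is already stored.

```lua
-- Hypothetical Record UDF: set one bin, creating the record if it
-- does not exist yet and updating it otherwise.
--   rec      : the record the operation targets
--   bin_name : name of the bin to write
--   value    : value to store in that bin
function upsert_bin(rec, bin_name, value)
    rec[bin_name] = value

    local status
    if aerospike:exists(rec) then
        status = aerospike:update(rec)
    else
        status = aerospike:create(rec)
    end

    -- Hand the status of the write back to the client.
    return status
end
```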
Example: Basic Statistics Management
In this moderately complex example (Statistics Record UDF), we show how a UDF can manage numerical statistics (Max Value, Min Value, Average Value, Count) of values that are kept in various KV Record bins.
This example also uses the Lua: logging Module functions (explained in more detail in the UDF Best Practices Guide), which write messages to the log or the console. For example, the following Lua line:
trace("[ENTER]<%s> Value(%s) valType(%s)", meth, tostring(newValue), type(newValue));
will generate output as follows:
Sep 21 2013 22:25:22 GMT: DETAIL (ldt): (/home/aerospike/src/lua/udf_samples.lua:160) [ENTER]<unique_set_write()> Value(Map("B"->71, "A"->70)) valType(userdata)
The log functions are controlled by the log/console stanzas in the Aerospike server config file.
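To make the idea concrete, here is a minimal sketch of a statistics Record UDF that also logs on entry. It is not the Statistics Record UDF referenced above: the function name `update_stats` and the bin names `count`, `sum`, `min`, `max` and `avg` are assumptions made for illustration.

```lua
-- Hypothetical Record UDF: fold one new numeric value into running
-- statistics kept in the bins "count", "sum", "min", "max" and "avg".
function update_stats(rec, new_value)
    trace("[ENTER]<%s> Value(%s) valType(%s)",
          "update_stats()", tostring(new_value), type(new_value))

    -- Missing bins read as nil, so start from zero on first use.
    local count = (rec["count"] or 0) + 1
    local sum = (rec["sum"] or 0) + new_value

    rec["count"] = count
    rec["sum"] = sum
    rec["avg"] = sum / count
    if rec["min"] == nil or new_value < rec["min"] then
        rec["min"] = new_value
    end
    if rec["max"] == nil or new_value > rec["max"] then
        rec["max"] = new_value
    end

    -- Persist the changes, creating the record on first use.
    if aerospike:exists(rec) then
        return aerospike:update(rec)
    end
    return aerospike:create(rec)
end
```

Keeping the sum alongside the count lets the average be recomputed exactly on every call instead of being adjusted incrementally.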
Aerospike Client Record UDF Invocation
The documentation for each Aerospike client describes how to invoke Record UDFs from that client.
Record UDF: More Detail
Developing Stream-Based UDFs
A Stream UDF is invoked on a stream of records rather than on a single record.
A query on a secondary index, built on a bin of the records, streams out the subset of records that match the query criteria. A Stream UDF can then extract bin values from those records, count the records, or compute similar statistics over the stream.
The process can be summarized as:
- Create a Secondary Index
- Run a Query on a Secondary Index
- Apply a Stream UDF to the results of the secondary index query
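The UDF applied in the last step might look like the following sketch, which counts the records a query streams out. It assumes the `map` and `reduce` stream operators; the function names `one`, `add` and `count` are chosen here for illustration.

```lua
-- Map step: turn every record into the number 1.
local function one(rec)
    return 1
end

-- Reduce step: add two partial counts together.
local function add(a, b)
    return a + b
end

-- Stream UDF: count the records streamed out by the query.
function count(stream)
    return stream : map(one) : reduce(add)
end
```

Creating the secondary index and running the query happen on the client side and are not shown here.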
The documentation for each Aerospike client describes how to use Stream UDFs from that client.
Example: Simple Statistics
In this example, Stream UDF - Simple Statistics, we calculate simple statistics over a data set.
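As a rough illustration of the technique, and not the referenced example itself, the sketch below computes the count, sum, minimum and maximum of one numeric bin. It assumes the `aggregate` and `reduce` stream operators and the `map()` constructor from the Aerospike Lua extensions. The function name `bin_stats`, its `bin_name` parameter and the result keys are hypothetical.

```lua
-- Hypothetical Stream UDF: count, sum, min and max of one numeric bin.
function bin_stats(stream, bin_name)

    -- Aggregate step: fold one record into the running statistics.
    -- Records that lack the bin are skipped.
    local function accumulate(stats, rec)
        local v = rec[bin_name]
        if v ~= nil then
            stats["count"] = stats["count"] + 1
            stats["sum"] = stats["sum"] + v
            if stats["min"] == nil or v < stats["min"] then
                stats["min"] = v
            end
            if stats["max"] == nil or v > stats["max"] then
                stats["max"] = v
            end
        end
        return stats
    end

    -- Reduce step: merge two partial results into one.
    local function merge(a, b)
        a["count"] = a["count"] + b["count"]
        a["sum"] = a["sum"] + b["sum"]
        if b["min"] ~= nil and (a["min"] == nil or b["min"] < a["min"]) then
            a["min"] = b["min"]
        end
        if b["max"] ~= nil and (a["max"] == nil or b["max"] > a["max"]) then
            a["max"] = b["max"]
        end
        return a
    end

    -- Start from an empty accumulator with zeroed counters.
    local initial = map()
    initial["count"] = 0
    initial["sum"] = 0

    return stream : aggregate(initial, accumulate) : reduce(merge)
end
```

The average is left to the caller, who can derive it from `sum` and `count` once the partial results have been merged.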
Example: Word Count
In this example, Stream UDF - Word Count, we count the number of times each word appears in a book.
Stream UDF: More Detail
Functional Benefits of UDFs
When contemplating the use of a UDF, make sure to first review the extensive atomic operations for the List and Map data types. Multiple operations can be chained to execute in order on a single record, using the operate() command of the Aerospike client.
Such single-record transactions are faster and scale better than UDFs. However, there are several situations where a UDF may be advantageous compared to implementing similar functionality on the application side:
- Extending the functionality of the collection data types with new atomic operations, or implementing new collection types (see Record UDF examples).
- Carrying out maintenance with a background UDF (a Record UDF applied to multiple records via scan or query).
- Implementing aggregate functions using a stream UDF.
The UDF documentation covers the following topics:
- Knowing Lua: An introduction to using Lua.
- Developing UDF Modules: Details on how to create Lua modules for running within an Aerospike database.
- Developing Record UDFs: An introduction to developing a Record UDF in Lua.
- Developing Stream UDFs: An introduction to developing a Stream UDF in Lua.
- Managing UDFs: How to install, remove, and update UDF modules, and manage runtime UDF caches.
- Known Limitations: Known limitations of the UDF system.
- Best Practices: Tips for improving the developer experience while developing Lua in Aerospike.
- API Reference: The API reference for Aerospike extensions to Lua, including functions, modules, types and Large Types.
- Record UDF Examples: Useful example Record UDFs.
- Stream UDF Examples: Useful example Stream UDFs.