Aerospike Event Stream Processing (ESP)
Change Notification (CN) in the Aerospike system is similar to CDC (Change Data Capture), but differs in how it reports changes. Unlike CDC, CN may include several changes in a single notification. It uses Aerospike's Cross-Datacenter Replication XDR to publish changes as they occur. Each message that the system sends contains either an updated or inserted record, or a record deletion notification, as well as useful metadata
Aerospike Connect for Streaming Event Processing (ESP) converts change notifications over XDR into HTTP/1 or HTTP/2 requests, and streams them to downstream consumers. This enables several use cases, such as server-less event processing, by using either AWS Lambda or Google Cloud Functions.
Connect for ESP is not a replacement for Kafka or Pulsar. For certain use cases, such as server-less event processing, exporting Aerospike data to Elasticsearch or Splunk, etc., it can obviate the need for Kafka or Pulsar in your streaming pipeline. However, there are a few tradeoffs depending on your use case. For instance, unlike Kafka or Pulsar, there is no message retention, topics and partitions, or schema registry.
- Connect for ESP converts CN over XDR into HTTP/1 or HTTP/2 requests, which potentially opens up connections to several HTTP-based systems such as ElasticSearch, AWS Lambda, Splunk, etc.
- Serializes the change notification payload into a text format such as JSON, or a binary format such as Avro or MessagePack, for efficient data exchange.
- Ships the LUT (last-update-time) of the record to enable downstream applications to build their own custom logic for ordering messages.
- Extends XDR’s at-least-once delivery guarantee to ensure zero message loss.
- Can be used with XDR filter expressions to filter out records before transmission. For example, ensure that only records with a compliance bin set to true end as searchable data in Elasticsearch.
- Offers flexible deployment options, either in the cloud or on-prem, with support for Docker and Kubernetes.
- Can be configured to route CNs to multiple destinations using the same ESP connector cluster.
- Includes an option to add custom headers specific to the target system to each HTTP request.
- Supports HTTPS, which can be configured with PEM files.
There might be instances where records are re-ordered on the network or across connector instances, in which case a message for an older version of a record could be delivered after a message for a newer version of the same record.
See Record Ordering Architecture in order to maintain record ordering.
- Export stripped down CN messages to Elasticsearch using its Document REST API. In this architectural pattern, Aerospike Database is a source of truth and Elasticsearch provides fast search capability, while maintaining a one-to-one relationship with the full objects stored in the Aerospike Database. This pattern could potentially be extended to make Aerospike data available for analysis in Splunk using the HTTP Event Collector in Splunk.
- Back up Aerospike data to S3 (using AWS Lambda) or Google Cloud Storage (using Google Cloud Functions) for archival purposes.
- Stream Aerospike data to AWS SageMaker (using AWS Lambda) or Google AI (using Google Cloud Functions) to build cloud native AI/ML pipelines with the best of breed AI/ML services.
- Encrypt Aerospike data for compliance, using AWS KMS with Lambda, before ingesting it into your data lake.
- Develop a web application in the language of your choice to ingest and process Aerospike CN. Additionally, your applications could also persist state, results, or raw data into the Aerospike database using the Aerospike REST Gateway.
- Load balance Aerospike streaming connectors for Kafka, JMS, or Pulsar to enable you to build scalable streaming pipelines.
Serverless Event Processing with AWS Lambda
In this example, Connect for ESP converts the CN events from the XDR protocol to HTTP POST requests. The CN payload can be configured to serialize into a text format, such as JSON, or a binary format, such as Avro or MessagePack, for efficient data exchange. The HTTP request is forwarded to an API Gateway (or an application load balancer), which triggers the Lambda function. The Lambda function writes the CN payload to AWS SQS. The data in this pipeline flows in a highly scalable manner. Note that the destination does not necessarily have to be SQS, but could be any other AWS service that is integrated with Lambda. Optionally, you can persist state or the raw CN payload for future use in the Aerospike database via the Aerospike REST Gateway.
Load Balancing Outbound ConnectorsTo increase the throughput of the connector instances that stream data from Aerospike, you can deploy a load balancer in front of those instances.
Connector instances receive change notifications when records are inserted, modified, or deleted in an Aerospike database. The change-notification system uses Aerospike's cross-datacenter replication (XDR) to send notifications when changes to records are made. Since Aerospike Enterprise Edition 5.0, XDR has used a protocol that is TCP based.
The problem with TCP-based load balancing is that it cannot balance load across existing TCP connections to connector instances i.e. the connections are sticky, rendering newer connector instances useless from a load sharing standpoint.
ESP connector converts XDR messages into HTTP requests so that you can address the TCP stickiness problem by using an HTTP load balancer to distribute HTTP requests to existing and newer nodes. It works across all the deployment options from bare-metal to the cloud, including Kubernetes. It can also handle scenarios where a node is removed.