Aerospike Connect for Presto

With Aerospike Connect for Presto (also referred to as the Presto connector), you can:

  • Run ANSI SQL queries to perform in-place, on-demand analytics on massive amounts of data in Aerospike databases.
  • Federate queries across multiple Aerospike clusters.
  • Integrate Aerospike into an ecosystem of multiple data-storage technologies.
  • Query Aerospike by using business-intelligence (BI) tools, such as Tableau.
  • Query records with different schemas within the same set in Aerospike.
  • Accelerate queries by using Aerospike's massive parallelism, predicate pushdown, and secondary indexes.
  • Leverage Presto's cost-based optimization, which uses table row counts to optimize queries.
  • Deploy in a cloud or Kubernetes environment if you want to leverage Managed Presto Services offered by cloud providers.

Aerospike Connect for Presto supports the Trino (formerly PrestoSQL) distribution of Presto.

System Topology

The following diagrams provide a high-level overview of how client applications, Presto, and Aerospike Database interact:

  1. A client application, such as Jupyter, Tableau, or the Presto CLI, passes an SQL query to the Presto coordinator.
  2. The Presto coordinator constructs a query plan and distributes portions of the plan among workers.
  3. During the data load stage of query execution, the connectors send parallel partition scan requests to the Aerospike database and push down predicates wherever possible.

  1. The Presto connector loads the scanned data from the 4,096 Aerospike partitions into the configured number of Presto splits.
  2. The Presto workers process the splits and execute the remaining stages to generate the result set.
  3. The coordinator fetches results from the workers and returns them to the client application.
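
To make the partition-to-split mapping more concrete, the sketch below divides the 4,096 Aerospike partitions into contiguous ranges, one per split. It is only an illustration: the connector's actual assignment strategy is not described here, and both the helper name `partition_ranges` and the even, contiguous division are assumptions.

```python
AEROSPIKE_PARTITIONS = 4096  # partition count stated in the topology steps


def partition_ranges(num_splits, total=AEROSPIKE_PARTITIONS):
    """Divide partition IDs 0..total-1 into num_splits contiguous ranges.

    Hypothetical helper: returns a list of (first_partition, count) tuples.
    """
    if not 1 <= num_splits <= total:
        raise ValueError("num_splits must be between 1 and the partition count")
    base, extra = divmod(total, num_splits)
    ranges = []
    start = 0
    for i in range(num_splits):
        # The first `extra` splits take one additional partition each.
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


if __name__ == "__main__":
    for first, count in partition_ranges(3):
        print(f"split covers partitions {first}..{first + count - 1}")
```

With three splits, the first range holds 1,366 partitions and the other two hold 1,365 each, so every partition is scanned by exactly one split.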

Secondary Indexes

The Aerospike Trino connector now supports secondary indexes. Creating a secondary index (sindex) on a high-cardinality bin in a set can significantly speed up your Trino queries. If you know the sindex cardinality, you can specify the sindex to use for your query as a hint through the sindex_name session property. The '__sindex' table, which is created for each schema, lists the available sindexes. You can change its name with the aerospike.index-table-name configuration property. See the examples section for step-by-step instructions on how to query using a secondary index.
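
As a rough illustration of the hint workflow, the sketch below assembles the SQL statements a client might send, in order. Several things here are assumptions rather than documented behavior: the helper name `sindex_hint_statements`, the catalog, schema, table, and index names passed in, and the idea that sindex_name is set as a string-valued catalog session property through Trino's general `SET SESSION catalog.property = value` syntax.

```python
def sindex_hint_statements(catalog, schema, table, sindex, where_clause):
    """Return the SQL statements for a sindex-hinted query, in order.

    Hypothetical helper. All arguments are placeholders supplied by the
    caller; nothing is validated or escaped, so do not pass untrusted input.
    """
    return [
        # 1. List the secondary indexes available in this schema.
        f'SELECT * FROM {catalog}.{schema}."__sindex"',
        # 2. Hint the index to use (assumed to be a string-valued
        #    catalog session property).
        f"SET SESSION {catalog}.sindex_name = '{sindex}'",
        # 3. Run the query that should benefit from the index.
        f"SELECT * FROM {catalog}.{schema}.{table} WHERE {where_clause}",
    ]


if __name__ == "__main__":
    # Placeholder names only; substitute your own catalog, set, and index.
    for stmt in sindex_hint_statements(
        "aerospike", "test", "users", "idx_users_age", "age = 42"
    ):
        print(stmt + ";")
```

The statements can be pasted into the Presto CLI or sent through any Trino client. The examples section remains the authoritative reference for the exact syntax.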

Limitations

  • Support for Presto Cost-Based Optimization is limited to statistics for a table, i.e. row count.
  • Trino treats table names as case-insensitive. To prevent name collisions in Trino, ensure that no two sets within the same namespace in the Aerospike database have names that differ only in case, e.g. sets named "deepLearning" and "deeplearning". Similarly, ensure that no two bins within the same set have names that differ only in case.
  • Trino does not support sindex queries on collection data types (CDTs).
  • Use caution when working with user keys returned by the __digest query. When you query with __digest, user keys stored in the database may be returned as NULL values in the result set. This is a known issue.
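
The case-sensitivity limitation above can be checked before deployment. The following sketch groups set or bin names that differ only in case. It assumes you have already collected the names into a list, and the helper name `find_case_collisions` is hypothetical. Comparing lowercased names is an assumption about how the collision arises.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by letter case.

    Returns a list of sorted groups, each holding two or more distinct
    names that become identical once case is ignored.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    set_names = ["deepLearning", "deeplearning", "users"]
    for group in find_case_collisions(set_names):
        print("collision:", ", ".join(group))
```

Run it once per namespace for set names and once per set for bin names. An empty result means no names in that list collide.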

Getting started

You can follow these instructions to deploy the Presto connector and a Trino cluster: