Aerospike Connect for Presto

With Aerospike Connect for Presto (also referred to as the Presto connector), you can:

  • Run ANSI SQL queries to perform in-place, on-demand analytics on massive amounts of data in Aerospike databases
  • Federate queries across multiple Aerospike clusters
  • Integrate Aerospike into an ecosystem of multiple data-storage technologies
  • Query Aerospike by using business-intelligence (BI) tools, such as Tableau
  • Query records with different schemas within the same set in Aerospike
  • Accelerate queries by using Aerospike's massive parallelism
  • Optimize queries with Presto's cost-based optimizer, which uses table row counts
  • Deploy in a cloud or Kubernetes environment if you want to leverage Managed Presto Services offered by cloud providers

Aerospike Connect for Presto supports the Trino (formerly PrestoSQL) distribution of Presto.

System topology

The following diagrams provide a high-level overview of how client applications, Presto, and Aerospike Database interact:

  1. A client application, such as Jupyter, Tableau, or the Presto CLI, passes an SQL query to the Presto coordinator.
  2. The Presto coordinator constructs a query plan and distributes portions of the plan among workers.
  3. As part of the data-load stage of query execution, the connectors send parallel partition scan requests to the Aerospike database and push down predicates wherever possible.

  1. The Presto connector loads the scanned data from the 4,096 Aerospike partitions into the configured number of Presto splits.
  2. The Presto workers process the splits and execute the remaining stages to generate the result set.
  3. The coordinator fetches results from the workers and returns them to the client application.
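
The mapping from Aerospike partitions to Presto splits in the steps above can be pictured with a small sketch. This is only an illustration: it assumes the 4,096 partitions are divided into contiguous, near-equal ranges, which this page does not confirm as the connector's actual strategy, and the function name partition_ranges is hypothetical.

```python
AEROSPIKE_PARTITIONS = 4096  # number of partitions in an Aerospike namespace


def partition_ranges(split_count, total=AEROSPIKE_PARTITIONS):
    """Divide partition IDs 0..total-1 into contiguous (begin, count) ranges.

    Hypothetical helper: assumes near-equal contiguous ranges, one per split.
    """
    if not 1 <= split_count <= total:
        raise ValueError("split_count must be between 1 and the partition count")
    base, extra = divmod(total, split_count)
    ranges = []
    begin = 0
    for index in range(split_count):
        # The first `extra` splits take one additional partition each,
        # so every partition is covered exactly once.
        count = base + (1 if index < extra else 0)
        ranges.append((begin, count))
        begin += count
    return ranges


if __name__ == "__main__":
    for begin, count in partition_ranges(4):
        print(f"split scans partitions {begin}..{begin + count - 1}")
```

Under this assumption, each worker scans only the partition range assigned to its split, so the scan work is spread across the splits without overlap.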

Limitations

  • Support for Presto cost-based optimization is limited to table-level statistics, namely row count.
  • Trino treats table names as case-insensitive. To prevent name collisions in Trino, ensure that no two sets within the same namespace in an Aerospike database have names that differ only in case, for example "deepLearning" and "deeplearning". Similarly, ensure that no two bins within the same set have names that differ only in case.
  • The Presto connector does not make use of secondary indexes.
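
To check for the case-collision limitation before exposing a namespace to Trino, a short script can flag names that differ only in case. The sketch below is a minimal example that assumes you have already collected the set names (or bin names) into a list by some other means. The helper find_case_collisions is hypothetical and not part of the connector, and it assumes that comparing lowercased names approximates Trino's case-insensitive matching.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that are identical once lowercased.

    Hypothetical helper: keys are the lowercased names, values are the
    sorted distinct original spellings that would collide.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return {
        lowered: sorted(originals)
        for lowered, originals in groups.items()
        if len(originals) > 1
    }


if __name__ == "__main__":
    set_names = ["deepLearning", "deeplearning", "users"]
    for lowered, originals in find_case_collisions(set_names).items():
        print(f"collision on '{lowered}': {originals}")
```

The same function can be run on the bin names of each set, since the limitation applies to bins as well as sets.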

Getting started

You can follow these instructions to deploy the Presto connector and a Trino cluster: