Sound business decisions lean heavily on insights drawn from the various sources of data in your organization. Generating insights is not just about building polished dashboards or fetching targeted records; the speed of analysis matters as well. Ad-hoc queries and interactive dashboards need near-real-time response times to be actionable, let alone impactful.
Analytics workloads are read-heavy and demand low latency and high throughput. Large datasets are typically persisted in a database and queried on an ad-hoc basis. A highly optimized SQL engine delivers the most when the underlying database is itself highly performant, especially at scale. Enter Aerospike SQL.
What is Aerospike SQL?
Aerospike SQL is an analytics platform powered by Starburst and integrated with the Aerospike Database. Starburst Enterprise is a fully supported, production-tested, enterprise-grade distribution of open source Trino (formerly PrestoSQL).
Aerospike SQL enables you to run Trino on a single machine or a cluster of machines, on-prem or in the cloud. It uses the Aerospike Presto (Trino) connector to reconcile the data model differences that arise when an SQL engine such as Trino accesses a NoSQL database such as Aerospike Database (ASDB).
The Presto (Trino) connector bridges Aerospike Enterprise Edition (EE) and Starburst Enterprise. It lets you combine the scalability, speed, reliability, and total cost of ownership (TCO) benefits of Aerospike with the speed, massive parallelism, and Trino support that Starburst Enterprise offers. It is based on the principle of "separation of compute and storage", so you can right-size your compute and storage clusters independently to achieve maximum performance with lower TCO.
Time spent dealing with infrastructure issues is time not spent on the issues that matter most to your business. Aerospike SQL comes with best-in-class support from both Aerospike and Starburst, so you can focus on using ANSI SQL to generate valuable insights at scale from the data stored in Aerospike and drive your critical business decisions.
Here’s How it Works
- A user submits an SQL query using one of the Starburst Enterprise clients to the Trino coordinator in the Aerospike SQL cluster.
- The coordinator constructs a query plan and distributes portions of the plan among workers.
- The connectors, which run in the workers, send parallel partition scan requests or push-down predicates wherever possible to your Aerospike database cluster as a part of the data load stage of query execution. The Trino connector loads the scanned data from the 4,096 Aerospike partitions into the configured number of Trino splits.
- The Trino workers process the splits and execute the remaining stages to generate the result set.
- The coordinator fetches results from the workers and returns them to the client application.
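The data load stage described above can be pictured as dividing the 4,096 Aerospike partitions among the configured number of Trino splits. The following Python sketch is illustrative only: the helper name `assign_partitions_to_splits` and the contiguous, near-equal grouping are assumptions made for this example, not the connector's actual algorithm.

```python
from typing import List

AEROSPIKE_PARTITIONS = 4096  # partition count stated in the text above


def assign_partitions_to_splits(split_count: int,
                                partitions: int = AEROSPIKE_PARTITIONS) -> List[range]:
    """Divide partition IDs 0..partitions-1 into contiguous, near-equal ranges.

    Hypothetical helper: the real connector may group partitions differently.
    """
    if split_count < 1 or split_count > partitions:
        raise ValueError("split_count must be between 1 and the partition count")
    base, extra = divmod(partitions, split_count)
    splits = []
    start = 0
    for i in range(split_count):
        # The first `extra` splits take one additional partition each.
        size = base + (1 if i < extra else 0)
        splits.append(range(start, start + size))
        start += size
    return splits


if __name__ == "__main__":
    for i, r in enumerate(assign_partitions_to_splits(6)):
        print(f"split {i}: partitions {r.start}-{r.stop - 1} ({len(r)} partitions)")
```

Under this assumed scheme, more splits mean smaller partition ranges per split and therefore more scan work that the workers can run in parallel.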
Aerospike SQL enables you to:
- Run ANSI SQL queries in place on massive amounts of data in an Aerospike database.
- Browse namespaces/sets easily in Aerospike using the Cluster Explorer to discover data in your cluster.
- Federate queries and clusters to create a single point of access across multiple Aerospike clusters.
- Create dashboards using data stored in Aerospike and familiar business-intelligence (BI) tools, such as Tableau and Power BI.
- Boost your query performance by configuring the Trino connector for massive parallelism, predicate pushdown, and secondary indexes. Queries with Aerospike secondary indexes run roughly 80x faster than queries without them.
- Leverage Presto's cost-based optimization (CBO), driven by row counts, for query optimization. The Aerospike connector is one of the two Presto connectors that support Presto CBO.
- Leverage schema inference if you do not know in advance the schema of the data stored in ASDB.
- Secure your data with TLS between clients all the way to Aerospike clusters, LDAP and PKI authentication of Presto users with ASDB, and support for server quotas to guarantee fair usage.
- Deploy anywhere, on-prem or cloud (AWS and GCP).
- Operationalize your use cases quickly with best-in-class support offered by Aerospike and Starburst.
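Predicate pushdown, mentioned in the list above, helps because the filter runs where the data lives, so fewer records have to travel to the query engine. The Python sketch below is a toy model of that idea, not the connector's implementation; the function names and the in-memory `store` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


def scan_without_pushdown(store: Iterable[Record],
                          predicate: Predicate) -> Tuple[List[Record], int]:
    """Ship every record to the query engine, then filter there."""
    transferred = list(store)  # every record crosses the network
    matches = [r for r in transferred if predicate(r)]
    return matches, len(transferred)


def scan_with_pushdown(store: Iterable[Record],
                       predicate: Predicate) -> Tuple[List[Record], int]:
    """Evaluate the filter at the data source and ship only the matches."""
    transferred = [r for r in store if predicate(r)]
    return transferred, len(transferred)


def is_us(record: Record) -> bool:
    return record["country"] == "US"


if __name__ == "__main__":
    # Toy dataset: 1,000 records, every tenth one from the US.
    store = [{"user_id": i, "country": "US" if i % 10 == 0 else "DE"}
             for i in range(1000)]
    _, shipped_all = scan_without_pushdown(store, is_us)
    _, shipped_filtered = scan_with_pushdown(store, is_us)
    print(f"without pushdown: {shipped_all} records transferred")
    print(f"with pushdown:    {shipped_filtered} records transferred")
```

Both functions return the same matching records; only the number of records transferred differs, which is where the savings come from in this simplified picture.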
How Aerospike SQL can help you
If you're a Data Analyst
You can run ad-hoc SQL queries on massive datasets, such as "Count the number of users that have clicked the new banner ad" or "What are some categories of ads they've seen?" You can also create insightful dashboards using BI tools such as Tableau and Power BI.
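The two example questions above might translate into SQL along the following lines. This is a sketch under stated assumptions: the table `ad_events` and its columns (`user_id`, `ad_id`, `ad_category`, `event_type`) are hypothetical, and the helpers accept any Python DB-API cursor that uses `?` placeholders rather than a specific client library.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical table and column names, for illustration only.
COUNT_CLICKERS_SQL = """
SELECT COUNT(DISTINCT user_id)
FROM ad_events
WHERE ad_id = ? AND event_type = 'click'
"""

CATEGORIES_SEEN_SQL = """
SELECT DISTINCT ad_category
FROM ad_events
WHERE event_type = 'view'
  AND user_id IN (
      SELECT user_id
      FROM ad_events
      WHERE ad_id = ? AND event_type = 'click'
  )
ORDER BY ad_category
"""


def run_query(cursor: Any, sql: str, params: Sequence[Any] = ()) -> List[Tuple]:
    """Execute a parameterized query on a DB-API cursor and return all rows."""
    cursor.execute(sql, tuple(params))
    return cursor.fetchall()


def count_banner_clickers(cursor: Any, ad_id: str) -> int:
    """Count the distinct users who clicked the given ad."""
    return run_query(cursor, COUNT_CLICKERS_SQL, (ad_id,))[0][0]


def categories_seen_by_clickers(cursor: Any, ad_id: str) -> List[str]:
    """List the ad categories viewed by users who clicked the given ad."""
    return [row[0] for row in run_query(cursor, CATEGORIES_SEEN_SQL, (ad_id,))]
```

The cursor would come from whichever Starburst Enterprise client you use to reach the Trino coordinator. Placeholder syntax and the way tables are qualified vary by client and deployment, so treat both as details to adapt.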
If you're a Data Protection Officer
You can conduct audits on datasets using SQL to ensure compliance and proactively address potential issues.
If you're a Data Engineer
You can programmatically run complex extract, transform, and load (ETL) queries using Python and Jupyter notebooks. You can also develop complex data models using Aerospike Collection data types (CDT) such as maps and lists, and query them using the highly performant Trino JSON functions.
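As a rough illustration of querying CDT data from Python, the sketch below builds a query that uses Trino's `json_extract_scalar` function to reach into a nested map. It assumes the map bin is exposed to Trino as a JSON-compatible column; the table `users`, the column `profile`, and both helper functions are hypothetical names chosen for this example.

```python
import re
from typing import Sequence, Union

# Accept only simple identifiers so the generated SQL stays predictable.
_SAFE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def json_scalar_expr(column: str, path: Sequence[Union[str, int]]) -> str:
    """Build a json_extract_scalar() expression for a nested map/list path.

    String steps become map keys (.key); integer steps become list indexes ([n]).
    """
    if not _SAFE_NAME.match(column):
        raise ValueError(f"unsafe column name: {column!r}")
    json_path = "$"
    for step in path:
        if isinstance(step, int) and step >= 0:
            json_path += f"[{step}]"
        elif isinstance(step, str) and _SAFE_NAME.match(step):
            json_path += f".{step}"
        else:
            raise ValueError(f"unsupported path step: {step!r}")
    return f"json_extract_scalar({column}, '{json_path}')"


def users_per_city_sql(table: str = "users", column: str = "profile") -> str:
    """Return a query counting users per city stored inside a map column."""
    if not _SAFE_NAME.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    city = json_scalar_expr(column, ["address", "city"])
    return (f"SELECT {city} AS city, COUNT(*) AS user_count\n"
            f"FROM {table}\n"
            f"GROUP BY {city}\n"
            f"ORDER BY user_count DESC")


if __name__ == "__main__":
    print(users_per_city_sql())
```

The generated string can be submitted from a notebook through whichever client you use. How CDT bins actually surface as column types depends on the connector configuration, so check that before relying on this pattern.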
The Aerospike Presto (Trino) connector currently does not support the following Starburst Enterprise features:
- Query Logger
- Materialized Views
- Caching Service
- Atlas Integration
- Data Catalog - AWS Glue and Hive Metastore
See connector-specific limitations here.