Configuration Properties for Aerospike Connect for Presto
Use these configuration properties in the
aerospike.properties file to specify how the Presto connector should interact with your Aerospike database.
Help with tuning Presto for performance improvements is beyond the scope of this documentation.
To use these configuration properties with Docker, you must use the
-v option when you run the
docker run command. Use this option to mount the
aerospike.properties file by providing the path to the folder that contains this file. The default path is
In general, when you use Docker you cannot both set environment variables and set these configuration properties. The only exception is when you use
TRINO_NODE_TYPE. You can set those environment variables using the
-e option and also mount the
aerospike.properties file using the
-v option in the same
docker run command.
The following sets of configuration properties are available:
Basic configuration properties
Description: Number of milliseconds to keep the inferred schema cached. If you specify the schema, this property has no effect.
Default value: 1800000
Description: Name of the Aerospike cluster, if a name has been configured for it.
Default value: null
Description: Maximum socket idle time in seconds. Socket connection pools discard sockets that have been idle longer than the maximum.
Default value: 55
Description: How long the connector should wait (in milliseconds) for a response from your Aerospike database when the connector tries to make its initial connection to it.
Default value: 1000
Description: Set to
true for the connector to throw an exception if it is unable to connect to any seed nodes for an Aerospike database.
Description: Whether threadPool is shared with other client instances or classes. If threadPool is not shared (default), it is shut down when the client instance is closed.
Description: Whether to use "services-alternate" instead of "services" in info requests during cluster tending.
Description: Table name for the default set. Use this environment variable when your namespace has a null set or no sets. If you have multiple namespaces with no sets in your cluster, you can query them, like this:
select * from <namespace_1>.<value>
select * from <namespace_2>.<value>
<value> is the value assigned to
Description: A comma-separated list of seed nodes in the Aerospike cluster.
Default value for non-virtual server environments: null
For standalone deployments in Dockerized environments: On MacOS, use
docker.for.mac.host.internal:3000. On Linux operating systems, use
Description: Require the primary key (PK) on INSERT queries. Although we recommend that you provide a primary key, you can choose not to by setting this property to false, in which case a UUID is generated for the PK. You can view PKs by setting
aerospike.record-key-hidden to false for future queries.
Description: Column name for the record's primary key. You can use this name in the queries for projection and/or predicates.
Description: If set to
false, the primary key column is made available in the result set.
Description: Column name for the record's digest. You can use this name in queries for projection and/or predicates; the digest value must be specified as a string.
Description: If set to false, the record's digest column will be available in the result set.
Description: Use a strict schema. See "Strict schemas".
Description: Path of the directory containing table description files. Do not use this configuration parameter for Dockerized environments; instead, use the
-v option in the
docker run command, as explained in "Deploying in Standalone Mode in Docker" and "Deploying in Distributed Mode in Docker".
Description: Use this property when you have a namespace or set name with mixed case types in the Aerospike database. This property will help resolve the tables/schemas case-insensitivity issue inherent in Trino that converts all names to lowercase. If turned on, you will be able to use supported SQL statements correctly against table names with mixed case types, e.g. “deepLearning”.
- It does not support two sets with the same name but differing in case types within the same namespace, e.g. sets named “deepLearning” and “deeplearning”.
- It only works with Trino version 360 or greater.
- Using it may have some performance implications, so use it only when you have set names with mixed case types in the Aerospike database.
- Although SHOW TABLES and DESCRIBE statements output lowercase names regardless of whether mixed-case naming is used in the Aerospike database, you can use SELECT and other statements with either mixed-case or lowercase names. For example, if deepLearning and Score are the names of the set and the bin in the Aerospike database,
SELECT Score FROM deepLearning; should work, even though the set and column names appear in lowercase in the output of SHOW TABLES and DESCRIBE statements, respectively.
Possible values: true, false.
Description: Aerospike Java client cache pool size.
Default value: 4
Because Presto is a SQL engine, it assumes that the underlying data store (Aerospike, in this case) follows a strict schema for all the records within a table. However, as a NoSQL database, Aerospike is schema-less.
Therefore, a single bin (mapped to a Presto column) within a set (mapped to a Presto table) could technically hold values of multiple Aerospike supported types.
The Presto connector reconciles this incompatibility with the help of the
aerospike.strict-schema configuration property:
- If none of the column types in the user-specified schema match the bin types of a record in Aerospike, a record with NULLs is returned in the result set.
- If the above mismatch is limited to some of the columns in the user-specified schema, NULL is returned for those columns in the result set. There is no way to differentiate between a NULL due to a missing value in the original data set and a NULL due to a mismatch, so you have to treat all NULLs as missing values. The connector automatically filters out of the result set any columns that are not part of the schema.
- If a mismatch between the user-specified schema and the schema of a record in Aerospike is detected at the bin/column level, your query will error out.
- The strict configuration (
true) can be used when you have modeled your data in Aerospike to adhere to a strict schema, that is, each record within the set has the same structure.
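The two behaviors described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the connector's implementation: the function name `reconcile`, the use of Python types to stand in for column types, and the treatment of a missing bin as NULL in both modes are assumptions made for the example.

```python
from typing import Any, Dict, Optional


def reconcile(record: Dict[str, Any],
              schema: Dict[str, type],
              strict: bool) -> Dict[str, Optional[Any]]:
    """Map the bins of one record onto the columns of a user-specified schema.

    Hypothetical model of the behavior described in the text:
    - strict=False: a bin whose type does not match its column becomes NULL (None).
    - strict=True: the first type mismatch raises an error.
    Bins that are not part of the schema are dropped in both modes.
    """
    row: Dict[str, Optional[Any]] = {}
    for column, expected_type in schema.items():
        value = record.get(column)
        if value is None:
            row[column] = None  # assumption: a missing bin is returned as NULL
        elif isinstance(value, expected_type):
            row[column] = value
        elif strict:
            raise TypeError(
                f"bin {column!r} holds {type(value).__name__}, "
                f"expected {expected_type.__name__}"
            )
        else:
            row[column] = None  # mismatch is indistinguishable from a missing value
    return row


schema = {"name": str, "score": int}
record = {"name": "sensor-1", "score": "high", "extra": 3.5}

# Non-strict: the mismatched "score" bin becomes None and "extra" is filtered out.
print(reconcile(record, schema, strict=False))
```

With `strict=True`, the same call raises `TypeError` for the `score` bin, which mirrors a query that errors out on a bin-level mismatch.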
Properties related to security
Description: Authenticates all Presto users to your Aerospike database with this single set of credentials. If you set
INTERNAL, ensure that the user and the password, and the associated roles, are set up in the Aerospike database. See Configuring access control for more information.
To override the username and password that are set in this file, you can authenticate users in Presto sessions by running these commands in the Presto CLI:
SET SESSION <catalog_name>.client_policy_user = '<username>'
SET SESSION <catalog_name>.client_policy_password = '<password>'
<catalog_name> matches the catalog name of the Aerospike database being authenticated to.
When you use the
SET SESSION command, the names of these two configuration properties use underscores, not periods, to separate the words that compose them. Also, these are the only properties that you can set with the
SET SESSION command.
Default value: null
Description: Authentication mode to use when values are set for
INTERNAL- Use internal authentication only. The hashed password is stored on the server. Do not send clear password. This is the default.
EXTERNAL- Use external authentication (such as LDAP). Specific external authentication is configured on server. If TLS is defined, send clear password on node login via TLS. Throw exception if TLS is not defined.
EXTERNAL_INSECURE- Use external authentication (such as LDAP). Specific external authentication is configured on server. Send clear password on node login whether or not TLS is defined. This mode should only be used for testing purposes because it does not provide secure authentication.
PKI- Authentication and authorization based on a certificate. No user name or password needs to be configured. Requires TLS and a client certificate. Requires server version 5.7.0+.
Description: Enable secure TLS connection.
Description: The type of the keystore.
Description: Keystore file path.
Default value: null
Description: Keystore password.
Default value: null
Description: Key password.
Default value: null
Description: Truststore file path.
Default value: null
Description: Truststore password.
Default value: null
Description: Use TLS connection only for login authentication.
Description: A comma-separated list of allowable TLS ciphers to use for the secure connection.
Default value: Default ciphers defined by JVM
Description: A comma-separated list of allowable TLS protocols to use for the secure connection.
Properties related to performance
Description: Number of synchronous connection pools used for each node.
If each of your nodes has eight or fewer CPU cores, you can leave this value at the default. However, if each node has more CPU cores, use a higher value to create multiple connection pools per node. Doing so helps to avoid contention among CPU cores for pooled connections.
Default value: 1
Description: Maximum number of synchronous connections allowed per server node. Increasing this value can help prevent the connector from reaching the maximum number of connections if you run many queries that use parallel scans.
Default value: 300
Description: Generate statistics for Cost-Based Optimization (CBO). Currently, the Presto connector only supports the row count. Ensure that you turn on CBO in Presto.
Description: Limit returned records per second (rps) rate for each server. A value of 0 specifies that there is no limit. Setting this value higher than 0 throttles the rate at which records are returned.
Default value: 0
Description: Maximum number of concurrent requests to server nodes at any point in time. Issue requests to all server nodes in parallel if maxConcurrentNodes is zero.
Default value: 0
Description: Number of Presto splits. Update this property to align with the available resources (CPU threads) in your cluster. The Aerospike connector supports up to
Integer.MAX_VALUE splits (i.e. 2^31-1 Presto splits) for parallel partition scans by Presto workers.
A split is the unit of parallelism in Presto. Hence, the connector can support up to ~2B Presto worker threads (configurable by setting
task.max-worker-threads in Presto).
Setting this value too high may cause a drop in performance due to context switching. Aerospike recommends that you set the value of
aerospike.split-number to the result of multiplying the number of cores by the number of threads per core.
Default value: 16
Use a value of 4 for Dockerized environments
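As a rough illustration of the recommendation above, the following sketch computes a candidate value for aerospike.split-number from the number of cores and the number of threads per core. The function name and the example hardware figures (8 cores, 2 threads per core) are assumptions chosen for the example, not values taken from the connector.

```python
def recommended_split_number(cores: int, threads_per_core: int) -> int:
    """Return cores x threads per core, the value suggested for aerospike.split-number."""
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be positive")
    return cores * threads_per_core


# Hypothetical machine: 8 cores with 2 hardware threads each.
print(f"aerospike.split-number={recommended_split_number(8, 2)}")
```

For this hypothetical machine the result happens to equal the default of 16. On many systems, Python's `os.cpu_count()` reports the number of logical CPUs, which is typically the same product.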
Description: Socket connect timeout in milliseconds.
Default value: 0
Description: Aerospike socket idle timeout in milliseconds when processing a database command.
Default value: 300000
Description: Aerospike Java client Netty event loop group size.
<number of available CPU cores>
Description: Maximum number of async commands that can be processed in each event loop at any point in time.
0 (execute all async commands immediately)
Description: Maximum number of async commands that can be stored in each event loop's delay queue for later execution.
0 (no delay queue limit)
Description: Expected number of concurrent asynchronous commands in each event loop that are active at any point in time.
Default value: 256
Properties related to audit trail
The Aerospike connector supports query audit logging by leveraging the Trino event listener.
It currently logs the timestamp, initiating user (name used in the Trino session or the user's OS name if the session name was unspecified), schema name and table name (in the format schemaname.tablename), query ID, query status (success/failure), SQL statement, and the number of records that were read or written.
You can enable it by creating a configuration file
etc/event-listener.properties with the following properties.
Description: Set this name to
Description: Path of the security audit log file. If you plan to use the default path, make sure that permissions to create the file in that location exist; otherwise, the feature will not work.
Default value: etc/log/security.log
Description: Maximum size of a single security audit log file, specified in bytes (e.g. 128 MB = 134217728).
Default value: 134217728
Description: Maximum number of security audit log files that could be created.
Default value: 24
- The log file contains one log entry per line, and the values are separated by a tab character. The timestamp uses ISO 8601 format.
- The log files are stored on the Trino coordinator.
- schemaname.tablename is not displayed if the query fails.
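When sizing disk space on the coordinator, it can help to estimate the upper bound implied by the two limits above. The sketch below multiplies the maximum file size by the maximum number of files, assuming that every retained file reaches its maximum size and that no more than the configured number of files is kept. The function and parameter names are hypothetical; only the default values come from this page.

```python
def max_audit_log_bytes(max_file_size_bytes: int = 134217728,
                        max_file_count: int = 24) -> int:
    """Upper bound on disk space used by audit logs, assuming every file is full."""
    return max_file_size_bytes * max_file_count


total = max_audit_log_bytes()
print(f"{total} bytes ({total / 2**30:.1f} GiB)")  # 3221225472 bytes (3.0 GiB)
```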
Properties related to backing off due to error conditions
These properties are available in Aerospike Connect for Presto versions 1.1.0 or later.
When certain error conditions occur as the connector interacts with your Aerospike cluster, the connector can "back off" from the server. "Backing off" means not only retrying the actions that led to the error, but retrying them at exponentially increasing intervals of time.
The duration of the interval before the first retry is specified by the configuration property
aerospike.retry-initial-millis. If the database cannot service the request because it is busy, the connector continues attempting the same action after exponentially longer intervals. To compute the length of each successive interval, the connector multiplies the duration of the current interval by the value of the configuration parameter
For example, if the initial wait time is 1s (1000 milliseconds) and the multiplier is 2, the retries are attempted at 1s, 2s, 4s, 8s, 16s, 32s, and so on. The connector continues retrying the action until the database can service the request or until the connector reaches the maximum number of retries allowed, which you can specify with
The most important of the error conditions that prompt the connector to back off from the server is the violation of a rate quota. The other error conditions are internal error conditions.
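The interval arithmetic described above can be worked through with a short sketch. It assumes that each wait is the previous wait multiplied by the multiplier, starting from the initial wait. The function and parameter names are hypothetical and are not the connector's property names.

```python
from typing import List


def backoff_intervals_ms(initial_ms: int, multiplier: int, max_retries: int) -> List[int]:
    """Wait time in milliseconds before each retry: initial_ms * multiplier**n."""
    return [initial_ms * multiplier ** attempt for attempt in range(max_retries)]


# The example from the text: 1 s initial wait, multiplier of 2, six retries.
print(backoff_intervals_ms(1000, 2, 6))  # [1000, 2000, 4000, 8000, 16000, 32000]
```

With a maximum of three retries, the same initial wait and multiplier give waits of 1 s, 2 s, and 4 s, for a total of 7 s before the connector gives up.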
These three configuration properties determine the "back off" behavior of the connector when the connector encounters one of the error conditions:
Description: The time to wait (in milliseconds) before retrying for the first time the action that led to the error condition. If the error condition persists after the initial retry, subsequent retries are attempted at intervals that become exponentially longer.
Default value: 1000
Description: The maximum number of times to retry an action that led to an error condition. A value of 0 prevents the connector from retrying.
Default value: 3
Description: The integer by which to multiply the duration of the current wait interval to determine the duration of the next wait interval.
Default value: 2
Description: Enables case-insensitive name resolution. Has a minor performance penalty when enabled.
Default value: false
Description: The name of the table that stores the list of available secondary indexes for a schema.
Default value: __sindex
Back-offs for violations of one or both rate quotas
You can use rate quotas with Aerospike Connect for Presto version 1.1.0 or later only when your Aerospike Database Enterprise Edition cluster is at version 5.6 or later.
In your Aerospike database, an administrator can set rate quotas for roles and then assign users to those roles. One rate quota limits the number of reads in terms of records per second, and the other rate quota limits the number of writes, also in terms of records per second. All record accesses are counted towards the quotas: updates, replaces, UDFs, background UDFs, reads, batch reads and scans. Your Aerospike database consequently limits the user to a number of transactions per second. This number consists of the sum of the two rate quotas.
For example, you might set for the role
analysts the rate quota of 40,000 records per second for reads, and the rate quota of 40,000 records per second for writes. Then, you might assign the user
analyst_1 the role
analysts. When a query issued against your Aerospike database by
analyst_1 results in a rate of transactions per second that includes a breach of either of these rate quotas, the connector waits before attempting to re-run the stage of the query that violated a rate quota.
The connector might have to retry a query stage more than once because the database might be busy and not have the resources to service the request. If the query stage still does not run after the maximum number of retries, the connector fails the query. In this situation, the user can retry the query at a later time.
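The retry-then-fail behavior described above can be sketched as a small loop. This is an illustrative model only, not the connector's code: `QuotaExceeded`, `run_with_backoff`, and the injected `sleep` function are hypothetical names introduced for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a rate-quota violation reported by the database."""


def run_with_backoff(action: Callable[[], T],
                     initial_ms: int = 1000,
                     multiplier: int = 2,
                     max_retries: int = 3,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action; on QuotaExceeded, wait and retry with exponentially longer waits."""
    wait_ms = initial_ms
    for attempt in range(max_retries + 1):  # one initial attempt plus the retries
        try:
            return action()
        except QuotaExceeded:
            if attempt == max_retries:
                raise  # retries exhausted: the query fails
            sleep(wait_ms / 1000)
            wait_ms *= multiplier
    raise AssertionError("unreachable")


# Demo: a query stage that violates the quota twice, then succeeds.
waits = []
calls = {"count": 0}

def query_stage() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise QuotaExceeded()
    return "ok"

# A recording function replaces time.sleep so the demo does not actually wait.
print(run_with_backoff(query_stage, sleep=waits.append), waits)
```

Passing `sleep` as a parameter keeps the sketch easy to test. With `max_retries=0`, the first failure is raised immediately, which matches a setting that prevents retries.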