Aerospike allows reading and writing with great flexibility. With an Aerospike
policy, you can create read-modify-write patterns of optimistic
concurrency, control the time to live, and write a record only if no record
previously exists (or the converse).
Operations of this type are very quick, because information such as the generation and time to live is stored in the primary key index. No extra work is needed to retrieve the data object.
These policies affect both database operations and client operations. Many policies are used to send the appropriate wire protocol commands to the server. Other policies (like maxRetries) affect client operation.
These policies exist in every client, with slightly different APIs. After determining which policies your application needs, see the client-specific documentation for the precise syntax.
Set Default Client Policies
Default client policies can be created for each AerospikeClient instance. The following example demonstrates how to set policy defaults in the Java client. For language-specific examples, see the documentation for your client.
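import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Host;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.CommitLevel;
import com.aerospike.client.policy.ConsistencyLevel;
import com.aerospike.client.policy.Replica;

// Set client default policies (timeouts are in milliseconds).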
ClientPolicy clientPolicy = new ClientPolicy();
clientPolicy.readPolicyDefault.replica = Replica.MASTER_PROLES;
clientPolicy.readPolicyDefault.consistencyLevel = ConsistencyLevel.CONSISTENCY_ALL;
clientPolicy.readPolicyDefault.socketTimeout = 100;
clientPolicy.readPolicyDefault.totalTimeout = 100;
clientPolicy.writePolicyDefault.commitLevel = CommitLevel.COMMIT_ALL;
clientPolicy.writePolicyDefault.socketTimeout = 500;
clientPolicy.writePolicyDefault.totalTimeout = 500;
// Connect to the cluster.
AerospikeClient client = new AerospikeClient(clientPolicy, new Host("seed1", 3000));
Set Per-Transaction Client Policies
To set policies on a per-transaction basis, pass the desired policy settings to
the individual API call. For example, to perform a write with the COMMIT_MASTER
commit level, copy the client's default write policy and change it:
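import com.aerospike.client.policy.CommitLevel;
import com.aerospike.client.policy.WritePolicy;

// Make a copy of the client's default write policy.
WritePolicy policy = new WritePolicy(client.writePolicyDefault);
// Change commit level.
policy.commitLevel = CommitLevel.COMMIT_MASTER;
// Write record with modified write policy (client, key and bins are defined elsewhere).
client.put(policy, key, bins);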
If the policy specified in the transaction call is not null, that policy overrides the corresponding policy defined at the client connection level. If the transaction policy is null, the corresponding default client policy will be used.
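To make the override rule concrete, here is a minimal sketch in plain Java. SketchWritePolicy, SketchCommitLevel and PolicyResolution are hypothetical stand-ins invented for illustration, not Aerospike client classes. The sketch only shows the null-fallback rule and why the example above copies the default policy before changing it.

```java
import java.util.Objects;

// Hypothetical stand-ins for the client's policy objects (NOT the Aerospike API);
// they mirror only the two fields this sketch needs.
enum SketchCommitLevel { COMMIT_ALL, COMMIT_MASTER }

class SketchWritePolicy {
    SketchCommitLevel commitLevel = SketchCommitLevel.COMMIT_ALL;
    int maxRetries = 0;

    SketchWritePolicy() {}

    // Copy constructor, playing the role of new WritePolicy(client.writePolicyDefault).
    SketchWritePolicy(SketchWritePolicy other) {
        this.commitLevel = other.commitLevel;
        this.maxRetries = other.maxRetries;
    }
}

class PolicyResolution {
    // A non-null per-call policy wins; a null one falls back to the client default.
    static SketchWritePolicy resolve(SketchWritePolicy perCall, SketchWritePolicy clientDefault) {
        return perCall != null ? perCall : Objects.requireNonNull(clientDefault);
    }

    public static void main(String[] args) {
        SketchWritePolicy defaults = new SketchWritePolicy();
        SketchWritePolicy perCall = new SketchWritePolicy(defaults); // copy, then modify
        perCall.commitLevel = SketchCommitLevel.COMMIT_MASTER;

        System.out.println(resolve(null, defaults).commitLevel);    // COMMIT_ALL
        System.out.println(resolve(perCall, defaults).commitLevel); // COMMIT_MASTER
        System.out.println(defaults.commitLevel);                   // COMMIT_ALL
    }
}
```

Because the per-call policy is a copy, changing its commit level leaves the client default, and every call that passes null, unaffected.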
You can override client-selected per-transaction data consistency levels on the server using dynamically changeable server configuration parameters at the namespace level.
The following section describes the Aerospike Java client policies. Other clients use similar constructs.
Replica
The replica policy (Policy.replica) specifies which replica the client accesses
during single-record operations:
SEQUENCE (default): Try the node containing the key's master partition first. If the connection fails, all commands try nodes containing replicated partitions. If socketTimeout is reached, reads also try nodes containing replicated partitions, but writes remain on the master node.
MASTER: Use the node containing the key's master partition.
MASTER_PROLES: Distribute reads across nodes containing the key's master and replicated partitions in round-robin fashion. Writes always use the node containing the key's master partition.
RANDOM: Distribute reads across all nodes in the cluster in round-robin fashion. Writes always use the node containing the key's master partition.
By default (Replica.SEQUENCE), all client reads are first directed to the master
replica. However, you may want to spread reads over all available replicas; for
example, this can reduce the load caused by reading a hot key roughly in
proportion to the replication factor. Set the replica policy to
Replica.MASTER_PROLES to distribute reads across master and proles.
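The following sketch models how MASTER_PROLES spreads reads of one hot key. ReplicaSelector and SketchReplica are hypothetical names. The model assumes a simple per-key round-robin cursor and ignores RANDOM and the failover behavior of SEQUENCE, so it illustrates the policy rather than the client's actual node-selection code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of per-key node selection; NOT the client's internal code.
// RANDOM and the retry behavior of SEQUENCE are not modeled.
enum SketchReplica { SEQUENCE, MASTER, MASTER_PROLES }

class ReplicaSelector {
    private final List<String> nodes; // nodes holding the key's partition: index 0 = master, rest = proles
    private int next = 0;             // round-robin cursor for MASTER_PROLES

    ReplicaSelector(List<String> nodesForKey) {
        if (nodesForKey.isEmpty()) throw new IllegalArgumentException("no nodes for key");
        this.nodes = List.copyOf(nodesForKey);
    }

    // Node used for the first attempt of a read.
    String nodeForRead(SketchReplica replica) {
        if (replica == SketchReplica.MASTER_PROLES) {
            String node = nodes.get(next);
            next = (next + 1) % nodes.size();
            return node;
        }
        return nodes.get(0); // SEQUENCE and MASTER both start at the master
    }

    // Writes always go to the master, whatever the replica policy.
    String nodeForWrite() {
        return nodes.get(0);
    }

    public static void main(String[] args) {
        ReplicaSelector hotKey = new ReplicaSelector(List.of("nodeA", "nodeB")); // replication factor 2
        Map<String, Integer> hits = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            hits.merge(hotKey.nodeForRead(SketchReplica.MASTER_PROLES), 1, Integer::sum);
        }
        System.out.println(hits); // {nodeA=500, nodeB=500}
    }
}
```

With a replication factor of 2, each of the two nodes serves half of the reads of the key, which is the reduction described above.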
AP Data Consistency Level
Consistency level (Policy.consistencyLevel) specifies how many replicas the
server consults internally to determine the most recent record value before
returning it to the client:
The consistencyLevel settings do not apply to namespaces with
strong-consistency set to true.
CONSISTENCY_ONE (default): Read a single replica before returning.
CONSISTENCY_ALL: Read from all nodes holding a unique version of the record's parent partition.
The default client behavior when reading a record (including operate functions)
is to read only one replica (ConsistencyLevel.CONSISTENCY_ONE).
During cluster reconfiguration, reading a single replica may not return the most
recently written version. If you want the server to perform "duplicate
resolution", that is, to contact the replicas, find the most recent version, and
update the master's copy, set the consistency level policy to
ConsistencyLevel.CONSISTENCY_ALL.
The potential performance degradation due to reading all replicas is only significant during cluster reconfiguration.
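A toy model of duplicate resolution follows. It assumes that copies are compared by generation (the server also supports other conflict-resolution settings), and ReplicaCopy and DuplicateResolutionSketch are hypothetical names. CONSISTENCY_ONE returns whatever the master holds, while CONSISTENCY_ALL consults every copy, returns the newest, and refreshes the master's copy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one record's copies during cluster reconfiguration.
// Assumes generation-based conflict resolution (higher generation = newer).
class ReplicaCopy {
    final String node;
    final int generation;
    final String value;

    ReplicaCopy(String node, int generation, String value) {
        this.node = node;
        this.generation = generation;
        this.value = value;
    }
}

class DuplicateResolutionSketch {
    // CONSISTENCY_ONE: return the copy on the first node consulted (index 0, the master).
    static ReplicaCopy readOne(List<ReplicaCopy> copies) {
        return copies.get(0);
    }

    // CONSISTENCY_ALL: consult every copy, return the newest, and refresh the master's copy.
    static ReplicaCopy readAll(List<ReplicaCopy> copies) {
        ReplicaCopy newest = copies.get(0);
        for (ReplicaCopy c : copies) {
            if (c.generation > newest.generation) newest = c;
        }
        ReplicaCopy master = copies.get(0);
        copies.set(0, new ReplicaCopy(master.node, newest.generation, newest.value));
        return newest;
    }

    public static void main(String[] args) {
        List<ReplicaCopy> copies = new ArrayList<>(List.of(
                new ReplicaCopy("nodeA", 3, "old"),   // master missed the latest write
                new ReplicaCopy("nodeB", 4, "new"))); // prole has it
        System.out.println(readOne(copies).value); // old
        System.out.println(readAll(copies).value); // new
        System.out.println(readOne(copies).value); // new (master copy was updated)
    }
}
```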
Linearize Read
If enabled, the linearize read policy (Policy.linearizeRead) forces reads to be
linearized for server namespaces that support strong consistency mode.
Send Key
If enabled, send key (Policy.sendKey) sends the user-defined key in addition to
the hash digest on both reads and writes. If the key is sent on a write, the key
is stored with the record on the server and returned to the client on scans.
Socket Timeout
Socket timeout (Policy.socketTimeout) specifies the socket idle timeout in
milliseconds when processing a database command.
If socketTimeout is not zero and the socket has been idle for at least socketTimeout, both maxRetries and totalTimeout are checked. If maxRetries and totalTimeout are not exceeded, the transaction is retried.
If both socketTimeout and totalTimeout are non-zero and socketTimeout > totalTimeout, then socketTimeout will be set to totalTimeout.
If socketTimeout is zero, there will be no socket idle limit.
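The socketTimeout rules above can be restated as a small function. This is a hypothetical helper for illustration (SocketTimeoutSketch is not a client class), and it assumes both values are non-negative milliseconds.

```java
// Hypothetical helper restating the socketTimeout rules above; not part of the client API.
class SocketTimeoutSketch {
    // Returns the idle limit in milliseconds that applies; 0 means "no idle limit".
    static int effectiveSocketTimeout(int socketTimeout, int totalTimeout) {
        if (socketTimeout != 0 && totalTimeout != 0 && socketTimeout > totalTimeout) {
            return totalTimeout; // socketTimeout is capped at totalTimeout
        }
        return socketTimeout;
    }

    public static void main(String[] args) {
        System.out.println(effectiveSocketTimeout(5000, 1000)); // 1000
        System.out.println(effectiveSocketTimeout(100, 1000));  // 100
        System.out.println(effectiveSocketTimeout(0, 1000));    // 0 (no idle limit)
    }
}
```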
Total Timeout
Total timeout (Policy.totalTimeout) specifies the total transaction timeout in
milliseconds.
The totalTimeout is tracked on the client and sent to the server along with the transaction in the wire protocol. The client will most likely time out first, but the server can also time out the transaction.
If totalTimeout is not zero and totalTimeout is reached before the transaction completes, the transaction will abort with a timeout exception.
Max Retries
Max retries (Policy.maxRetries) specifies the maximum number of retries before
aborting the current transaction. The initial attempt is not counted as a retry.
If maxRetries is exceeded, the transaction will abort with a timeout exception.
Database writes that are not idempotent (such as add()) should not be retried, because the write operation may be performed multiple times if the client timed out previous transaction attempts. Use a separate WritePolicy with maxRetries set to zero for non-idempotent writes.
Default for read: 2 (initial attempt + 2 retries = 3 attempts)
Default for write/query/scan: 0 (no retries)
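To see how socketTimeout, maxRetries and totalTimeout interact, here is a simplified simulation. RetrySketch is hypothetical; it assumes that every attempt fails by idling for the full socketTimeout (greater than zero) and that no sleep occurs between attempts, and it counts attempts until the transaction would abort with a timeout.

```java
// Hypothetical model of the retry decision described above; NOT the client's code.
// It assumes every attempt fails by idling for the full socketTimeout (> 0).
class RetrySketch {
    // attemptsSoFar includes the initial attempt, so retries used = attemptsSoFar - 1.
    static boolean shouldRetry(int attemptsSoFar, int maxRetries, long elapsedMs, int totalTimeout) {
        boolean retriesLeft = attemptsSoFar - 1 < maxRetries;
        boolean timeLeft = totalTimeout == 0 || elapsedMs < totalTimeout; // 0 = no total limit
        return retriesLeft && timeLeft;
    }

    // Number of attempts made before the transaction aborts with a timeout.
    static int attemptsUntilAbort(int socketTimeout, int totalTimeout, int maxRetries) {
        int attempts = 0;
        long elapsed = 0;
        do {
            attempts++;
            elapsed += socketTimeout; // this attempt idled out
        } while (shouldRetry(attempts, maxRetries, elapsed, totalTimeout));
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(attemptsUntilAbort(100, 1000, 2)); // 3: read default, initial + 2 retries
        System.out.println(attemptsUntilAbort(400, 1000, 5)); // 3: totalTimeout reached first
        System.out.println(attemptsUntilAbort(500, 500, 0));  // 1: write default, no retries
    }
}
```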
Sleep Between Retries
Sleep between retries (Policy.sleepBetweenRetries) is the number of
milliseconds to sleep between retries. Set it to zero to skip the sleep. This
field is ignored when maxRetries is zero, and it is also ignored in async mode.
The sleep only occurs on connection errors and server timeouts which suggest a node is down and the cluster is reforming. The sleep does not occur when the client's socketTimeout expires.
Reads do not have to sleep when a node goes down because the cluster does not shut out reads during cluster reformation. The default for reads is zero.
The default for writes is also zero because writes are not retried by default. Writes need to wait for the cluster to reform when a node goes down. Immediate write retries on node failure have been shown to consistently result in errors. If maxRetries is greater than zero on a write, then sleepBetweenRetries should be set high enough to allow the cluster to reform (>= 500ms).
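The sleep rules can be sketched as follows. FailureKind and SleepBetweenRetriesSketch are hypothetical names that group failures into the three cases the text distinguishes, and writeRetryLooksSafe is a hypothetical check encoding the 500 ms guidance. The model covers synchronous commands only, since the field is ignored in async mode.

```java
import java.util.List;

// Hypothetical failure categories for this sketch; not client result codes.
enum FailureKind { CONNECTION_ERROR, SERVER_TIMEOUT, SOCKET_IDLE_TIMEOUT }

class SleepBetweenRetriesSketch {
    // Total milliseconds a synchronous command would sleep, given the failures that
    // each triggered a retry (failures beyond maxRetries are not retried).
    static long totalSleepMs(List<FailureKind> failures, int maxRetries, int sleepBetweenRetries) {
        if (maxRetries == 0) return 0; // the field is ignored when retries are disabled
        long total = 0;
        int retries = Math.min(failures.size(), maxRetries);
        for (int i = 0; i < retries; i++) {
            // No sleep when the client's own socketTimeout expired.
            if (failures.get(i) != FailureKind.SOCKET_IDLE_TIMEOUT) total += sleepBetweenRetries;
        }
        return total;
    }

    // Hypothetical check: retried writes should sleep long enough for the cluster to reform.
    static boolean writeRetryLooksSafe(int maxRetries, int sleepBetweenRetries) {
        return maxRetries == 0 || sleepBetweenRetries >= 500;
    }

    public static void main(String[] args) {
        List<FailureKind> failures = List.of(
                FailureKind.CONNECTION_ERROR, FailureKind.SOCKET_IDLE_TIMEOUT, FailureKind.SERVER_TIMEOUT);
        System.out.println(totalSleepMs(failures, 3, 500)); // 1000
        System.out.println(writeRetryLooksSafe(2, 0));       // false
    }
}
```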
Write Mode
The write mode (WritePolicy.recordExistsAction) specifies how to handle writes
when the record already exists:
UPDATE (default): Create or update the record. Merge write command bins with existing bins.
UPDATE_ONLY: Update the record only. Fail if the record does not exist. Merge write command bins with existing bins.
REPLACE: Create or replace the record. Delete existing bins not referenced by write command bins.
REPLACE_ONLY: Replace the record only. Fail if the record does not exist. Delete existing bins not referenced by write command bins.
CREATE_ONLY: Create only. Fail if the record exists.
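The five write modes can be modeled with an in-memory map of bins. WriteModeSketch and SketchExistsAction are hypothetical and are not the client API. The sketch throws IllegalStateException where the real server would reject the write with an error result code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mirror of the five write modes; NOT the Aerospike enum.
enum SketchExistsAction { UPDATE, UPDATE_ONLY, REPLACE, REPLACE_ONLY, CREATE_ONLY }

class WriteModeSketch {
    // Hypothetical in-memory store: key -> bins.
    final Map<String, Map<String, Object>> records = new HashMap<>();

    void put(SketchExistsAction action, String key, Map<String, Object> bins) {
        Map<String, Object> existing = records.get(key);
        boolean exists = existing != null;
        switch (action) {
            case UPDATE_ONLY:
            case REPLACE_ONLY:
                if (!exists) throw new IllegalStateException("record not found: " + key);
                break;
            case CREATE_ONLY:
                if (exists) throw new IllegalStateException("record exists: " + key);
                break;
            default:
                break; // UPDATE and REPLACE accept both cases
        }
        // UPDATE modes merge into existing bins; the others start from an empty record.
        boolean merge = action == SketchExistsAction.UPDATE || action == SketchExistsAction.UPDATE_ONLY;
        Map<String, Object> result = (merge && exists) ? new HashMap<>(existing) : new HashMap<>();
        result.putAll(bins);
        records.put(key, result);
    }

    public static void main(String[] args) {
        WriteModeSketch db = new WriteModeSketch();
        db.put(SketchExistsAction.UPDATE, "user1", Map.of("name", "Ana", "age", 30));
        db.put(SketchExistsAction.UPDATE, "user1", Map.of("age", 31));   // merged
        System.out.println(db.records.get("user1").get("name"));         // Ana
        db.put(SketchExistsAction.REPLACE, "user1", Map.of("age", 32));  // other bins dropped
        System.out.println(db.records.get("user1").containsKey("name")); // false
    }
}
```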
AP Write Commit Level
The commit level policy (WritePolicy.commitLevel) specifies how many replicas
the server must write successfully before successfully returning to the client:
The commitLevel settings do not apply to namespaces with
strong-consistency set to true.
COMMIT_ALL (default): Commit all replicas before returning. Required for strong consistency mode.
COMMIT_MASTER: Return after committing only the master replica, and replicate the prole replica(s) asynchronously. In strong consistency mode, COMMIT_MASTER causes an error.
The default client behavior when modifying a record (including operate and UDF
functions) is to confirm that all replicas were successfully written before
returning success from the write-related API. This default policy
(CommitLevel.COMMIT_ALL) provides the highest level of write consistency.
If lower write latency is desired and the application can tolerate a lower write
consistency level, set the commit level policy to CommitLevel.COMMIT_MASTER. This
allows "dirty reads": a read of the same record from a non-master replica can
return an older value if it happens before that replica is committed. Since
Aerospike 5.7, if the client is pushing a higher rate of COMMIT_MASTER
transactions than the server's replication system can handle, the server pushes
back by converting these transactions to COMMIT_ALL.
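The dirty-read window can be illustrated with a toy two-replica record. CommitLevelSketch and SketchCommitMode are hypothetical, and replication is simulated by an explicit replicate() call rather than by anything the server exposes.

```java
// Hypothetical mirror of the two commit levels; NOT the Aerospike enum.
enum SketchCommitMode { COMMIT_ALL, COMMIT_MASTER }

// Toy record with one master copy and one prole copy.
class CommitLevelSketch {
    String masterValue = "v1";
    String proleValue = "v1";
    private String pendingReplication = null; // value not yet applied to the prole

    // When this method returns, the write counts as acknowledged to the client.
    void write(String value, SketchCommitMode level) {
        masterValue = value;
        if (level == SketchCommitMode.COMMIT_ALL) {
            proleValue = value;          // prole written before acknowledging
            pendingReplication = null;
        } else {
            pendingReplication = value;  // prole updated later, asynchronously
        }
    }

    // Simulates asynchronous replication catching up.
    void replicate() {
        if (pendingReplication != null) {
            proleValue = pendingReplication;
            pendingReplication = null;
        }
    }

    public static void main(String[] args) {
        CommitLevelSketch rec = new CommitLevelSketch();
        rec.write("v2", SketchCommitMode.COMMIT_MASTER);
        System.out.println(rec.proleValue); // v1: a prole read here is a dirty read
        rec.replicate();
        System.out.println(rec.proleValue); // v2
    }
}
```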
Write Generation Policy
The generation policy (WritePolicy.generationPolicy) specifies how to handle
record writes based on record generation.
Record generation is an internal counter that uses integer values and that Aerospike increments every time you update a record. ("Generation" in this context does not mean "the act of generating", but "version".) When a record is inserted, the counter starts at 1. Therefore, a record for which the counter is currently at, say, 5, has been updated four times. Client applications cannot directly change the value of the counter. Reading a record does not cause Aerospike to increment its counter.
When Aerospike is in Available and Partition-tolerant (AP) mode, Aerospike resets a record's counter to 1 after it has been updated 64K times. When Aerospike is in strong-consistency mode, it resets a record's counter to 1 after the record has been updated 1K times.
Client applications can use this counter to coordinate a read-modify-write sequence of operations with other client applications.
For example, suppose a client application needs to read data from a record, modify the data, and then write the modified data back into the record. Reading the record requires a lock on it, as does writing to the record. However, during the time the client app modifies data, it holds no lock on the record. Another client app can update the same record before the first client app is able to obtain a write lock and write the modified data.
If the generation policy is set to EXPECT_GEN_EQUAL or EXPECT_GEN_GT, the sequence works as follows:
During the read operation, the client app also reads the value of the generation counter for the record.
After the client app modifies the data and obtains a write lock on the record, it reads the current value of the counter.
One of the following situations occurs:
If the generation policy is set to EXPECT_GEN_EQUAL:
- If the current value is equal to the value that it read earlier, then the client app writes the modified data to the record.
- If the values are not equal, the client app does not perform the write operation. The client app can retry the sequence of read-modify-write operations.
If the generation policy is set to EXPECT_GEN_GT:
- If the generation value supplied with the write is greater than the current value, the client app writes the modified data to the record.
- Otherwise, the client app does not perform the write operation. Because a record's generation normally only increases between a read and the following write, a value read earlier is normally not greater than the current one, so this mode is not suited to read-modify-write sequences.
If the generation policy is set to NONE, the client app does not read the
value of the counter when reading data from the record. After modifying the data
that it read, it writes the modified data to the record.
NONE (default): Client apps do not use the record-generation counter to restrict writes.
EXPECT_GEN_EQUAL: Client apps update or delete records only when the previously read value of the counter is equal to the current value. Otherwise, write operations fail, and client apps need to retry them.
EXPECT_GEN_GT: Client apps update or delete records only when the generation value they supply is greater than the current value. Otherwise, write operations fail. This value is useful when restoring records from a backup, to write only those records whose copy on the server is older (has a lower generation).
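An optimistic read-modify-write loop with EXPECT_GEN_EQUAL can be sketched as below. GenerationSketch and SketchGenPolicy are hypothetical. The store's own synchronized methods play the role of the server, which performs the generation check during the write. In the Java client, the corresponding step is, as far as we know, to pass the generation you read (Record.generation) in WritePolicy.generation together with the generation policy; check the client documentation for exact usage.

```java
import java.util.function.UnaryOperator;

// Hypothetical subset of the generation policies; NOT the Aerospike enum.
enum SketchGenPolicy { NONE, EXPECT_GEN_EQUAL }

// Hypothetical single-record store with a generation counter; not the client API.
class GenerationSketch {
    private String value;
    private int generation;

    GenerationSketch(String initial) {
        this.value = initial;
        this.generation = 1; // a newly inserted record starts at generation 1
    }

    synchronized String readValue() { return value; }
    synchronized int readGeneration() { return generation; }

    // Returns false (write rejected) when EXPECT_GEN_EQUAL is used and the generation moved.
    synchronized boolean write(String newValue, SketchGenPolicy policy, int expectedGeneration) {
        if (policy == SketchGenPolicy.EXPECT_GEN_EQUAL && expectedGeneration != generation) {
            return false;
        }
        value = newValue;
        generation++; // every successful update bumps the generation
        return true;
    }

    // Optimistic read-modify-write: retry until the generation check passes.
    int readModifyWrite(UnaryOperator<String> modify, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int gen;
            String current;
            synchronized (this) { gen = generation; current = value; } // read value + generation together
            if (write(modify.apply(current), SketchGenPolicy.EXPECT_GEN_EQUAL, gen)) {
                return attempt;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        GenerationSketch rec = new GenerationSketch("a");
        int seen = rec.readGeneration();             // app 1 reads generation 1
        rec.write("b", SketchGenPolicy.NONE, 0);      // app 2 updates: generation 2
        System.out.println(rec.write("a!", SketchGenPolicy.EXPECT_GEN_EQUAL, seen)); // false
        System.out.println(rec.readModifyWrite(v -> v + "!", 3)); // 1
        System.out.println(rec.readValue());          // b!
    }
}
```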
Expiration (Time To Live)
Record expiration (WritePolicy.expiration), or time to live (ttl), is the number
of seconds the record will live before being removed by the server. Expiration
values are interpreted as follows:
- -2 — Do not change ttl when record is updated.
- -1 — Never expire.
- 0 — Default to namespace configuration variable "default-ttl" on the server.
- > 0 — Actual ttl in seconds.
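The expiration values above map to an effective TTL as in this sketch. ExpirationSketch is hypothetical. It assumes the record already exists when -2 is used, and it simply returns the namespace's default-ttl for 0 without modeling what that server setting itself means.

```java
// Hypothetical mapping of WritePolicy.expiration values; not part of the client API.
class ExpirationSketch {
    static final long NEVER_EXPIRE = -1;

    // currentTtl is the existing record's remaining ttl in seconds;
    // this sketch assumes a record exists whenever -2 is used.
    static long effectiveTtl(int expiration, long namespaceDefaultTtl, long currentTtl) {
        if (expiration > 0) return expiration; // actual ttl in seconds
        switch (expiration) {
            case 0: return namespaceDefaultTtl; // server's "default-ttl" for the namespace
            case -1: return NEVER_EXPIRE;
            case -2: return currentTtl;         // leave ttl unchanged on update
            default: throw new IllegalArgumentException("unsupported expiration: " + expiration);
        }
    }

    public static void main(String[] args) {
        long nsDefault = 2_592_000; // assumed default-ttl of 30 days
        System.out.println(effectiveTtl(3600, nsDefault, 100)); // 3600
        System.out.println(effectiveTtl(0, nsDefault, 100));    // 2592000
        System.out.println(effectiveTtl(-1, nsDefault, 100));   // -1
        System.out.println(effectiveTtl(-2, nsDefault, 100));   // 100
    }
}
```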
Durable Delete
If enabled, durable delete (WritePolicy.durableDelete) leaves a tombstone when
a record is deleted. This prevents deleted records from reappearing after node