Histograms from Aerospike Logs

Overview

Histograms plot the distribution of latencies for various transaction types that are processed by an Aerospike cluster. You can use them for in-depth analysis of the activity on your system.

Writes to the log file

Periodically, the Aerospike server writes histograms to the log file. The frequency of writes to the log file is controlled by the ticker-interval configuration parameter.

Histograms begin with the text "histogram dump:" followed by:

the histogram name
the number of measurements represented by the histogram
a maximum of four buckets each
- The form for each bucket is (BUCKET_NUMBER: NUM_OBSERVED). The threshold for a given bucket is 2^{BUCKET_NUMBER} units.

Example histogram dump

The following example shows a read histogram for the namespace test where 36,453,689,581 measurements are reported, 36,414,385,962 of which were less than 1ms (2⁰ms):

Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:240) histogram dump: {test}-read (36453689581 total) msec
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257)  (00: 36414385962) (01: 0031233450) (02: 0007841324) (03: 0000215526)
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257)  (04: 0000002198) (05: 0000000693) (06: 0000000590) (07: 0000000716)
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257)  (08: 0000001405) (09: 0000002365) (10: 0000004646) (11: 0000000706)

Some histograms provide metrics which are not latency related, such as accessed object sizes on read/write transactions, and number of records returned by secondary index queries.

High level characteristics of histogram output

Histograms only appear in the logs if they contain data. For example, a batch-specific histogram on a server which does not do any batch read operations would not be present.
Read and write histograms are tracked on per-namespace level basis. If latencies are only observed on a specific namespace, it is easier to track down based on the specific data in the namespace and the application writing to it.
To analyze transactions in detail, turn on the relevant configuration-enabled benchmark histograms specific to the transaction type.
Turning on a couple of the non-default histograms usually does not impact the performance of a cluster. However, in some cases, turning on more than a few of those configurable histograms could have a noticeable impact on the performance of a cluster.
Analyzing a time period that bridges a server restart distorts results of time slices and average/maximum calculations.
Histogram data can also be retrieved through the asadm show latencies command.
By default, latency histograms are measured in milliseconds, and can be configured to microseconds. See microsecond-histograms for details.

info

To analyze histogram data, refer to the Log Latency Tool guide, or the histogram command of the Aerospike Admin (asadm) tool when used under the log analyzer mode.

Notational conventions

Required braces, like {ns}, where ns is the name of a namespace, indicate an input variable you supply on the asinfo command line.
Brackets, like fabric-[channel]-send-init, for the different fabric channels, indicate an output name of the particular histogram type.

Benchmarks

Benchmarks are active as either auto-enabled or configuration-enabled.

Auto-enabled benchmarks

Auto-enabled benchmarks are a set of histograms that measure end-to-end performance of Aerospike subsystems and appear only when the histogram has data to display. They give a high-level view of what is happening on a node at a given point in time. These benchmark stats are written to the Aerospike log file at 10 second intervals and can be used to spot things such as latency in a certain transaction type.

{ns}-read

Time taken for read requests from the time they are received at the node to when the response leaves the node.

{ns}-write

Time taken for writes from end-to-end (includes the time taken for replica write). Does not include deletes. For server versions prior to 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and therefore could under-report the latency.

{ns}-udf

Time taken for UDF transactions.

batch-index

Batch read "parent" transaction. Tracks the time taken from the start of the transaction to getting a response. For server versions prior to 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and therefore could under-report the latency. If batch calls are showing high latencies on the client side, consider changing the maxConcurrentThreads, for example, in the Java client library, set Batch Policy from the default 1 (sequential) to 0 (or a number greater than 0). Configuring the maxConcurrentThreads allows concurrent synchronous batch request threads to be sent to server nodes at any time.

{ns}-batch-sub-read

Time taken for batch (sub) read requests from the time they are received at the node to when the response leaves the node. This histogram is introduced in server version 6.4.

{ns}-batch-sub-write

Time taken for batch (sub) writes from end-to-end (includes the time taken for replica write). Does not include deletes. This histogram is introduced in server version 6.4.

{ns}-batch-sub-udf

Time taken for batch (sub) UDF transactions. This histogram is introduced in server version 6.4.

{ns}-query

Time taken for query transactions. This histogram is removed as of server version 6.0.

{ns}-query-rec-count

This displays the records returned per query. This histogram shows the distribution of counts, and not latencies. Note: the first bucket in histogram is always empty as queries not returning any records do not get counted in this histogram. This histogram is removed as of server version 6.0.

{ns}-pi-query

Time taken for primary index short query transactions. This histogram is introduced in server version 6.0.

{ns}-pi-query-rec-count

This displays the records returned per primary index short query. This histogram shows the distribution of counts, and not latencies. Note: the first bucket in histogram is always empty as queries not returning any records do not get counted in this histogram. This histogram is introduced in server version 6.0.

{ns}-si-query

Time taken for secondary index short query transactions. This histogram is introduced in server version 6.0.

{ns}-si-query-rec-count

This displays the records returned per secondary index short query. This histogram shows the distribution of counts, and not latencies. Note: the first bucket in histogram is always empty as queries not returning any records do not get counted in this histogram. This histogram is introduced in server version 6.0.

{ns}-re-repl

Latencies when re-replicating. Only available in strong consistency enabled namespaces.

Configuration-enabled benchmarks

Configuration-enabled benchmarks include specific histograms that you can enable or disable dynamically. These benchmarks are more granular than the auto-enabled benchmarks. If you contact Aerospike Support for assistance with a problem, they may request that you enable a few sets that are relevant to your issue. If you keep these benchmarks permanently enabled, they may impact performance.

note

Tools package 6.0.x or later is required to use asadm's manage config commands. Otherwise, use the equivalent asinfo - set-config command.

General transaction histogram

To enable a specific benchmark, use the following command:

asadm -e 'enable; manage config service param <configuration> to true'

Type of Histogram	Description	Configuration
info	Time taken for an info protocol transaction.	enable-hist-info

Specific transaction benchmarks

The transaction benchmarks are monitored at the per-namespace level.

Proxy transaction analysis

Displays time for a proxy transaction from the sender view.

asadm -e 'enable; manage config namespace <namespaceName> param enable-hist-proxy to true'

proxy

Time taken for proxy transactions.

Fabric transaction analysis

Displays the time for a fabric transaction from sender's view.

asadm -e 'enable; manage config service param enable-benchmarks-fabric to true'

fabric-[channel]-send-init

Histogram that tracks the time taken for preparing a message for sending over fabric (intra cluster communication). Does not include any network operations. Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).

fabric-[channel]-send-fragment

Histogram that tracks the time taken from end of send-init and stops after the network send. Can be hit multiple times if message was too big to fit in the send buffer.Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).

fabric-[channel]-recv-fragment

Histogram that tracks the time taken from the beginning of the first byte of a message from network and ends after network recv. Can be hit multiple times if message was big enough to require multiple network recv calls.Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).

fabric-[channel]-recv-cb

Histogram that tracks the time taken to process the transaction on the receiving node (rw, bulk, control, meta).Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).

Ops subtransaction analysis

Displays slices of internally generated ops subtransactions from a background scan or query.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-ops-sub to true'

ops-sub-start

ops-sub-restart

Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy"). Refer to the write-restart histogram for further details about transactions being reattempted.

ops-sub-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.

ops-sub-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied.

ops-sub-repl-write

Time taken from master record written to replica(s) written.

ops-sub-response

Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.

Read transaction analysis

Displays slices of a single-record read transaction.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-read to true'

read-start

Time taken from the beginning of a transaction to when a node partition is reserved. High latencies here could indicate bottleneck at the network or CPU (and often irq balancing) as well as in the transaction queue. This includes the demarshal time (tracked for all transactions under svc-demarshal for server versions prior to 4.8) as well as the time spent in a transaction queue (tracked for all transactions under svc-queue for server versions prior to 4.8).Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.

read-restart

Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy" on duplicate resolve).For this slice, in the case of a potential hotkey, the clock actually starts as soon as the transaction gets into the rw-hash, until it is being reattempted. For servers prior to 4.7, the reattempt scenario is this:a transaction first goes through the transaction queue, then gets picked up by a transaction thread, and if there is already an entry in the rw-hash for this record (for transactions required to go through the rw-hash), it will wait for the transaction being processed to complete, which will then put all the transactions waiting on the rw-hash back in the transaction queue in order for those transactions to be reattempted.As of server 4.7, the process is similar but there is no initial transaction queue/thread phase; also, once the transaction beingprocessed completes, all transactions waiting on the rw-hash are put into an internal transaction queue in order to be reattempted.
The same transaction can be re-queued multiple times (each will be counted as a separate transaction in this slice), but the clock will be reset everytime a transaction gets into the rw-hash. Therefore, a transaction being re-queued multiple times will not have its total accumulated time accounted for, but rather only the time it waited in the rw-hash (queued behind a transaction in progress) the last time prior to going through.

read-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.

read-local

Time taken from partition reserved or after duplicate resolution to an actual local record read. For data-in-memory true, this is the slice for reading from memory. For data-in-memory false, this is the slice for reading from the disk except when the record is still in the post-write-queue or in cached (read-page-cache enabled). To troubleshoot further, enable storage benchmarks.

read-response

Time taken from local record read to client response completion. High latencies here could indicate a bottleneck on the network or the client, getting slow to consume responses from the server.

read-repl-ping

Time taken for reading all the copies of a record when the linearize read policy is in effect. This only applies for strong consistency enabled namespaces.

Write transaction analysis

Displays slices of a single-record write transaction, excluding deletes.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-write to true'

write-start

Time taken from the beginning of a transaction to when a node partition is reserved.High latency here could indicate bottleneck at the network or CPU (and often irq balancing). When compression is enabled, higher latencies could also potentially be observed in this slice. This includes the demarshal time (tracked for all transactions under svc-demarshal for server versions prior to 4.8) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue for server versions prior to 4.8). For server versions prior to 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and therefore could under-report the latency.Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.

write-restart

Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy").For this slice, in the case of a potential hotkey, the clock actually starts as soon as the transaction gets into the rw-hash, until it is being reattempted.
For servers prior to 4.7, the reattempt scenario is this:a transaction first goes through the transaction queue, then gets picked up by a transaction thread, and if there is already an entry in the rw-hash for this record (for transactions required to go through the rw-hash), it will wait for the transaction being processed to complete, which will then put all the transactions waiting on the rw-hash back in the transaction queue in order for those transactions to be reattempted.As of server 4.7, the process is similar but there is no initial transaction queue/thread phase; also, once the transaction beingprocessed completes, all transactions waiting on the rw-hash are put into an internal transaction queue in order to be reattempted.
The same transaction can be re-queued multiple times (each will be counted as a separate transaction in this slice), but the clock will be reset everytime a transaction gets into the rw-hash. Therefore, a transaction being re-queued multiple times will not have its total accumulated time accounted for, but rather only the time it waited in the rw-hash (queued behind a transaction in progress) the last time prior to going through.This also includes restarts from duplicate resolution if any.

write-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.

write-master

Time between when the partition is reserved or after a duplicate resolution finishes, to when an actual master record write to the swb in-memory buffer takes place. Latencies displayed here typically cover the time it took to update secondary indexes when necessary, and/or the read portion of the record write (if a write is an update, the record is read before a write is performed). Here are the details of what would be counted in this slice:

read before write
because of update
because of stored key (sendKey policy set to true)
compression at rest
sindex
commit-to-device true (in SC)
binop processing
filter processing (if the filter requires reading a bin to evaluate)

write-repl-write

Time taken from master record written to replica(s) written.

write-response

Time taken from master write (if write-commit-level configured to master) or replica write to client response completion. High latencies here could indicate a bottleneck on the network or the client, getting slow to consume responses from the server.

UDF transaction analysis

Displays slices of a single-record UDF transaction.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-udf to true'

udf-start

Time taken from the beginning of a transaction to when a node partition is reserved. This includes the demarshal part (tracked for all transactions under svc-demarshal for server versions prior to 4.8) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue for server versions prior to 4.8).Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.

udf-restart

udf-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.

udf-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied.

udf-repl-write

Time taken from master record written to replica(s) written.

udf-response

Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.

Sub-UDF transaction analysis

Displays the slices of internally generated UDF transactions, usually by a background UDF, scan or a query.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-udf-sub to true'

udf-sub-start

udf-sub-restart

udf-sub-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.

udf-sub-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied.

udf-sub-response

Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.

Batch transaction analysis

Displays the slices for a batch transaction.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-batch-sub to true'

batch-sub-udf-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied via UDF.

batch-sub-repl-write

Time taken from master record written to replica(s) written.

batch-sub-write-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied.

batch-sub-prestart

Available in server versions 4.8 and later. The demarshal time for a batch sub-transaction.

batch-sub-start

For server versions 4.8 and later, the time taken from a batch sub-transaction being queued until a node partition is reserved.For server versions prior to 4.8, thetime taken from the beginning of a transaction to when a node partition is reserved. This includes the demarshal part (tracked for all transactions under svc-demarshal) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue). For server versions prior to 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and therefore could under-report the latency.Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.

Note server version 4.8.0.1 introduced a problem where this histogram also contains data points that should instead be in thebatch-sub-restart histogram. This is fixed as of version 4.8.0.27, and also fixed for other server lineages in versions 4.9.0.25,5.0.0.28, 5.1.0.26, 5.2.0.18, 5.3.0.9, and 5.4.0.4.

batch-sub-restart

Time taken from a partition being reserved to a partition being re-reserved (possible internal restart/reattempt).Refer to the write-restart histogram for further details about transactions being reattempted.

Note server version 4.8.0.1 introduced a problem where this histogram is never populated -- with its data points instead being placed in the batch-sub-start histogram. This is fixed as of version 4.8.0.27, and also fixed for other server lineages in versions 4.9.0.25, 5.0.0.28, 5.1.0.26, 5.2.0.18, 5.3.0.9, and 5.4.0.4.

batch-sub-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.

batch-sub-read-local

Time taken from partition reserved or after duplicate resolution to an actual local record read.

batch-sub-response

Time taken from local record read to batch parent response completion.

batch-sub-repl-ping

Time taken for reading all the copies of a record when the linearize read policy is in effect. This only applies for strong consistency enabled namespaces.

Storage transaction analysis

Storage benchmarks are a set of device-targeted histograms that measure performance of the storage subsystem in time slices. They provide insight into performance issues related to storage devices and PMem storage files. Due to their performance impact, they are disabled by default. You can enable them by passing the set-config command to the asinfo utility.

To enable storage-benchmarks:

asadm -e 'enable; manage config namespace NAMESPACE storage-engine param enable-benchmarks-storage to true'

{ns}-[device]-read

Histogram that tracks the time taken to read from disk. (Not available for pmem storage.)

{ns}-[device]-write

Histogram that tracks amount of time taken to write write-block to disk. (Not available for pmem storage.) The size of a write block is the configured write-block-size. When running with commit-to-device set to true with server versions prior to 5.3, this counter will only account for full blocks written and therefore will only count blocks written through the defragmentation process as client writes would write to disk individually rather than at a block level. But when running with commit-to-device set to true with server versions 5.3 and later, client writes will also be included in the histogram.

{ns}-[device]-shadow-write

Histogram that tracks amount of time taken to write write-block to a shadow device, when using shadow devices. (Not available for pmem storage.) The size of a write block is the configured write-block-size). When running with commit-to-device set to true with server versions prior to 5.3, this counter will only account for fullblocks written and therefore will only count blocks written through the defragmentation process as client writes would write to disk individually ratherthan at a block level. But when running with commit-to-device set to true with server versions 5.3 and later, client writes will also be included in the histogram.

{ns}-[device]-large-block-read

Histogram that tracks the amount of time to read write-block from disk. (Not available for pmem storage.) Histogram that tracks the amount of time to read large blocks from disk. The size of a large block (or write block) is the configured write-block-size). Note that as of server version 5.3, tomb-raider device reads are included in this histogram.

{ns}-[device]-fsync

Histogram that tracks the time taken for fsync. (Available in server version 4.3.0.10 and below. Deprecated in server version 4.3.1.3.)

{ns}-device-read-size

Histogram that tracks the size of the objects being read.

{ns}-device-write-size

Histogram that tracks the size of the objects being written. This includes replica writes from other nodes (on namespaces with replication factor 2 or more). This does not include defragmentation writes.

Secondary index transaction analysis

Version 6.0.0 and later
Version 5.7 and earlier

This histogram is removed in server version 6.0.

Secondary index benchmarks are a set of benchmarks targeted to a particular index measuring various components of a query and index. Due to their performance impact, they are disabled by default. You can enable them by passing the sindex-histogram command to the asinfo utility.

To enable secondary index benchmarks for a specific index:

asinfo -h [host ip] -v "sindex-histogram:ns=NAMESPACE;indexname=INDEX;enable=true"

[SINDEX_NAME]_write_us

Histogram measuring secondary index write performance, unit is microseconds.

[SINDEX_NAME]_si_prep_us

Histogram of time spent gathering secondary index key values from records, unit is microseconds.

[SINDEX_NAME]_delete_us

Histogram measuring secondary index delete performance, unit is microseconds.

[SINDEX_NAME]_query

Histogram for entire query execution time, unit is milliseconds.

[SINDEX_NAME]_query_batch_lookup_us

Histogram of time spent performing secondary index lookup from primary index digest tree for the qualifying recods for the specific secondary index query, unit is microseconds.

[SINDEX_NAME]_query_batch_io_us

Histogram of time spent reading qualified records from storage for the secondary index query, unit is microseconds.

[SINDEX_NAME]_query_row_count

Histogram of rows selected by query.

[SINDEX_NAME]_query_diff_count

Histogram of false positives returned by sindex.
Example: If a sindex returned 100 records but the query read only 95records would indicate that there were 5 false positives which may be theresult of garbage collection.

Overview​

Writes to the log file​

Example histogram dump​

High level characteristics of histogram output​

Benchmarks​

Auto-enabled benchmarks​

Configuration-enabled benchmarks​

General transaction histogram​

Specific transaction benchmarks​

Proxy transaction analysis​

Fabric transaction analysis​

Ops subtransaction analysis​

Read transaction analysis​

Write transaction analysis​

UDF transaction analysis​

Sub-UDF transaction analysis​

Batch transaction analysis​

Storage transaction analysis​

Secondary index transaction analysis​

Overview

Writes to the log file

Example histogram dump

High level characteristics of histogram output

Benchmarks

Auto-enabled benchmarks

Configuration-enabled benchmarks

General transaction histogram

Specific transaction benchmarks

Proxy transaction analysis

Fabric transaction analysis

Ops subtransaction analysis

Read transaction analysis

Write transaction analysis

UDF transaction analysis

Sub-UDF transaction analysis

Batch transaction analysis

Storage transaction analysis

Secondary index transaction analysis