Histograms from Aerospike Logs
Overview
Histograms plot the distribution of latencies for various transaction types that are processed by an Aerospike cluster. You can use them for in-depth analysis of the activity on your system.
Writes to the log file
Periodically, the Aerospike server writes histograms to the log file. The frequency of writes to the log file
is controlled by the ticker-interval
configuration parameter.
Histograms begin with the text "histogram dump:" followed by:
- the histogram name
- the number of measurements represented by the histogram
- a maximum of four buckets each
- The form for each bucket is
(BUCKET_NUMBER: NUM_OBSERVED)
. The threshold for a given bucket is 2BUCKET_NUMBER units.
- The form for each bucket is
Example histogram dump
The following example shows a read histogram for the namespace test
where 36,453,689,581 measurements are reported,
36,414,385,962 of which were less than 1ms (20ms):
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:240) histogram dump: {test}-read (36453689581 total) msec
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257) (00: 36414385962) (01: 0031233450) (02: 0007841324) (03: 0000215526)
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257) (04: 0000002198) (05: 0000000693) (06: 0000000590) (07: 0000000716)
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257) (08: 0000001405) (09: 0000002365) (10: 0000004646) (11: 0000000706)
Some histograms provide metrics which are not latency related, such as accessed object sizes on read/write transactions, and number of records returned by secondary index queries.
High level characteristics of histogram output
Histograms only appear in the logs if they contain data. For example, a batch-specific histogram on a server which does not do any batch read operations would not be present.
Read and write histograms are tracked on per-namespace level basis. If latencies are only observed on a specific namespace, it is easier to track down based on the specific data in the namespace and the application writing to it.
To analyze transactions in detail, turn on the relevant configuration-enabled benchmark histograms specific to the transaction type.
Turning on a couple of the non-default histograms usually does not impact the performance of a cluster. However, in some cases, turning on more than a few of those configurable histograms could have a noticeable impact on the performance of a cluster.
Analyzing a time period that bridges a server restart distorts results of time slices and average/maximum calculations.
Histogram data can also be retrieved through the
asadm
show latencies
command.By default, latency histograms are measured in milliseconds, and can be configured to microseconds. See
microsecond-histograms
for details.
To analyze histogram data, refer to the Log Latency Tool guide, or the
histogram
command of
the Aerospike Admin (asadm) tool when used under the log analyzer mode.
Notational conventions
- Required braces, like
{ns}
, wherens
is the name of a namespace, indicate an input variable you supply on theasinfo
command line. - Brackets, like
fabric-[channel]-send-init
, for the different fabric channels, indicate an output name of the particular histogram type.
Benchmarks
Benchmarks are active as either auto-enabled or configuration-enabled.
Auto-enabled benchmarks
Auto-enabled benchmarks are a set of histograms that measure end-to-end performance of Aerospike subsystems and appear only when the histogram has data to display. They give a high-level view of what is happening on a node at a given point in time. These benchmark stats are written to the Aerospike log file at 10 second intervals and can be used to spot things such as latency in a certain transaction type.
Batch read "parent" transaction. Tracks the time taken from the start of the transaction to getting a response. For server versions prior to 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and therefore could under-report the latency. If batch calls are showing high latencies on the client side, consider changing the maxConcurrentThreads,
for example, in the Java client library, set Batch Policy from the default 1
(sequential) to 0
(or a number greater than 0
). Configuring the maxConcurrentThreads
allows concurrent synchronous batch request threads to be sent to server nodes at any time.
Time taken for read requests from the time they are received at the node to when the response leaves the node.
Time taken for writes from end-to-end (includes the time taken for replica write). Does not include deletes. For server versions prior to 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and therefore could under-report the latency.
Time taken for UDF transactions.
Time taken for query transactions. This histogram is removed as of server version 6.0.
This displays the records returned per query. This histogram shows the distribution of counts, and not latencies. Note: the first bucket in histogram is always empty as queries not returning any records do not get counted in this histogram. This histogram is removed as of server version 6.0.
Time taken for primary index short query transactions. This histogram is introduced in server version 6.0.
This displays the records returned per primary index short query. This histogram shows the distribution of counts, and not latencies. Note: the first bucket in histogram is always empty as queries not returning any records do not get counted in this histogram. This histogram is introduced in server version 6.0.
Time taken for secondary index short query transactions. This histogram is introduced in server version 6.0.
This displays the records returned per secondary index short query. This histogram shows the distribution of counts, and not latencies. Note: the first bucket in histogram is always empty as queries not returning any records do not get counted in this histogram. This histogram is introduced in server version 6.0.
Latencies when re-replicating. Only available in strong consistency enabled namespaces.
Configuration-enabled benchmarks
Configuration-enabled benchmarks include specific histograms that you can enable or disable dynamically. These benchmarks are more granular than the auto-enabled benchmarks. If you contact Aerospike Support for assistance with a problem, they may request that you enable a few sets that are relevant to your issue. If you keep these benchmarks permanently enabled, they may impact performance.
Tools package 6.0.x or later is required to use asadm's manage config commands.
Otherwise, use the equivalent asinfo - set-config
command.
General transaction histogram
To enable a specific benchmark, use the following command:
asadm -e 'enable; manage config service param <configuration> to true'
Type of Histogram | Description | Configuration |
---|---|---|
info | Time taken for an info protocol transaction. | enable-hist-info |
Specific transaction benchmarks
The transaction benchmarks are monitored at the per-namespace level.
Proxy transaction analysis
Displays time for a proxy transaction from the sender view.
asadm -e 'enable; manage config namespace <namespaceName> param enable-hist-proxy to true'
Time taken for proxy transactions.
Fabric transaction analysis
Displays the time for a fabric transaction from sender's view.
asadm -e 'enable; manage config service param enable-benchmarks-fabric to true'
Histogram that tracks the time taken for preparing a message for sending over fabric (intra cluster communication). Does not include any network operations. Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).
Histogram that tracks the time taken from end of send-init and stops after the network send. Can be hit multiple times if message was too big to fit in the send buffer.Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).
Histogram that tracks the time taken from the beginning of the first byte of a message from network and ends after network recv. Can be hit multiple times if message was big enough to require multiple network recv calls.Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).
Histogram that tracks the time taken to process the transaction on the receiving node (rw, bulk, control, meta).Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).
Ops subtransaction analysis
Displays slices of internally generated ops subtransactions from a background scan or query.
asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-ops-sub to true'
Time taken from the beginning of a transaction to when a node partition is reserved. This includes the demarshal part (tracked for all transactions under svc-demarshal
for server versions prior to 4.8).
Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy"). Refer to the write-restart
histogram for further details about transactions being reattempted.
Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.
Time taken from partition reserved or after duplicate resolution to an actual master record applied.
Time taken from master record written to replica(s) written.
Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.
Read transaction analysis
Displays slices of a single-record read transaction.
asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-read to true'
Time taken from the beginning of a transaction to when a node partition is reserved. High latencies here could indicate bottleneck at the network or CPU (and often irq balancing) as well as in the transaction queue. This includes the demarshal time (tracked for all transactions under svc-demarshal
for server versions prior to 4.8) as well as the time spent in a transaction queue (tracked for all transactions under svc-queue
for server versions prior to 4.8).Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.
Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy" on duplicate resolve).For this slice, in the case of a potential hotkey, the clock actually starts as soon as the transaction gets into the rw-hash, until it is being reattempted. For servers prior to 4.7, the reattempt scenario is this:a transaction first goes through the transaction queue, then gets picked up by a transaction thread, and if there is already an entry in the rw-hash for this record (for transactions required to go through the rw-hash), it will wait for the transaction being processed to complete, which will then put all the transactions waiting on the rw-hash back in the transaction queue in order for those transactions to be reattempted.As of server 4.7, the process is similar but there is no initial transaction queue/thread phase; also, once the transaction beingprocessed completes, all transactions waiting on the rw-hash are put into an internal transaction queue in order to be reattempted.
The same transaction can be re-queued multiple times (each will be counted as a separate transaction in this slice), but the clock will be reset everytime a transaction gets into the rw-hash. Therefore, a transaction being re-queued multiple times will not have its total accumulated time accounted for, but rather only the time it waited in the rw-hash (queued behind a transaction in progress) the last time prior to going through.
Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.
Time taken from partition reserved or after duplicate resolution to an actual local record read. For data-in-memory
true
, this is the slice for reading from memory. For data-in-memory
false
, this is the slice for reading from the disk except when the record is still in the post-write-queue
or in cached (read-page-cache
enabled). To troubleshoot further, enable storage benchmarks.
Time taken from local record read to client response completion. High latencies here could indicate a bottleneck on the network or the client, getting slow to consume responses from the server.
Time taken for reading all the copies of a record when the linearize read policy is in effect. This only applies for strong consistency enabled namespaces.
Write transaction analysis
Displays slices of a single-record write transaction, excluding deletes.
asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-write to true'
Time taken from the beginning of a transaction to when a node partition is reserved.High latency here could indicate bottleneck at the network or CPU (and often irq balancing). When compression is enabled, higher latencies could also potentially be observed in this slice. This includes the demarshal time (tracked for all transactions under svc-demarshal
for server versions prior to 4.8) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue
for server versions prior to 4.8). For server versions prior to 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and therefore could under-report the latency.Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.
Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy").For this slice, in the case of a potential hotkey, the clock actually starts as soon as the transaction gets into the rw-hash, until it is being reattempted.
For servers prior to 4.7, the reattempt scenario is this:a transaction first goes through the transaction queue, then gets picked up by a transaction thread, and if there is already an entry in the rw-hash for this record (for transactions required to go through the rw-hash), it will wait for the transaction being processed to complete, which will then put all the transactions waiting on the rw-hash back in the transaction queue in order for those transactions to be reattempted.As of server 4.7, the process is similar but there is no initial transaction queue/thread phase; also, once the transaction beingprocessed completes, all transactions waiting on the rw-hash are put into an internal transaction queue in order to be reattempted.
The same transaction can be re-queued multiple times (each will be counted as a separate transaction in this slice), but the clock will be reset everytime a transaction gets into the rw-hash. Therefore, a transaction being re-queued multiple times will not have its total accumulated time accounted for, but rather only the time it waited in the rw-hash (queued behind a transaction in progress) the last time prior to going through.This also includes restarts from duplicate resolution if any.
Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.
Time between when the partition is reserved or after a duplicate resolution finishes, to when an actual master record write to the swb in-memory buffer takes place. Latencies displayed here typically cover the time it took to update secondary indexes when necessary, and/or the read portion of the record write (if a write is an update, the record is read before a write is performed). Here are the details of what would be counted in this slice:
- read before write
- because of update
- because of stored key (sendKey policy set to true)
- compression at rest
- sindex
- commit-to-device true (in SC)
- binop processing
- filter processing (if the filter requires reading a bin to evaluate)
Time taken from master record written to replica(s) written.
Time taken from master write (if write-commit-level configured to master) or replica write to client response completion. High latencies here could indicate a bottleneck on the network or the client, getting slow to consume responses from the server.
UDF transaction analysis
Displays slices of a single-record UDF transaction.
asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-udf to true'
Time taken from the beginning of a transaction to when a node partition is reserved. This includes the demarshal part (tracked for all transactions under svc-demarshal
for server versions prior to 4.8) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue
for server versions prior to 4.8).Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.
Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy"). Refer to the write-restart
histogram for further details about transactions being reattempted.
Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.
Time taken from partition reserved or after duplicate resolution to an actual master record applied.
Time taken from master record written to replica(s) written.
Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.
Sub-UDF transaction analysis
Displays the slices of internally generated UDF transactions, usually by a background UDF, scan or a query.
asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-udf-sub to true'
Time taken from the beginning of a transaction to when a node partition is reserved. This includes the demarshal part (tracked for all transactions under svc-demarshal
for server versions prior to 4.8) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue
for server versions prior to 4.8).Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.
Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy"). Refer to the write-restart
histogram for further details about transactions being reattempted.
Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.
Time taken from partition reserved or after duplicate resolution to an actual master record applied.
Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.
Batch transaction analysis
Displays the slices for a batch transaction.
asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-batch-sub to true'
Time taken from partition reserved or after duplicate resolution to an actual master record applied via UDF.
Time taken from master record written to replica(s) written.
Time taken from partition reserved or after duplicate resolution to an actual master record applied.
Available in server versions 4.8 and later. The demarshal time for a batch sub-transaction.
For server versions 4.8 and later, the time taken from a batch sub-transaction being queued until a node partition is reserved.For server versions prior to 4.8, thetime taken from the beginning of a transaction to when a node partition is reserved. This includes the demarshal part (tracked for all transactions under svc-demarshal
) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue
). For server versions prior to 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and therefore could under-report the latency.Note as of server version 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.
Note server version 4.8.0.1 introduced a problem where this histogram also contains data points that should instead be in thebatch-sub-restart
histogram. This is fixed as of version 4.8.0.27, and also fixed for other server lineages in versions 4.9.0.25,5.0.0.28, 5.1.0.26, 5.2.0.18, 5.3.0.9, and 5.4.0.4.
Time taken from a partition being reserved to a partition being re-reserved (possible internal restart/reattempt).Refer to the write-restart
histogram for further details about transactions being reattempted.
Note server version 4.8.0.1 introduced a problem where this histogram is never populated -- with its data points instead being placed in the batch-sub-start
histogram. This is fixed as of version 4.8.0.27, and also fixed for other server lineages in versions 4.9.0.25, 5.0.0.28, 5.1.0.26, 5.2.0.18, 5.3.0.9, and 5.4.0.4.
Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Note that timeouts are not included.
Time taken from partition reserved or after duplicate resolution to an actual local record read.
Time taken from local record read to batch parent response completion.
Time taken for reading all the copies of a record when the linearize read policy is in effect. This only applies for strong consistency enabled namespaces.
Storage transaction analysis
Storage benchmarks are a set of device-targeted histograms that measure performance of the storage subsystem in time slices. They provide insight into performance issues related to storage devices and PMem storage files. Due to their performance impact, they are disabled by default. You can enable them by passing the set-config
command to the asinfo
utility.
To enable storage-benchmarks:
asadm -e 'enable; manage config namespace NAMESPACE storage-engine param enable-benchmarks-storage to true'
Histogram that tracks the time taken to read from disk. (Not available for pmem storage.)
Histogram that tracks amount of time taken to write write-block to disk. (Not available for pmem storage.) The size of a write block is the configured write-block-size
. When running with commit-to-device
set to true with server versions prior to 5.3, this counter will only account for full blocks written and therefore will only count blocks written through the defragmentation process as client writes would write to disk individually rather than at a block level. But when running with commit-to-device
set to true with server versions 5.3 and later, client writes will also be included in the histogram.
Histogram that tracks amount of time taken to write write-block to a shadow device, when usingshadow devices. (Not available for pmem storage.)The size of a write block is the configured write-block-size
). When running with commit-to-device
set to true with server versions prior to 5.3, this counter will only account for fullblocks written and therefore will only count blocks written through the defragmentation process as client writes would write to disk individually ratherthan at a block level. But when running with commit-to-device
set to true with server versions 5.3 and later, client writes will also be included in the histogram.
Histogram that tracks the amount of time to read write-block from disk. (Not available for pmem storage.) Histogram that tracks the amount of time to read large blocks from disk. The size of a large block (or write block) is the configured write-block-size
). Note that as of server version 5.3, tomb-raider device reads are included in this histogram.
Histogram that tracks the time taken for fsync. (Available in server version 4.3.0.10 and below. Deprecated in server version 4.3.1.3.)
Histogram that tracks the size of the objects being read.
Histogram that tracks the size of the objects being written. This includes replica writes from other nodes (on namespaces with replication factor 2 or more). This does not include defragmentation writes.
Secondary index transaction analysis
- Version 6.0.0 and later
- Version 5.7 and earlier
Secondary index benchmarks are a set of benchmarks targeted to a particular index measuring various components of a query and index. Due to their performance impact, they are disabled by default. You can enable them by passing the sindex-histogram
command to the asinfo utility.
To enable secondary index benchmarks for a specific index:
asinfo -h [host ip] -v "sindex-histogram:ns=NAMESPACE;indexname=INDEX;enable=true"
Histogram measuring secondary index write performance, unit is microseconds.
Histogram of time spent gathering secondary index key values from records, unit is microseconds.
Histogram measuring secondary index delete performance, unit is microseconds.
Histogram for entire query execution time, unit is milliseconds.
Histogram of time spent performing secondary index lookup from primary index digest tree for the qualifying recods for the specific secondary index query, unit is microseconds.
Histogram of time spent reading qualified records from storage for the secondary index query, unit is microseconds.
Histogram of rows selected by query.
Histogram of false positives returned by sindex.
Example: If a sindex returned 100 records but the query read only 95records would indicate that there were 5 false positives which may be theresult of garbage collection.