
Server Log Reference

Understanding the messages recorded in the Aerospike log file is instrumental in figuring out how your instance is performing and in detecting errors early. This document outlines the structure of the log messages and provides details on the messages that convey key information about the instance.

Changing Log Levels

The following asinfo and asadm commands are useful for working with the server logs:

  • asinfo -v "logs" returns a list of log file locations in use by this server.
$ asinfo -v "logs"
0:/var/log/aerospike/aerospike.log
  • asinfo -v "log/" returns a list of logging contexts and their associated verbosity levels. See further detail in asinfo manual. For the full list of contexts, refer to the list at the bottom of this page (Contexts).

  • asadm -e "enable; manage config logging file <FILENAME> param <CONTEXT> to <VERBOSITY>" sets the log level for the given context (see the example after this list). See further detail in the asadm manual.

  • asinfo -v "log-set:id=<SINK_ID>;<CONTEXT>=<VERBOSITY>" sets log level for the given context. See further detail in asinfo manual.

$ asinfo -v "log-set:id=0;fabric=debug"
ok
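
For example, instantiating the asadm template above with the log sink and context used in the earlier examples (substitute your own file name, context, and verbosity as needed):

$ asadm -e "enable; manage config logging file /var/log/aerospike/aerospike.log param fabric to debug"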

Log Line Format

All Aerospike log lines take the following form:
<date-time>: <severity> (<context>): (<file>:<line>) <message>
  • date-time: Date and time the log message was recorded. Example: Sep 20 2013 02:49:46 GMT
  • severity: Severity level of the log message. For details, see Severity Levels.
  • context: Component in the server which logged the message. For details, see Contexts.
  • file:line: Location in the source code where the message was logged.
  • message: Message being logged.
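
Putting these fields together, an illustrative line looks like the following (the source file and line number here are hypothetical; the message body is taken from the drv_ssd example later on this page):

Sep 20 2013 02:49:46 GMT: INFO (drv_ssd): (drv_ssd.c:2115) {namespace_name} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0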

Contexts

Contexts indicate which component a log message was emitted from. For the full list of contexts, run the following command:
$ asinfo -v "log/" -l
misc:INFO
alloc:INFO
arenax:INFO
hardware:INFO
msg:INFO
os:INFO
rbuffer:INFO
socket:INFO
tls:INFO
vault:INFO
vmapx:INFO
xmem:INFO
aggr:INFO
appeal:INFO
as:INFO
audit:INFO
batch:INFO
bin:INFO
config:INFO
clustering:INFO
drv_pmem:INFO
drv_ssd:INFO
exchange:INFO
exp:INFO
fabric:INFO
flat:INFO
geo:INFO
hb:INFO
health:INFO
hlc:INFO
index:INFO
info:INFO
info-port:INFO
job:INFO
migrate:INFO
mon:INFO
namespace:INFO
nsup:INFO
particle:INFO
partition:INFO
paxos:INFO
proto:INFO
proxy:INFO
proxy-divert:INFO
query:INFO
record:INFO
roster:INFO
rw:INFO
rw-client:INFO
scan:INFO
security:INFO
service:INFO
service-list:INFO
sindex:INFO
skew:INFO
smd:INFO
storage:INFO
truncate:INFO
tsvc:INFO
udf:INFO
xdr:INFO
xdr-client:INFO
Note: The leading cf: in front of some of the context names was dropped as of version 3.13.

Note: The rw-client and proxy-divert contexts were added in version 3.16.0.1.

Note: The audit context was added in version 5.7.

Some commonly referenced contexts:

Context    Description
as         Typically Aerospike initialization information.
drv_ssd    Persistent storage related information (not necessarily SSD).
info       Somewhat overloaded context that provides information about various systems.
nsup       Namespace supervisor messages.

Severity Levels

Severity levels indicate the importance of a log message.

  • During normal operation, most messages are logged at the INFO level. Setting the default severity level to INFO requires modifying the context configuration parameter. (Prior to Aerospike Server version 4.9, the default severity level was INFO.)
  • If problems occur, messages are logged at the WARNING level.
  • Errors that severely impact the operation of the database or, in rare cases, cause the server to shut down are logged at the CRITICAL level. By default, Aerospike logs messages at the CRITICAL level.

The configuration can be modified to log lower level messages for debugging purposes. See Changing Log Levels.

Severity    Description
CRITICAL    Critical error messages. Indicates a fatal error has occurred, usually resulting in Aerospike shutdown. Default and highest level.
WARNING     Warning messages.
INFO        Informational messages.
DEBUG       Debugging information.
DETAIL      Verbose debugging information.
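
For example, to temporarily raise the verbosity of the drv_ssd context to DEBUG for troubleshooting and then restore it to INFO, the log-set command shown earlier can be used:

$ asinfo -v "log-set:id=0;drv_ssd=debug"
ok
$ asinfo -v "log-set:id=0;drv_ssd=info"
ok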

Server Messages

could not allocate xxxxxxxxxx-byte arena stage xxx: No space left on device

Severity:

WARNING

Context:

arenax

Additional information

Indicates, for the index-type flash configuration, that the mount points have run out of space. You may need to manually delete the arena files and fsck the disk partitions.
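
A standard df check against the configured index mounts can confirm whether they are full (the mount path below is purely illustrative):

$ df -h /mnt/aerospike/index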

allowing x fill-migrations after y seconds delay

Severity:

INFO

Context:

as

Introduced:

4.3.1

Additional information

This message appears after a recluster event, if a value is set for the configuration parameter migrate-fill-delay and if the recluster event has caused fill-migrations to be scheduled.
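
To confirm the configured value, the service context configuration can be queried (assuming migrate-fill-delay is reported under the service context, as in recent server versions):

$ asinfo -v "get-config:context=service" -l | grep migrate-fill-delay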

waiting for storage: 1569063 objects, 1819777 scanned

Severity:

INFO

Context:

as

Additional information

The objects and scanned values diverge for various reasons, such as scanning records that had previously expired or that expired while the system was down.

Occurs: When cold starting a node that has data on persistent storage (such as SSD).

Parameters:

objects: Number of objects from storage device that will be retained

scanned: Number of objects that have been scanned on the storage device

finished clean shutdown - exiting

Severity:

INFO

Context:

as

Introduced:

3.0

Additional information

This message is the last one of a sequence of messages logged during Aerospike server shutdown. It signifies that Aerospike was shut down with "trusted" status, which is a necessary condition for a subsequent fast restart of a namespace configured with storage-engine device. See this knowledge-base article for details on the ASD shutdown process.

abandoned batch from 11.22.33.44 with 23 transactions after 30000 ms.

Severity:

WARNING

Context:

batch

Introduced:

4.1

Additional information

Occurs: When a batch-index transaction is abandoned due to one or more delays pushing its total time above the allowed threshold. This threshold is either twice the client total timeout, or 30 seconds if the timeout is not set on the client. Each occurrence will also increment the batch_index_error statistic.

Parameters:

(IP Address): The client originating IP address for the transaction.

(Number of transactions): The number of batch sub transactions in the impacted batch index transaction.

(Abandoned time): The total time the batch index transaction has been running before being abandoned.
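
To see whether abandoned batches are accumulating, the batch_index_error statistic mentioned above can be checked, for example:

$ asinfo -v "statistics" -l | grep batch_index_error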

{NAMESPACE} bin-name quota full - can’t add new bin-name

Severity:

WARNING

Context:

bin

Additional information

Aerospike Server version 5.0: Max number of bins per namespace (65535) has been reached.
Aerospike Server version 4.9 and earlier: Max number of bins per namespace (32767) has been reached.

See KB article How to clear up set and bin names when it exceeds the maximum set limit.

Parameters:

NAMESPACE: The namespace that has reached the maximum number of bins.

evicted from cluster by principal node BB9030011AC4202

Severity:

WARNING

Context:

clustering

Additional information

The paxos principal node has determined that this node is not a valid cluster member. See the log on the principal for more information.

Parameters:

principal: Node ID of the paxos principal node

ignoring paxos accepted from node BB9030011AC4202 - it is not in acceptor list

Severity:

WARNING

Context:

clustering

Additional information

A paxos clustering message from another node was ignored. This can result from network stability issues.

Parameters:

node: Node sending the invalid paxos message

ignoring paxos accepted from node BB9030011AC4202 with invalid proposal id

Severity:

WARNING

Context:

clustering

Additional information

A paxos clustering message from another node was ignored. This can result from network stability issues.

Parameters:

node: Node sending the invalid paxos message

ignoring stale join request from node BB9030011AC4202 - delay estimate 83108(ms)

Severity:

INFO

Context:

clustering

Additional information

A paxos clustering message from another node was ignored. This can result from network stability issues resulting in delayed delivery of packets.

Parameters:

node: Node sending the invalid paxos message

delay estimate: Estimated delay of the message in milliseconds.

invalid feature key signature

Severity:

CRITICAL

Context:

config

Additional information

The key file has been modified and no longer matches the digital signature. The key file must be EXACTLY as it was downloaded: even apparently innocuous operations like cutting and pasting the contents can change the file so that it no longer matches.

CRITICAL (config): (features_ee.c:184) trailing garbage in /etc/aerospike/features.conf, line 21

Severity:

CRITICAL

Context:

config

Additional information

The feature-key-file has been tampered with. Replace the file with the original provided by Aerospike. For more details refer to this Knowledge Base article.

failed CONFIG_CHECK check - MESSAGE

Severity:

WARNING

Context:

config

Introduced:

5.7

Additional information

Indicates that a configuration best-practice was violated at startup.

Parameters:

CONFIG_CHECK: Name of the check that was violated.

MESSAGE: Description of how the best-practice was violated.

failed best-practices checks - see 'https://docs.aerospike.com/operations/install/linux/bestpractices'

Severity:

WARNING

Context:

config

Introduced:

5.7

Additional information

Indicates that one or more best-practices checks failed. This message follows a set of warnings, one for each best-practice that was violated. It becomes a CRITICAL message when enforce-best-practices is set to true.

failed best-practices checks - see 'https://docs.aerospike.com/operations/install/linux/bestpractices'

Severity:

CRITICAL

Context:

config

Introduced:

5.7

Additional information

Indicates that one or more best-practices checks failed. This message follows a set of warnings, one for each best-practice that was violated. It becomes a WARNING message when enforce-best-practices is set to false.

proto input from 10.10.10.10:58273: msg greater than 134217728, likely request from non-Aerospike client, rejecting: sz 25776186477

Severity:

WARNING

Context:

demarshal

Removed:

4.4.0.1

Additional information

A request with a record size larger than the configured block size. It could also be random packets hitting Aerospike's service port (for example, something scanning all ports), or potentially malformed messages coming from a client. For oversized but otherwise valid requests, try increasing write-block-size.

Parameters:

from: IP address of the sender of the bad request.

blocksize: Size of the write-block in bytes.

sz: Size of the bad request in bytes.

{namespace_name} out of space

Severity:

WARNING

Context:

drv_pmem

Introduced:

5.2

Additional information

Indicates a shortage of free storage blocks. Refer to the How to Recover Contiguous Free Blocks article.
Note: To save log space, this message will be logged with a (repeated: nnn) prefix just once per ticker-interval during periods of repetition.

Parameters:

namespace: Namespace being written to

{namespace_name} write: size 9437246 - rejecting 1142f0217ababf9fda5b1a4de66e6e8d4e51765e

Severity:

DETAIL

Context:

drv_pmem

Introduced:

5.2

Additional information

Most likely appearing as a result of exceeding the write-block-size. For more information, see the KB on write-block-size. The record's digest is the last item in the log entry.

Parameters:

namespace: Namespace being written to

size: Total size of the record that was rejected

/dev/nvme0n1p1 init wblocks: pristine-id 155209 pristine 241520 free-q 13331, defrag-q 1

Severity:

INFO

Context:

drv_ssd

Additional information

At startup, the status of unwritten blocks, free blocks and blocks on the defrag queue. The sum of free-q and pristine blocks indicates the total space available for writing. This is discussed in detail in this Knowledge Base article.

Parameters:

pristine-id: The ID of the first unwritten (pristine) block on the disk.

pristine: The number of completely unwritten (pristine) blocks on the disk.

free-q: The number of blocks that have been through the defragmentation process and are available to be re-written.

defrag-q: The number of blocks that are awaiting defragmentation on the defrag queue.
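
For the example line above, the space currently available for writing is the sum of free-q and pristine: 13331 + 241520 = 254851 wblocks.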

namespace NS waiting for defrag: 5 pct available, waiting for 10 ...

Severity:

INFO

Context:

drv_ssd

Additional information

The node is stuck in a defrag loop at startup because it is not able to defragment enough to bring the available percentage (device_available_pct) up to defrag-startup-minimum. Please see the linked documentation page for instructions on how to proceed.

Parameters:

namespace: The namespace impacted.

avail pct: Current available percent.

required pct: Target available percent to reach in order to start up.

{namespace_name} /dev/sda: used-bytes 296160983424 free-wblocks 885103 write-q 0 write (12659541,43.3) defrag-q 0 defrag-read (11936852,39.1) defrag-write (3586533,10.2) shadow-write-q 0 tomb-raider-read (13758,598.0)

Severity:

INFO

Context:

drv_ssd

Introduced:

3.10

Additional information

Parameters:

{namespace}: Name of the namespace the device and stats belong to.

/dev/sda: Name of the device for which the following stats apply.

used-bytes: Number of bytes on this device that are in use. Corresponds to the storage-engine.device[ix].used_bytes statistic.

free-wblocks: The number of wblocks that are free (the device_available_pct). Corresponds to the storage-engine.device[ix].free_wblocks statistic.

write-q: Number of write buffers pending to be written to the SSD. When this reaches the max-write-cache configured value (default 64M), 'device overload' errors will be returned and queue too deep warnings will be printed on the server log. Corresponds to the storage-engine.device[ix].write_q statistic.

write: Total number of SSD write buffers written to this device since the Aerospike Server started, (including defragmentation), and the number of write buffers written per second. Corresponds to the storage-engine.device[ix].writes statistic.

defrag-q: Number of wblocks pending defragmentation. Those are blocks that have fallen below the defrag-lwm-pct, waiting to be read and have their relevant content recombined in a fresh swb. The defrag-sleep setting controls the sleep period in between each block being read (default 1ms). Corresponds to the storage-engine.device[ix].defrag_q statistic.

defrag-read: Total number of write blocks that have been sent to the defragmentation queue (defrag-q) and will be processed (read) by the defragmentation thread on this device, and the normalization to the average number of wblocks processed per second during the interval at which this message is logged. Usually the defrag-q will be at 0 and w-blocks will be read as they are put on the defrag-q. In such cases, the defrag-read number represents the number of w-blocks read by the defragmentation thread. Corresponds to the storage-engine.device[ix].defrag_reads statistic.

defrag-write: Total number of write blocks written by defragmentation on this device since the Aerospike Server started, and the number of wblocks written per second (subset of write). Corresponds to the storage-engine.device[ix].defrag_writes statistic.

shadow-write-q: Number of write buffers pending to be written to the shadow device (only printed when a shadow device is configured). When this reaches the max-write-cache configured value (default 64M), 'device overload' errors will be returned and queue too deep warnings will be printed on the server log. Corresponds to the storage-engine.device[ix].shadow_write_q statistic.

tomb-raider-read: Total number of blocks read by the tomb-raider in the current cycle, and the current number of wblocks read per second. Only printed when the tomb-raider is active.
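
A quick way to spot intervals where the write queue on this node was non-zero is to grep the ticker lines, assuming the default log location:

$ grep -E "write-q [1-9]" /var/log/aerospike/aerospike.log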

{namespace_name} device /dev/sda prior shutdown not clean

Severity:

INFO

Context:

drv_ssd

Additional information

Indicates the previous shutdown was not trusted. The node will have to perform a cold start.

{namespace_name} read_ssd: invalid rblock_id :0x0c0008d663318a674a7bd379f6efd3bb1f55141d

Severity:

WARNING

Context:

drv_ssd

Additional information

A record that has an invalid read-block is being read. This can happen if a node runs out of memory and an swb cannot be allocated. It could also happen on some earlier versions (before the upfront check for available free blocks) if a node ran out of available space (device_available_pct). A cold start of the node should resolve the issue.

Parameters:

NAMESPACE: Namespace the record with invalid read-block resides in.

rblock_id: Digest of the record with invalid read-block.

metadata mismatch - removing <DIGEST_ID>

Severity:

WARNING

Context:

drv_ssd

Introduced:

4.9.0.19

Additional information

This message is due to the change in a primary index bit and was introduced as a fix to AER-6335. The issue would occur for XDR-enabled namespaces upgrading from a 4.9 version prior to 4.9.0.19. The workaround is to proceed with a rolling cold-restart.

{namespace_name} durable delete fail: queue too deep: exceeds max 544

Severity:

WARNING

Context:

drv_ssd

Introduced:

5.7

Additional information

This warning message indicates that although the disks themselves are not necessarily faulty or nearing end of life, they are not keeping up with the load placed upon them. See Why do I see warning - queue too deep for more information.

{namespace_name} immigrate fail: queue too deep: exceeds max 576

Severity:

WARNING

Context:

drv_ssd

Introduced:

5.7

Additional information

This warning message indicates that although the disks themselves are not necessarily faulty or nearing end of life, they are not keeping up with the load placed upon them. See Why do I see warning - queue too deep for more information.

{namespace_name} udf fail: queue too deep: exceeds max 512

Severity:

WARNING

Context:

drv_ssd

Introduced:

5.7

Additional information

When the write queue is too deep, UDF writes fail by design.

{namespace_name} write fail: queue too deep: exceeds max 512

Severity:

WARNING

Context:

drv_ssd

Introduced:

3.0

Additional information

This warning message indicates that although the disks themselves are not necessarily faulty or nearing end of life, they are not keeping up with the load placed upon them. See Why do I see warning - queue too deep for more information.
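
The namespace's current max-write-cache setting can be checked with get-config (the namespace name below is a placeholder):

$ asinfo -v "get-config:context=namespace;id=<NAMESPACE>" -l | grep max-write-cache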

/dev/nvme0n1p1: bad device-id 3192497567

Severity:

CRITICAL

Context:

drv_ssd

Introduced:

4.0

Additional information

Appears in cases of device corruption or if a device was not erased before starting the Aerospike service. See documentation on SSD Initialization and SSD Setup for more details and recommended steps for initializing SSDs. If corruption is seen in the system logs, the device might need to be replaced.

/dev/nvme0n1p1: read failed: errno 5 (Input/output error)

Severity:

CRITICAL

Context:

drv_ssd

Additional information

Indicates an error during a system call to the storage device. Depending on the transaction path in which this occurs, the database could abort if the integrity of the underlying data is not known (typically on write transactions). If corruption is seen in the system logs, check the firmware version; the device might need to be replaced.

/dev/sdd defrag start

Severity:

INFO

Context:

drv_ssd

Removed:

3.3.17

Additional information

Occurs: This marks the beginning of a new cycle of the defragmentation subsystem on the SSD device.

/dev/sdd curr_pos 430536 wblocks:1785 recs:42341 waits:0(0) lock-time:19 ms total-time:2154 ms.

Severity:

INFO

Context:

drv_ssd

Removed:

3.3.17

Additional information

wblocks consistently reaching defrag-max-blocks is an indicator that the defrag configuration and/or the system is unable to keep up with defrag. This typically leads to breaching stop-writes-pct which causes new writes to fail. If the total-time is less than the defrag-period then first try decreasing the defrag-period. If defrag still isn't able to catch up and the total-time > defrag-period then your system hardware cannot handle the load. A few considerations:

  • Consider increasing the number of disks per node. Each disk has its own defrag thread, having more disks means more blocks can be defragged in parallel.
  • Consider increasing the number of nodes in the cluster. We distribute data evenly across the cluster, adding more nodes (scaling out) relaxes the need to buy expensive hardware (scaling up).

Occurs: When the defrag cycle completes

Parameters:

curr_pos: Last block read by defrag algorithm

wblocks: Number of blocks that will be recovered during this cycle. Limited by defrag-max-blocks setting

recs: Number of records moved from defragged blocks

waits: During defrag if we find that the queue to write to the device is longer than defrag-queue-hwm then we will sleep for 1 msec intervals till the queue falls below defrag-queue-lwm

lock-time: How long it took to acquire a lock for each defraggable wblock

total-time: How long this defrag run took

can't add record to index

Severity:

CRITICAL

Context:

drv_ssd

Additional information

For the index-type flash configuration this indicates that the mount points have run out of space. It may be required to manually delete the arena files and fsck the disk partitions.

defrag_move_record: couldn’t get swb

Severity:

WARNING

Context:

drv_ssd

Removed:

5.7, 5.6.0.13, 5.4.0.21, 5.3.0.26, 5.2.0.36, 5.1.0.42

Additional information

Indicates the node has a shortage of free storage blocks. Refer to the How to Recover Contiguous Free Blocks article.

{namespace_name} defrag: drive Drive1 totally full - waiting for vacated wblocks to be freed

Severity:

WARNING

Context:

drv_ssd

Introduced:

5.7, 5.6.0.13, 5.4.0.21, 5.3.0.26, 5.2.0.36, 5.1.0.42

Additional information

Indicates the node has no free storage blocks. In this situation the defragmentation process will wait until a free block is available. See How to Recover Contiguous Free Blocks for more information.
Note: To save log space, this message will be logged with a (repeated: nnn) prefix just once per ticker-interval during periods of repetition.

Parameters:

{namespace_name}: Affected namespace name.

drive: Affected storage device name.

device /dev/sdb: read_complete: added 0 expired 0

Severity:

INFO

Context:

drv_ssd

Introduced:

4.5.1.5

Additional information

Parameters:

device: Name of the device for which the following stats apply. The stats are a summary of a cold start which read the entire device.

added: Total number of unique records loaded from this device.

expired: Number of records skipped because they were expired.

device /dev/sdb: read complete: UNIQUE 20401681 (REPLACED 5619021) (OLDER 11905062) (EXPIRED 0) (EVICTED 0) records

Severity:

INFO

Context:

drv_ssd

Introduced:

4.5.1.5

Additional information

Parameters:

device: Name of the device for which the following stats apply. The stats are a summary of a cold start which read the entire device.

UNIQUE: Total number of unique records loaded from this device.

REPLACED: Number of records that replaced a version loaded earlier during the device scan (won the conflict resolution).

OLDER: Number of records that were skipped because a newer version was loaded earlier during the device scan (lost the conflict resolution).

EXPIRED: Number of records skipped because they were expired.

EVICTED: Number of records skipped because they were evicted.

device /dev/sdb: read complete: UNIQUE 20401681 (REPLACED 5619021) (OLDER 11905062) (EXPIRED 0) (MAX-TTL 0) records

Severity:

INFO

Context:

drv_ssd

Removed:

4.5.1.5

Additional information

Parameters:

device: Name of the device for which the following stats apply. The stats are a summary of a cold start which read the entire device.

UNIQUE: Total number of unique records loaded from this device.

REPLACED: Number of records that replaced a version loaded earlier during the device scan (won the conflict resolution).

OLDER: Number of records that were skipped because a newer version was loaded earlier during the device scan (lost the conflict resolution).

EXPIRED: Number of records skipped because they were expired.

MAX-TTL: Number of records whose TTL was truncated down to the specified max-ttl.

device /dev/sdd: free 50071M contig 29486M w-q 0 w-free 224963 swb-free 10 w-tot 40149228

Severity:

INFO

Context:

drv_ssd

Removed:

3.3.17

Additional information

Parameters:

free: Amount of free space on disk

contig: amount of contiguous free space on disk

w-q: Objects pending to be written to the SSD

w-free: Number of free write blocks on disk

swb-free: Number of free SSD write buffers. This is only an indication that at some point extra swb (streaming write buffers) were allocated (a sign of a backlog) and then subsequently released to the swb pool; the pool eventually shrinks if the buffers stay unused

w-tot: Total number of SSD write buffers persisted to device.

device /dev/sdc: used 296160983424, contig-free 110637M (885103 wblocks), swb-free 16, w-q 0 w-tot 12659541 (43.3/s), defrag-q 0 defrag-tot 11936852 (39.1/s) defrag-w-tot 3586533 (10.2/s)

Severity:

INFO

Context:

drv_ssd

Introduced:

3.6.1

Removed:

3.10

Additional information

Parameters:

device: Name of the device for which the following stats apply.

used: Number of bytes of this device in use.

contig-free: Amount of space occupied by free wblocks, and the number of wblocks free in parenthesis.

swb-free: Number of free SSD write buffers. This is only an indication that at some point extra swb (streaming write buffers) were allocated (a sign of a backlog) and then subsequently released to the swb pool; the pool eventually shrinks if the buffers stay unused.

w-q: Number of write buffers pending to be written to the SSD. When this reaches the max-write-cache configured value (default 64M), 'device overload' errors will be returned and queue too deep warnings will be printed on the server log.

w-tot: Total number of SSD write buffers ever written to this device (including defragmentation), and the number of write buffers written per second in parenthesis.

defrag-q: Number of wblocks pending defrag. Those are blocks that have fallen below the defrag-lwm-pct, waiting to be read and have their relevant content recombined in a fresh swb. The defrag-sleep setting controls the sleep period in between each block being read (default 1ms).

defrag-tot: Total number of write blocks ever processed (read) by defragmentation on this device, and the number of wblocks processed per second in parenthesis.

defrag-w-tot: Total number of write blocks ever written by defragmentation on this device, and the number of wblocks written per second in parenthesis (subset of w-tot).

device /dev/sdb - swb buf valloc failed

Severity:

WARNING

Context:

drv_ssd

Additional information

Indicates a shortage of memory. Make sure the nodes have enough memory.

Parameters:

device: Device aerospike was trying to read or write to at the time of the error.

device /dev/sdd defrag: rblock_id 952163461 generation mismatch (4:3) :0xc0b12ffb353e0385179f39c85edc7791264f11aa

Severity:

WARNING

Context:

drv_ssd

Additional information

This is a case where the generation value for the index has been advanced while the generation value for the record on the drive has not advanced (this happened in some very old Aerospike versions). The error is notifying that defrag found this discrepancy. This warning should appear only once as the defrag process will resolve the discrepancy.

Parameters:

device: Disk where this occurred.

rblock_id: Identifier of the disk block being defragged.

mismatch: Generation of the record in the index and on disk, respectively.

digest: Digest of the record in question.

device has AP partition versions but 'strong-consistency' is configured

Severity:

CRITICAL

Context:

drv_ssd

Introduced:

4.0.0

Additional information

A namespace has data that was written while it was in AP mode, but is being started in SC mode. This is not permitted. Refer to the strong-consistency configuration option for details.

encryption key or algorithm mismatch

Severity:

WARNING

Context:

drv_ssd

Additional information

Happens at startup. Signifies that the encryption key file previously used to encrypt the data on the storage device doesn't match the one currently provided.

error: block extends over read size: foff 5242880 boff 1047552 blen 1392

Severity:

WARNING

Context:

drv_ssd

Additional information

The device likely has a bad sector. If this issue occurs frequently, replace the device.

Parameters:

foff: Offset of file containing malformed block.

boff: Offset of malformed block.

blen: Length of malformed block.

get_key: failed as_storage_record_read_ssd()

Severity:

WARNING

Context:

drv_ssd

Additional information

Symptom of having run out of storage space. Resolved by a cold start.

get_key: failed ssd_read_record()

Severity:

WARNING

Context:

drv_ssd

Additional information

Aerospike was not able to read the record from storage. This may indicate a hardware failure; please see this FAQ for more information.

load_n_bins: failed ssd_read_record()

Severity:

WARNING

Context:

drv_ssd

Removed:

5.1.0

Additional information

Aerospike was not able to read the record from storage. This may indicate a hardware failure; please see this FAQ for more information.

load_bins: failed ssd_read_record()

Severity:

WARNING

Context:

drv_ssd

Additional information

Aerospike was not able to read the record from storage. This may indicate a hardware failure; please see this FAQ for more information.

read_all: failed as_storage_record_read_ssd()

Severity:

INFO

Context:

drv_ssd

Additional information

Symptom of having run out of storage space. Resolved by a cold start.

read: bad block magic offset 303403269632

Severity:

WARNING

Context:

drv_ssd

Additional information

The SSD is corrupted so the expected value of a block does not match the actual value. There are 3 potential root causes:

  • Names of raw devices swapped on server reboot (e.g. storage pointed to by /dev/sda is now pointed to by /dev/sdb).
  • storage-engine stanza has changed to reorder devices.
  • Hardware failure and actual data corruption.

Refer to Configuring Devices with WWID for the first point, and use SMART or ACT to check the hardware for the last point.

Parameters:

offset: Location in the device where the mismatch was noticed.

read failed: expected 512 got -1: fd 9900 data 0x7f277981b000 errno 5

Severity:

WARNING

Context:

drv_ssd

Additional information

Aerospike tried to read 512 bytes from the device, but the read() call returned -1 (error) with errno 5, which is EIO (I/O error). Potential hardware issue.

Parameters:

expected: Size of block requested from disk.

fd: File descriptor used for read.

data: Offset of block.

ssd_read: record b98946e3d616790 has no block associated, fail

Severity:

WARNING

Context:

drv_ssd

Additional information

This is a result of having run out of space. The record was written in the index but not flushed to disk, so we have this inconsistency. A cold start will resolve this.

Parameters:

record: The record's hashed key.

write bins: couldn’t get swb

Severity:

WARNING

Context:

drv_ssd

Removed:

5.2

Additional information

Indicates a shortage of free storage blocks. Refer to the How to Recover Contiguous Free Blocks article.

{namespace} out of space

Severity:

WARNING

Context:

drv_ssd

Introduced:

5.2

Additional information

Indicates a shortage of free storage blocks. Refer to the How to Recover Contiguous Free Blocks article.
Note: To save log space, this message will be logged with a (repeated: nnn) prefix just once per ticker-interval during periods of repetition.

Parameters:

{ns}: Namespace being written to.

{namespace_name} write: size 9437246 - rejecting 1142f0217ababf9fda5b1a4de66e6e8d4e51765e

Severity:

DETAIL

Context:

drv_ssd

Introduced:

5.2

Additional information

Most likely appearing as a result of exceeding the write-block-size. For more information, see the KB on write-block-size. The record's digest is the last item in the log entry.

Parameters:

namespace: Namespace being written to.

size: Total size of the record that was rejected.

write: size 9437246 - rejecting <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13

Severity:

DETAIL

Context:

drv_ssd

Introduced:

3.16

Removed:

5.2

Additional information

Appears with the WARNING message about failed as_storage_record_write() for exceeding the write-block-size.

Parameters:

size: Total size of the record that was rejected

<Digest>: Digest of the record that was rejected

error sending exchange data

Severity:

WARNING

Context:

exchange

Additional information

Failure exchanging partition maps with another node, due to the node not being able to receive messages (down, or disconnected from the network).

blocking client transactions in orphan state!

Severity:

WARNING

Context:

exchange

Additional information

The node is not currently part of any cluster, so it is not allowing clients to access the partitions it holds.

received duplicate exchange data from node 783f4ac2fbb57e81

Severity:

INFO

Context:

exchange

Additional information

Another node has resent exchange data because it did not receive an acknowledgment from this node within half the heartbeat interval. This is most likely due to a networking issue of some kind.

Parameters:

node: NodeID of the origin of the unacknowledged exchange data

received duplicate ready to commit message from node 783f4ac2fbb57e81

Severity:

INFO

Context:

exchange

Additional information

A node has resent the ready-to-commit message because it did not receive an acknowledgment from this node within half the heartbeat interval. This is most likely due to a networking issue of some kind between nodes in the cluster. Ready-to-commit messages are sent by each node to the principal node when exchange data has been completed on that node (the principal node also sends itself such a message). On a given node, the data exchange is done when the node has successfully sent its partition map to all the nodes in the cluster and received the partition map from each of them; this requires each node to ack back that it has received the exchange data. Only then does a node tell the principal that it is ready to commit. While the principal is waiting to receive the ready-to-commit from some nodes, the other nodes keep resending theirs as they are ready and waiting. Therefore, the nodes for which this message is seen are the nodes that are ready; the other nodes are the ones potentially having issues completing their exchange data.

Parameters:

node: Node ID of the node that is continuing to send the ready to commit message as it is waiting for the principal to acknowledge.

predexp deprecated - use new expressions API

Severity:

WARNING

Context:

exp

Introduced:

5.6

Additional information

This indicates the use of the deprecated Predicate Expressions API, which was replaced in version 5.2 by the Aerospike Expressions API. The deprecated API is removed in server 6.0.
This warning is logged no more than once per log ticker cycle.

msg_read: could not deliver message type 1

Severity:

INFO

Context:

fabric

Additional information

Heartbeat message handler is not registered on the node. Message should disappear soon after the node joins the cluster. Otherwise, restart the asd service on the node.

Parameters:

message type: Internal code.

error creating fabric published endpoint list

Severity:

CRITICAL

Context:

fabric

Additional information

This issue can occur if there is a failure in network interface initialization. Check the system logs around the time of the issue and verify that the fabric network interface initialized properly.

no IPv4 addresses configured for fabric

Severity:

WARNING

Context:

fabric

Additional information

This issue can occur if there is a failure in network interface initialization. Check the system logs around the time of the issue and verify that the fabric network interface initialized properly.

r_msg_sz > sizeof(fc->r_membuf) 1048582

Severity:

WARNING

Context:

fabric

Additional information

This is from the run_fabric_accept thread (rather than the run_fabric_recv thread). This thread is responsible for accepting new connections. These messages are only expected to store the NodeID and the ChannelID, which would always be just a few bytes and therefore never expected to be over 1MiB. This WARNING indicates likely non-Aerospike traffic against the fabric port. Make sure the fabric port (default 3002) is not exposed.

Parameters:

r_msg_sz: Internal code.

record too small 0

Severity:

WARNING

Context:

flat

Additional information

Inbound record for migration is corrupt. Appears with the WARNING message about handle insert: got bad record and is documented in the following KB article.

failed to submit command to /dev/nvme0: x109

Severity:

WARNING

Context:

hardware

Introduced:

4.3

Additional information

This warning message is benign. It indicates that the underlying NVMe device does not support the health information check. To suppress this message, set the log level to critical for the hardware context.
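
For example, using the log-set command described at the top of this page:

$ asinfo -v "log-set:id=0;hardware=critical"
ok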

heartbeat TLS server handshake with 10.11.12.13:3012 failed

Severity:

WARNING

Context:

hb

Additional information

The other node could be reached, but there was an error in setting up the TLS connection. Check the other log messages near this one for details.

Parameters:

address:port: Address of the peer where the handshake failed.

heartbeat TLS client handshake failed - 10.219.136.101 {10.219.136.101:3012}

Severity:

WARNING

Context:

hb

Additional information

The other node could be reached, but there was an error in setting up the TLS connection. Check the other log messages near this one for details.

Parameters:

address:ports: Address(es) of the peer where the handshake failed.

Timeout while connecting

Severity:

WARNING

Context:

hb

Additional information

A peer node could not be reached on the heartbeat port, for whatever reason (down, network issues, etc).

mesh size recv failed fd 361: Connection timed out

Severity:

WARNING

Context:

hb

Additional information

An incomplete heartbeat message was received because the socket had some problem.

Parameters:

fd: ID number of the file descriptor the message was received on.

error message: Error message returned from the OS.

sending mesh message to BB9030011AC4202 on fd 361 failed : No route to host

Severity:

WARNING

Context:

hb

Additional information

Tried to send a heartbeat message to a peer, but the socket had some problem.

Parameters:

node: Node that should have received the message.

fd: ID number of the file descriptor the message was sent on.

error message: Error message returned from the OS.

sending mesh message to BB9030011AC4202 on fd 361 failed : Broken pipe

Severity:

WARNING

Context:

hb

Additional information

Tried to send a heartbeat message to a peer, but the connection broke. The node may be shown as 0 rather than a node ID if the remote node ID has not yet been retrieved.

Parameters:

node: Node that should have received the message.

fd: ID number of the file descriptor the message was sent on.

error message: Error message returned from the OS.

unable to parse heartbeat message on fd 361

Severity:

WARNING

Context:

hb

Additional information

Received a malformed message on the heartbeat port, possibly due to a non-Aerospike process unintentionally or maliciously trying to contact the port.

Parameters:

fd: ID number of the file descriptor the message was received on.

ignoring message from BB9030011AC4202 with different cluster name(dev_cluster)

Severity:

WARNING

Context:

hb

Additional information

A node that is not part of this cluster is trying to send heartbeats. Most likely that node has this node wrongly listed as a seed node in the network/heartbeat stanza of its aerospike.conf.

Parameters:

node ID: Node ID of the interloper node.

cluster name: The cluster name that node is presenting, which should be enough to let you track it down.

ignoring delayed heartbeat - expected timestamp less than 1483213248012 but was 1483213252345 from node: BB9020011AC4202

Severity:

WARNING

Context:

hb

Additional information

A heartbeat message arrived that wasn't generated during the last heartbeat interval. May indicate clock skew across the cluster.

Parameters:

expected ts: Latest timestamp expected (in milliseconds since the Aerospike Epoch of 2010-01-01 00:00:00)

actual ts: Timestamp actually in the message

node: Source of the message.

Found a socket 0x7f0979812460 without an associated channel.

Severity:

WARNING

Context:

hb

Additional information

This warning occurs in those rare cases where a network failure causes two error events on the same socket. No action is required. The warning is harmless in general.

Parameters:

socket: HEX number identifying the socket within the OS.

updating mesh endpoint address from {10.219.136.101:3002} to {10.219.136.101:3002,10.219.148.101:3002}

Severity:

INFO

Context:

hb

Additional information

This info message indicates that the local node connected to one IP but got 2 in return. This is common in situations where nodes have multiple NICs and an address is not specified in the network/heartbeat context.

Parameters:

original address: Address connected to initially.

new addresses: List of addresses advertised by the node connected to.

error allocating space for [multicast/mesh] recv buffer of size 1195725862 on fd 773

Severity:

WARNING

Context:

hb

Additional information

It is likely that non-Aerospike network traffic sent some random message to the heartbeat port on this machine, and some bits were misinterpreted by the Aerospike protocol as a large buffer size. Such requests can lead to denial of service, or disruption of cluster traffic if they are frequent. It is recommended that you block access to the heartbeat port from outside the cluster. Refer to the following knowledge base article: How to Secure Aerospike Database Servers.

Parameters:

size: Size of the message received (or the interpretation of a size).

fd: File descriptor of the socket the non-Aerospike message was received on.

could not create heartbeat connection to node - 10.219.136.101 {10.219.136.101:3012}

Severity:

WARNING

Context:

hb

Additional information

This can indicate that a node of the cluster is down, or that the IPs of the cluster have changed (e.g., a node has been restarted). If the indicated node is expected to be part of the cluster, troubleshoot as normal; if not, follow the tip-clear and services-alumni-reset steps from the node removal instructions to clear the errors.

Occurs: When a node expected to be part of the cluster cannot be reached on the heartbeat port

Parameters:

IP:port: IP and heartbeat port of the host that could not be reached
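
If the unreachable node was intentionally removed from the cluster, the tip-clear and services-alumni-reset steps mentioned above might look like the following on each remaining node (the host and port are placeholders taken from this example message):

$ asinfo -v "tip-clear:host-port-list=10.219.136.101:3012"
$ asinfo -v "services-alumni-reset"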

closing mesh heartbeat sockets

Severity:

INFO

Context:

hb

Introduced:

5.3

Additional information

This message appears early in the normal shutdown sequence for mesh network nodes. It appears before any namespace-specific shutdown messages. It indicates that the node's heartbeat is being stopped, so that the node will be promptly removed from the cluster (in case the shutdown process is lengthy and the node was not quiesced).

closing multicast heartbeat sockets

Severity:

INFO

Context:

hb

Introduced:

5.3

Additional information

This message appears early in the normal shutdown sequence for multicast network nodes. It appears before any namespace-specific shutdown messages. It indicates that the node's heartbeat is being stopped, so that the node will be promptly removed from the cluster (in case the shutdown process is lengthy and the node was not quiesced).

HLC jumped by 129787milliseconds with message from bb9a9008a0a0142. Current physical clock:1524726956965 Current HLC:1524727086665 Incoming HLC:1524727086752 Tolerable skew:1000 ms

Severity:

WARNING

Context:

hlc

Removed:

3.16

Additional information

This is an indication of mismatched hardware clocks between nodes, probably due to NTP misconfiguration. This happens frequently in virtual environments (GCE, AWS), for example due to live migration. (The HLC warning on node X does not mean that node X has a clock skew. The warning lists the node that caused a jump because its clock was ahead, 129 seconds in the above example. These warnings will keep getting printed until the clocks get in sync.) To resolve, set up NTP correctly. The best practice is to configure 4 NTP servers so that any one that emits bad ticks can be eliminated.

Parameters:

jumped by: Difference between current and incoming HLC values.

from node: Node that sent the message

current physical clock: This node's current clock time in milliseconds since Unix (not Citrusleaf) epoch.

current HLC: This node's current value for the hybrid logical clock.

incoming HLC: Value sent by the other node. This node has to use the larger of the two.

tolerable skew: Hardcoded to 1000ms.

could not allocate 1073741824-byte arena stage 13: Cannot allocate memory

Severity:

WARNING

Context:

index

Additional information

The Aerospike Enterprise edition allocates memory in 1GiB (1073741824-byte) arenas for the primary index. This message means that the process failed to allocate the 13th such arena. A contiguous 1GiB of memory should be available. The process will still continue to serve read and update transactions but all new writes will fail. This message will always be followed by a warning of the form:

WARNING (index): (index.c:737) (repeated: nnn) arenax alloc failed

{namespace_name} index-flash-usage: used-bytes 5502926848 used-pct 1 alloc-bytes 16384000 alloc-pct 92

Severity:

INFO

Context:

info

Introduced:

4.3.0.2

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace configured with 'index-type flash'.

Parameters:

{ns_name}: Name of the namespace the device and stats belong to.

index-flash-usage: Name for which the following stats apply.

used-bytes: Total bytes in-use on the mount for the primary index used by this namespace on this node.

used-pct: Percentage of the mount in-use for the primary index used by this namespace on this node.

alloc-bytes: Total bytes allocated on the mount for the primary index used by this namespace on this node. This statistic represents entire 4KiB chunks which have at least one element in use. This statistic was introduced in 5.6. Corresponds to the index_flash_alloc_bytes statistic.

alloc-pct: Percentage of the mount allocated for the primary index used by this namespace on this node. This statistic represents entire 4KiB chunks which have at least one element in use. This statistic was introduced in 5.6. Corresponds to the index_flash_alloc_pct statistic.
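
Assuming these values are also exposed as namespace statistics (as the statistic names above suggest), they can be queried directly, for example:

$ asinfo -v "namespace/<NAMESPACE>" -l | grep index_flash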

{namespace_name} index-pmem-usage: used-bytes 5502926848 used-pct 1

Severity:

INFO

Context:

info

Introduced:

4.5

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace configured with 'index-type pmem'.

Parameters:

{ns_name}: Name of the namespace the index and stats belong to.

index-pmem-usage: Name for which the following stats apply.

used-bytes: Total bytes in-use on the mount for the primary index used by this namespace on this node.

used-pct: Percentage of the mount in-use for the primary index used by this namespace on this node.

NODE-ID bb97f1d46894206 CLUSTER-SIZE 12

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Parameters:

NODE-ID: The node id, generated based on the mac address and the service port

CLUSTER-SIZE: Number of nodes recognized by this node as being in the cluster

system-memory: free-kbytes 305769484 free-pct 57 heap-kbytes (135693715,211404072,233721856) heap-efficiency-pct 58.1

Severity:

INFO

Context:

info

Introduced:

3.9

Removed:

4.7.0.2

Additional information

Parameters:

free-kbytes: Amount of free RAM in kilobytes for the host. Corresponds to system_free_mem_kbytes. For versions prior to 3.16.0.4, the amount of shared memory used is wrongly reported as free.

free-pct: Percentage of all ram free (rounded to nearest percent) for the host. Corresponds to the system_free_mem_pct. For versions prior to 3.16.0.4, the amount of shared memory used is wrongly reported as free.

heap-kbytes: Heap statistics, in order: (heap_allocated_kbytes, heap_active_kbytes, heap_mapped_kbytes). Introduced as of version 3.10.1.

heap-efficiency-pct: Provides an indication of the jemalloc heap fragmentation. This represents the heap_allocated_kbytes / heap_mapped_kbytes ratio. A lower number indicates a higher fragmentation rate. Introduced as of version 3.10.1. Corresponds to the heap_efficiency_pct statistic.

system: total-cpu-pct 76 user-cpu-pct 44 kernel-cpu-pct 32 free-mem-kbytes 7462956 free-mem-pct 52 thp-mem-kbytes 4096

Severity:

INFO

Context:

info

Introduced:

4.7.0.2

Additional information

Parameters:

total-cpu-pct: Percent of time the CPU spent servicing user-space or kernel space tasks (i.e. percent of time not idle). Corresponds to system_total_cpu_pct.

user-cpu-pct: Percent of time the CPUs spent servicing user-space tasks. Corresponds to system_user_cpu_pct.

kernel-cpu-pct: Percent of time the CPUs spent servicing kernel-space tasks. Corresponds to system_kernel_cpu_pct.

free-mem-kbytes: Amount of free RAM in kilobytes for the host. Corresponds to system_free_mem_kbytes.

free-mem-pct: Percentage of all RAM free (rounded to nearest percent) for the host. Corresponds to the system_free_mem_pct.

thp-mem-kbytes: Amount of memory in use by the Transparent Huge Page mechanism, in kilobytes. Corresponds to system_thp_mem_kbytes. Displayed in 5.7 and later.

process: cpu-pct 28 threads (8,67,46,46) heap-kbytes (71477,72292,118784) heap-efficiency-pct 60.2

Severity:

INFO

Context:

info

Introduced:

4.7.0.2

Additional information

Parameters:

cpu-pct: Percent CPU time Aerospike was scheduled since previously reported. Corresponds to process_cpu_pct.

threads: Thread statistics, in order: (threads_joinable, threads_detached, threads_pool_total, threads_pool_active). Introduced as of version 5.6.

heap-kbytes: Heap statistics, in order: (heap_allocated_kbytes, heap_active_kbytes, heap_mapped_kbytes).

heap-efficiency-pct: Provides an indication of the jemalloc heap fragmentation. This represents the heap_allocated_kbytes / heap_mapped_kbytes ratio (prior to 5.7), or the heap_allocated_kbytes / heap_active_kbytes ratio (6.0 or later). A lower number indicates a higher fragmentation rate. Introduced as of version 3.10.1. Corresponds to the heap_efficiency_pct statistic.

in-progress: info-q 5 rw-hash 0 proxy-hash 0 tree-gc-q 0

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Parameters:

tsvc-q: Removed in 4.7.0.2. Number of transactions sitting in the transaction queue, waiting to be picked up by a transaction thread. Corresponds to the tsvc_queue statistic.

info-q: Number of transactions on the info transaction queue. Corresponds to the info_queue statistic.

nsup-delete-q: Removed in 4.5.1. Number of records queued up for deletion by the nsup thread.

rw-hash: Number of transactions that are parked on the read write hash. This is used for transactions that have to be processed on a different node. For example, prole writes, or read duplicate resolutions (when requested through client policy). Corresponds to the rw_in_progress statistic.

proxy-hash: Number of transactions on the proxy hash waiting for transmission on the fabric. Corresponds to the proxy_in_progress statistic.

rec-refs: Removed in 3.10. Number of references to a primary key.

tree-gc-q: Introduced in 3.10. This is the number of trees queued up, ready to be completely removed (partitions drop). Corresponds to the tree_gc_queue statistic.

fds: proto (38553,57711444,57672891) heartbeat (27,553,526) fabric (648,2686,2038)

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Parameters:

proto: Client connections statistics, in order: client_connections, client_connections_opened, client_connections_closed. Closed connections include connections reaped after being idle (reaped connections correspond to the reaped_fds statistic), connections properly shut down by the client (the client initiated a proper socket close), and connections closed due to preliminary packet parsing errors (like unexpected headers); most of the latter would have a WARNING in the logs.

heartbeat: Heartbeat connections statistics, in order: heartbeat_connections, heartbeat_connections_opened, heartbeat_connections_closed.

fabric: Fabric connections statistics, in order: fabric_connections, fabric_connections_opened, fabric_connections_closed.

heartbeat-received: self 887075 : foreign 35456447

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Parameters:

self: Number of heartbeats the current node has received from itself (should be 0 for mesh).

foreign: Number of heartbeats the current node has received from all other nodes combined.

fabric-bytes-per-second: bulk (1525,7396) ctrl (33156,46738) meta (42,42) rw (128,128)

Severity:

INFO

Context:

info

Introduced:

3.11.1.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default.

Parameters:

bulk: Current transmit and receive rate for fabric-channel-bulk. This channel is used for record migrations during rebalance.

ctrl: Current transmit and receive rate for fabric-channel-ctrl. This channel is used to distribute cluster membership change events and partition migration control messages.

meta: Current transmit and receive rate for fabric-channel-meta. This channel is used to distribute System Meta Data (SMD) after cluster change events.

rw: Current transmit and receive rate for fabric-channel-rw (read/write). This channel is used for replica writes, proxies, duplicate resolution, and various other intra-cluster record operations.

early-fail: demarshal 0 tsvc-client 1 tsvc-batch-sub 0 tsvc-udf-sub 0

Severity:

INFO

Context:

info

Introduced:

3.9

Removed:

4.5.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default. Aggregate across all namespaces. Will only be displayed if there has been any transaction that failed early on this node.

Parameters:

demarshal: Failure during the demarshal phase of a transaction.

tsvc-client: Failure for client initiated transactions, before getting to the namespace part. This can be due to authentication failure (at the socket level), an initial partition imbalance (node just started and hasn't joined the cluster yet, which results in an unavailable error back to the client), or a missing or bad namespace provided.

tsvc-batch-sub: Similar as above, but as part of a batch sub transaction.

tsvc-udf-sub: Similar as above, but as part of a udf sub transaction.

early-fail: demarshal 0 tsvc-client 1 tsvc-from-proxy 0 tsvc-batch-sub 0 tsvc-from-proxy-batch-sub 0 tsvc-udf-sub 0 tsvc-ops-sub 0

Severity:

INFO

Context:

info

Introduced:

4.5.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default. Aggregate across all namespaces. Will only be displayed if there has been any transaction that failed early on this node. Cumulative since asd start. Single-digit counts are probably benign.

Parameters:

demarshal: Failure during the demarshal phase of a transaction. Metric: demarshal_error.

tsvc-client: Failure for client initiated transactions, before getting to the namespace part. This can be due to authentication failure (at the socket level), an initial partition imbalance (node just started and hasn't joined the cluster yet, which results in an unavailable error back to the client), or a missing or bad namespace provided. Metric: early_tsvc_client_error.

tsvc-from-proxy: Failure for proxied transactions, before getting to the namespace part. This can be due to authentication failure (at the socket level), an initial partition imbalance (node just started and hasn't joined the cluster yet, which results in an unavailable error), or a missing or bad namespace provided. Metric: early_tsvc_from_proxy_error.

tsvc-batch-sub: Similar to above, but as part of a batch sub transaction. Metric: early_tsvc_batch_sub_error.

tsvc-from-proxy-batch-sub: Similar to above, but as part of a proxied batch sub transaction. Metric: early_tsvc_from_proxy_batch_sub_error.

tsvc-udf-sub: Similar to above, but as part of a udf sub transaction. Metric: early_tsvc_udf_sub_error.

tsvc-ops-sub: Versions 4.7 and above only. Similar to above, but as part of a scan/query background ops sub transaction. Metric: early_tsvc_ops_sub_error.
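
Because these counters are cumulative since asd start, comparing successive occurrences of the line shows whether early failures are still accumulating. A simple grep (assuming the default log path) is usually enough:

$ grep "early-fail" /var/log/aerospike/aerospike.log | tail -3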

batch-index: batches (234,0,0) delays 0

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Occurs: Periodically displayed, every 10 seconds by default. Aggregate across all namespaces. Will only be displayed if batch-index transactions have been issued on this node.

Parameters:

batches: Number of batch-index jobs since the server started (Success,Error,Timed out). Success means all the sub-transactions for the batch-index job were dispatched successfully; those sub-transactions could still error or time out individually even if the parent batch-index job reported a success status. Conversely, a parent batch-index job that did not succeed could still have some of its sub-transactions processed (with any resulting status). It is not possible to correlate a parent batch-index job's status with the statuses of its sub-transactions.

Related metrics: batch_index_complete, batch_index_error, and batch_index_timeout.

delays: Number of times the job's response buffer has been delayed by the sending process's WOULDBLOCK to avoid overflowing the buffer.

Related metric: batch_index_delay.

{ns_name} objects: all 845922 master 281071 prole 564851

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Number of objects for this namespace on this node along with the master and prole breakdown.

Parameters:

{ns_name}: "ns_name" will be replaced by the name of a particular namespace.

all: Total number of objects for this namespace on this node (master and proles).

master: Number of master objects for this namespace on this node.

prole: Number of prole (replica) objects for this namespace on this node.
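
In the example line above, all is the sum of the master and prole counts: 281071 + 564851 = 845922.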

{ns_name} tombstones: all 11252 xdr (11223,0) master 5501 prole 5751 non-replica 0

Severity:

INFO

Context:

info

Introduced:

5.5

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Number of tombstones for this namespace on this node along with the breakdown.

Parameters:

{ns_name}: "ns_name" will be replaced by the name of a particular namespace.

all: Total number of tombstones for this namespace on this node.

xdr: Number of xdr tombstones and bin cemeteries - (xdr_tombstones,xdr_bin_cemeteries).

master: Number of master tombstones for this namespace on this node.

prole: Number of prole (replica) tombstones for this namespace on this node.

non-replica: Number of non-replica tombstones for this namespace on this node.

{ns_name} migrations: remaining (654,289,254) active (1,1,0) complete-pct 88.49

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. When migrations have completed this line is reduced to {ns_name} migrations - complete.

Parameters:

{ns_name}: "ns_name" will be replaced by the name of a particular namespace.

remaining: Total number of transmit and receive partition migrations outstanding for this node, as well as signals, as of the new cluster protocol introduced in version 3.13 (tx,rx,sg). Signals represent the number of signals to send to other (non-replica) nodes for partitions to drop. This log line changes to complete once migrations are completed on this node.

active: Number of transmit and receive partition migrations currently in progress, as well as active signals, as of the new cluster protocol introduced in version 3.13 (tx,rx,sg).

complete-pct: Percent of the total number of partition migrations scheduled for this rebalance that have already completed.
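
Reading the example line above with the (tx,rx,sg) ordering, 654 transmit migrations, 289 receive migrations, and 254 signals remain, with 1 transmit and 1 receive migration currently active. To follow rebalance progress, a grep such as the following (assuming the default log path) shows how complete-pct is trending:

$ grep "migrations: remaining" /var/log/aerospike/aerospike.log | tail -3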

{ns_name} memory-usage: total-bytes 3121300 index-bytes 140544 set-index-bytes 70272 sindex-bytes 221544 data-bytes 2688940 used-pct 0.05

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. When 'data-in-memory' is false, 'data-bytes' will not be included. 'set-index-bytes' is included as of version 5.6.

Parameters:

total-bytes: Total number of bytes used in memory for {ns_name} on the local node.

index-bytes: Number of bytes holding the primary index in system memory for {ns_name} on the local node.
Will display 0 when index is not stored in RAM.

set-index-bytes: Number of bytes holding set indexes in process memory for {ns_name} on the local node. Displayed as of version 5.6.

sindex-bytes: Number of bytes holding secondary indexes in process memory for {ns_name} on the local node.

data-bytes: Number of bytes holding data in process memory for {ns_name} on the local node. Displayed only when 'data-in-memory' is set to true for {ns_name}.

used-pct: Percentage of bytes used in memory for {ns_name} on the local node.
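
In the example line above, total-bytes is the sum of the component counts: 140544 + 70272 + 221544 + 2688940 = 3121300.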

{ns_name} device-usage: used-bytes 2054187648 avail-pct 92 cache-read-pct 12.35

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Breakdown of the device usage if a storage-engine device has been configured for the namespace.

Parameters:

used-bytes: Number of bytes used on disk for {ns_name} on the local node.

avail-pct: Minimum percentage of contiguous disk space in {ns_name} on the local node across all devices. Corresponds to the device_available_pct statistic.

cache-read-pct: Percentage of reads from the post-write cache instead of disk. Only applicable when {ns_name} is not configured for data in memory. Corresponds to the cache_read_pct statistic.

{ns_name} pmem-usage: used-bytes 2054187648 avail-pct 92

Severity:

INFO

Context:

info

Introduced:

4.8

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Breakdown of the pmem storage file usage if storage-engine pmem has been configured for the namespace.

Parameters:

used-bytes: Number of bytes used on pmem storage files for {ns_name} on the local node.

avail-pct: Minimum percentage of contiguous pmem storage file space in {ns_name} on the local node across all pmem storage files. Corresponds to the pmem_available_pct statistic.

{ns_name} client: tsvc (0,0) proxy (0,0,0) read (126,0,1,3,1) write (2886,0,23,2) delete (197,0,1,19,3) udf (35,0,1,4) lang (26,7,0,3)

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Basic client transactions statistics. Will only be displayed after client transactions hit this namespace on this node.
The following values define various actions which are displayed in the logs:

  • S - success
  • C - complete, but success/failure indeterminate (e.g. proxy diverts, UDFs can successfully send a "FAILURE" response bin)
  • E - error
  • T - timed out
  • N - not found, which for reads and deletes is a result distinguished from success but is not an error
  • F - result filtered out or action skipped by predexp (versions 4.7 and above only)
  • R,W,D - successful UDF read, write, delete operation respectively

Parameters:

tsvc: Failures in the transaction service, before attempting to handle the transaction (E,T). Also reported as the client_tsvc_error and client_tsvc_timeout statistics.

proxy: Client proxied transactions (C,E,T). This should only happen during migrations. Also reported as the client_proxy_complete, client_proxy_error, and client_proxy_timeout statistics.

read: Client read transactions (S,E,T,N,F). Also reported as the client_read_success, client_read_error, client_read_timeout, client_read_not_found, and client_read_filtered_out statistics.

write: Client write transactions (S,E,T,F). Also reported as the client_write_success, client_write_error, client_write_timeout, and client_write_filtered_out statistics.

delete: Client delete transactions (S,E,T,N,F). Also reported as the client_delete_success, client_delete_error, client_delete_timeout, client_delete_not_found, and client_delete_filtered_out statistics.

udf: Client UDF transactions (C,E,T,F). Refer to the lang stat breakdown for the underlying operation statuses. Also reported as the client_udf_complete, client_udf_error, client_udf_timeout, and client_udf_filtered_out statistics.

lang: Statistics for UDF operation statuses (R,W,D,E). Also reported as the client_lang_read_success, client_lang_write_success, client_lang_delete_success, and client_lang_error statistics.
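
As a worked reading of the example line above: read (126,0,1,3,1), in (S,E,T,N,F) order, means 126 successful reads, 0 errors, 1 timeout, 3 not-found results, and 1 filtered out; write (2886,0,23,2), in (S,E,T,F) order, means 2886 successes, 0 errors, 23 timeouts, and 2 filtered out.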

{ns_name} xdr-client: write (1543,0,15) delete (134,0,3,25)

Severity:

INFO

Context:

info

Introduced:

3.16.0.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace receiving write transactions from an XDR client. XDR client transactions statistics. Will only be displayed after XDR client transactions hit this namespace on this node. The values on this line are a subset of the values displayed in the client statistics line just above in the log file.
The following values define various actions which are displayed in the logs:

  • S - success
  • E - error
  • T - timed out
  • N - not found, which for reads and deletes is a result distinguished from success but is not an error

Parameters:

write: XDR client write transactions (S,E,T). Also reported as the xdr_client_write_success, xdr_client_write_error and xdr_client_write_timeout statistics.

delete: XDR client delete transactions (S,E,T,N). Also reported as the xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, and xdr_client_delete_not_found statistics.

{ns_name} from-proxy: tsvc (0,0) read (105,0,1,7) write (2812,0,22,1) delete (188,0,1,16,2) udf (35,0,1,3) lang (26,7,0,3)

Severity:

INFO

Context:

info

Introduced:

4.5.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Basic proxied transactions statistics. Will only be displayed after proxied transactions hit this namespace on this node.
The following values define various actions which are displayed in the logs:

  • S - success
  • C - complete, but success/failure indeterminate (e.g. UDFs can successfully send a "FAILURE" response bin)
  • E - error
  • T - timed out
  • N - not found, which for reads and deletes is a result distinguished from success but is not an error
  • F - result filtered out or action skipped by predexp (versions 4.7 and above only)
  • R,W,D - successful UDF read, write, delete operation respectively

Parameters:

tsvc: Failures in the transaction service, before attempting to handle the transaction (E,T). Also reported as the from_proxy_tsvc_error and from_proxy_tsvc_timeout statistics.

read: Proxied read transactions (S,E,T,N,F). Also reported as the from_proxy_read_success, from_proxy_read_error, from_proxy_read_timeout, from_proxy_read_not_found, and from_proxy_read_filtered_out statistics.

write: Proxied write transactions (S,E,T,F). Also reported as the from_proxy_write_success, from_proxy_write_error, from_proxy_write_timeout, and from_proxy_write_filtered_out statistics.

delete: Proxied delete transactions (S,E,T,N,F). Also reported as the from_proxy_delete_success, from_proxy_delete_error, from_proxy_delete_timeout, from_proxy_delete_not_found, and from_proxy_delete_filtered_out statistics.

udf: Proxied UDF transactions (C,E,T,F). Refer to the lang stat breakdown for the underlying operation statuses. Also reported as the from_proxy_udf_complete, from_proxy_udf_error, from_proxy_udf_timeout, and from_proxy_udf_filtered_out statistics.

lang: Statistics for proxied UDF operation statuses (R,W,D,E). Also reported as the from_proxy_lang_read_success, from_proxy_lang_write_success, from_proxy_lang_delete_success, and from_proxy_lang_error statistics.

{ns_name} xdr-from-proxy: write (743,0,11) delete (104,0,3,21)

Severity:

INFO

Context:

info

Introduced:

4.5.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace receiving proxied write transactions from an XDR client. Proxied XDR transactions statistics. Will only be displayed after a proxied XDR transaction hits this namespace on this node. The values on this line are a subset of the values displayed in the from-proxy statistics line just above in the log file.
The following values define various actions which are displayed in the logs:

  • S - success
  • E - error
  • T - timed out
  • N - not found, which for reads and deletes is a result distinguished from success but is not an error

Parameters:

write: Proxied XDR write transactions (S,E,T). Also reported as the xdr_from_proxy_write_success, xdr_from_proxy_write_error and xdr_from_proxy_write_timeout statistics.

delete: Proxied XDR delete transactions (S,E,T,N). Also reported as the xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, and xdr_from_proxy_delete_not_found statistics.

{ns_name} re-repl: all-triggers (525,0,32) unreplicated-records 14

Severity:

INFO

Context:

info

Introduced:

4.0

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Statistics for transactions re-replicating. This is only applicable to strong-consistency enabled namespaces. In strong consistency mode, write transactions failing replication are marked as unreplicated and will attempt to re-replicate once immediately (despite returning a timeout failure to the client), as well as on any subsequent transaction attempted on the record (read or write).
The following values define various actions which are displayed in the logs:

  • S - success
  • E - error
  • T - timed out

Parameters:

all-triggers: Re-replication transactions (S,E,T). Also reported as the re_repl_success, re_repl_error and re_repl_timeout statistics.

unreplicated-records: Number of unreplicated records in the namespace.
Displayed as of version 5.7. Also reported as the unreplicated_records statistic.

{ns_name} batch-sub: tsvc (0,0) proxy (0,0,0) read (959,0,0,51,1) write (0,0,0,0) delete (0,0,0,0,0) udf (0,0,0,0) lang (0,0,0,0)

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Batch index transactions statistics. Will only be displayed after batch index transactions hit this namespace on this node. It is not possible to correlate a parent batch-index job's status with the statuses of its sub-transactions. See the batch-index log entry for some details.

Parameters:

tsvc: Number of batch-index read sub transactions that failed in the transaction service (Error,Timed out). Corresponds to the batch_sub_tsvc_error and batch_sub_tsvc_timeout statistics.

proxy: Number of proxied batch-index read sub transactions (Success,Error,Timed out). Corresponds to the batch_sub_proxy_complete, batch_sub_proxy_error, and batch_sub_proxy_timeout statistics.

read: Number of batch-index read sub transactions (Success,Error,Timed out,Not found,Filtered out). Corresponds to the batch_sub_read_success, batch_sub_read_error, batch_sub_read_timeout, batch_sub_read_not_found, and batch_sub_read_filtered_out statistics.

write: Number of batch-index write sub transactions (Success,Error,Timed out,Filtered out). Corresponds to the batch_sub_write_success, batch_sub_write_error, batch_sub_write_timeout, and batch_sub_write_filtered_out statistics. Displayed as of version 6.0.

delete: Number of batch-index delete sub transactions (Success,Error,Timed out,Not found,Filtered out). Corresponds to the batch_sub_delete_success, batch_sub_delete_error, batch_sub_delete_timeout, batch_sub_delete_not_found, and batch_sub_delete_filtered_out statistics. Displayed as of version 6.0.

udf: Number of batch-index udf sub transactions (Complete,Error,Timed out,Filtered out). Corresponds to the batch_sub_udf_complete, batch_sub_udf_error, batch_sub_udf_timeout, and batch_sub_udf_filtered_out statistics. Displayed as of version 6.0.

lang: Number of batch-index lang sub transactions (Delete Success,Error,Read Success,Write Success). Corresponds to the batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, and batch_sub_lang_write_success statistics. Displayed as of version 6.0.

{ns_name} from-proxy-batch-sub: tsvc (0,0) read (959,0,0,51,1) write (0,0,0,0) delete (0,0,0,0,0) udf (0,0,0,0) lang (0,0,0,0)

Severity:

INFO

Context:

info

Introduced:

4.5.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Proxied batch index transactions statistics. Will only be displayed after proxied batch index transactions hit this namespace on this node. It is not possible to correlate a parent batch-index job's status with the statuses of its sub-transactions. See the batch-index log entry for some details.

Parameters:

tsvc: Number of proxied batch-index read sub transactions that failed in the transaction service (Error,Timed out). Corresponds to the from_proxy_batch_sub_tsvc_error and from_proxy_batch_sub_tsvc_timeout statistics.

read: Number of proxied batch-index read sub transactions (Success,Error,Timed out,Not found,Filtered out). Corresponds to the from_proxy_batch_sub_read_success, from_proxy_batch_sub_read_error, from_proxy_batch_sub_read_timeout, from_proxy_batch_sub_read_not_found, and from_proxy_batch_sub_read_filtered_out statistics.

write: Number of proxied batch-index write sub transactions (Success,Error,Timed out,Filtered out). Corresponds to the from_proxy_batch_sub_write_success, from_proxy_batch_sub_write_error, from_proxy_batch_sub_write_timeout, and from_proxy_batch_sub_write_filtered_out statistics. Displayed as of version 6.0.

delete: Number of proxied batch-index delete sub transactions (Success,Error,Timed out,Not found,Filtered out). Corresponds to the from_proxy_batch_sub_delete_success, from_proxy_batch_sub_delete_error, from_proxy_batch_sub_delete_timeout, from_proxy_batch_sub_delete_not_found, and from_proxy_batch_sub_delete_filtered_out statistics. Displayed as of version 6.0.

udf: Number of proxied batch-index udf sub transactions (Complete,Error,Timed out,Filtered out). Corresponds to the from_proxy_batch_sub_udf_complete, from_proxy_batch_sub_udf_error, from_proxy_batch_sub_udf_timeout, and from_proxy_batch_sub_udf_filtered_out statistics. Displayed as of version 6.0.

lang: Number of proxied batch-index lang sub transactions (Delete Success,Error,Read Success,Write Success). Corresponds to the from_proxy_batch_sub_lang_delete_success, from_proxy_batch_sub_lang_error, from_proxy_batch_sub_lang_read_success, and from_proxy_batch_sub_lang_write_success statistics. Displayed as of version 6.0.

{ns_name} dup-res: ask 1234 respond (10,4321)

Severity:

INFO

Context:

info

Introduced:

5.5

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Statistics for transactions that are asking or handling duplicate resolution.

Parameters:

ask: Number of duplicate resolution requests made by the node to other individual nodes. Also reported as the dup_res_ask statistic.

respond: Number of duplicate resolution requests handled by the node, broken up between transactions where a read was required and transactions where a read was not required. Also reported as the dup_res_respond_read and dup_res_respond_no_read statistics.

{ns_name} scan: basic (11,0,0) aggr (0,0,0) udf-bg (5,0,0), ops-bg (10,0,0)

Severity:

INFO

Context:

info

Introduced:

3.9

Removed:

6.0

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Scan transactions statistics. Will only be displayed after scan transactions hit this namespace on this node.

Parameters:

basic: Number of scan jobs since the server started (Success,Error,Aborted). Corresponds to the scan_basic_complete, scan_basic_error, and scan_basic_abort statistics.

aggr: Number of scan aggregation jobs since the server started (Success,Error,Aborted). Corresponds to the scan_aggr_complete, scan_aggr_error, and scan_aggr_abort statistics.

udf-bg: Number of scan background udf jobs since the server started (Success,Error,Aborted). Corresponds to the scan_udf_bg_complete, scan_udf_bg_error, and scan_udf_bg_abort statistics.

ops-bg: Versions 4.7 and above only. Number of scan background operations (ops) jobs since the server started (Success,Error,Aborted). Corresponds to the scan_ops_bg_complete, scan_ops_bg_error, and scan_ops_bg_abort statistics.

{ns_name} pi-query: short-basic (4,0,0) long-basic (36,0,0) aggr (0,0,0) udf-bg (7,0,0), ops-bg (3,0,0)

Severity:

INFO

Context:

info

Introduced:

6.0

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Primary index query (pi-query) transactions statistics. Will only be displayed after pi-query transactions hit this namespace on this node.

Parameters:

short-basic: Number of short primary index (pi)-query jobs since the server started (Success,Error,Timed out). Short queries are declared by the client, are unmonitored, and typically run for a second or less. Corresponds to the pi_query_short_basic_complete, pi_query_short_basic_error, and pi_query_short_basic_timeout statistics.

long-basic: Number of long pi-query jobs since the server started (Success,Error,Aborted). Long queries are monitored and not time bounded. Corresponds to the pi_query_long_basic_complete, pi_query_long_basic_error, and pi_query_long_basic_abort statistics.

aggr: Number of pi-query aggregation jobs since the server started (Success,Error,Aborted). Corresponds to the pi_query_aggr_complete, pi_query_aggr_error, and pi_query_aggr_abort statistics.

udf-bg: Number of pi-query background udf jobs since the server started (Success,Error,Aborted). Corresponds to the pi_query_udf_bg_complete, pi_query_udf_bg_error, and pi_query_udf_bg_abort statistics.

ops-bg: Number of pi-query background operations (ops) jobs since the server started (Success,Error,Aborted). Corresponds to the pi_query_ops_bg_complete, pi_query_ops_bg_error, and pi_query_ops_bg_abort statistics.

{ns_name} query: basic (5,0) aggr (6,0) udf-bg (1,0) ops-bg (2,0)

Severity:

INFO

Context:

info

Introduced:

3.9

Removed:

5.7

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Secondary index query transactions statistics. Will only be displayed after query transactions hit this namespace on this node.

Parameters:

basic: Number of secondary index queries since the server started (Success,Abort). Corresponds to the query_lookup_success and query_lookup_abort statistics.

aggr: Number of query aggregation jobs since the server started (Success,Abort). Corresponds to the query_agg_success and query_agg_abort statistics.

udf-bg: Number of query background udf jobs since the server started (Success,Failure). Corresponds to the query_udf_bg_success and query_udf_bg_failure statistics.

ops-bg: Versions 4.7 and above only. Number of query background operations (ops) jobs since the server started (Success,Failure). Corresponds to the query_ops_bg_success and query_ops_bg_failure statistics.

{ns_name} query: basic (210,0,0) aggr (0,0,0) udf-bg (0,0,0) ops-bg (0,0,0)

Severity:

INFO

Context:

info

Introduced:

5.7

Removed:

6.0

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Secondary index query transactions statistics. Will only be displayed after query transactions hit this namespace on this node.

Parameters:

basic: Number of secondary index queries since the server started (Completed,Error,Abort). Corresponds to the query_basic_complete, query_basic_error and query_basic_abort statistics.

aggr: Number of query aggregation jobs since the server started (Completed,Error,Abort). Corresponds to the query_aggr_complete, query_aggr_error and query_aggr_abort statistics.

udf-bg: Number of query background udf jobs since the server started (Completed,Error,Abort). Corresponds to the query_udf_bg_complete, query_udf_bg_error and query_udf_bg_abort statistics.

ops-bg: Number of query background operations (ops) jobs since the server started (Completed,Error,Abort). Corresponds to the query_ops_bg_complete, query_ops_bg_error and query_ops_bg_abort statistics.

{ns_name} si-query: short-basic (4,0,0) long-basic (26,0,0) aggr (0,0,0) udf-bg (7,0,0) ops-bg (3,0,0)

Severity:

INFO

Context:

info

Introduced:

6.0

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Secondary index query (si-query) transactions statistics. Will only be displayed after si-query transactions hit this namespace on this node.

Parameters:

short-basic: Number of short secondary index queries since the server started (Completed,Error,Timeout). Short queries are declared by the client, are unmonitored, and typically run for a second or less. Corresponds to the si_query_short_basic_complete, si_query_short_basic_error and si_query_short_basic_timeout statistics.

long-basic: Number of long secondary index queries since the server started (Completed,Error,Abort). Long queries are monitored and not time bounded. Corresponds to the si_query_long_basic_complete, si_query_long_basic_error and si_query_long_basic_abort statistics.

aggr: Number of si-query aggregation jobs since the server started (Completed,Error,Abort). Corresponds to the si_query_aggr_complete, si_query_aggr_error and si_query_aggr_abort statistics.

udf-bg: Number of si-query background udf jobs since the server started (Completed,Error,Abort). Corresponds to the si_query_udf_bg_complete, si_query_udf_bg_error and si_query_udf_bg_abort statistics.

ops-bg: Number of si-query background operations (ops) jobs since the server started (Completed,Error,Abort). Corresponds to the si_query_ops_bg_complete, si_query_ops_bg_error, and si_query_ops_bg_abort statistics.

{ns_name} udf-sub: tsvc (0,0) udf (2651,0,0,1) lang (52,2498,101,0)

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Scan/query UDF sub-transactions statistics. Will only be displayed after scan/query UDF sub-transactions hit this namespace on this node.

Parameters:

tsvc: Number of udf sub transactions of scan/query background udf jobs that failed in the transaction service (Error,Timed out). Corresponds to the udf_sub_tsvc_error and udf_sub_tsvc_timeout statistics.

udf: Number of udf sub transactions of scan/query background udf jobs (Success,Error,Timed out,Filtered out). Corresponds to the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_timeout, and udf_sub_udf_filtered_out statistics.

lang: Different status counts for underlying udf operations for sub transactions of scan/query background udf jobs (Read,Write,Delete,Error). Corresponds to the udf_sub_lang_read_success, udf_sub_lang_write_success, udf_sub_lang_delete_success, and udf_sub_lang_error statistics.

{ns_name} ops-sub: tsvc (0,0) write (2651,0,0,1)

Severity:

INFO

Context:

info

Introduced:

4.7.0.2

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Scan/query ops sub-transactions statistics. Will only be displayed after scan/query ops sub-transactions hit this namespace on this node.

Parameters:

tsvc: Number of ops sub-transactions of scan/query background ops jobs that failed in the transaction service (Error,Timed out). Corresponds to the ops_sub_tsvc_error and ops_sub_tsvc_timeout statistics.

write: Number of ops sub-transactions of scan/query background ops jobs (Success,Error,Timed out,Filtered out). Corresponds to the ops_sub_write_success, ops_sub_write_error, ops_sub_write_timeout, and ops_sub_write_filtered_out statistics.

{ns_name} retransmits: migration 0 client-read 0 client-write (0,1) client-delete (0,0) client-udf (0,0) batch-sub 0 udf-sub (0,0) nsup 0

Severity:

INFO

Context:

info

Introduced:

3.10.1

Removed:

4.5.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Retransmit statistics. Will only be displayed if any retransmit has taken place.

Parameters:

migration: Number of retransmits that occurred during migrations. Corresponds to the migrate_record_retransmits statistic.

client-read: Number of retransmits that occurred during read transactions (that were being duplicate resolved). Corresponds to the retransmit_client_read_dup_res statistic.

client-write: Number of retransmits that occurred during write transactions (that were being duplicate resolved, and replica written respectively). Corresponds to the retransmit_client_write_dup_res and retransmit_client_write_repl_write statistics.

client-delete: Number of retransmits that occurred during delete transactions (that were being duplicate resolved, and replica written respectively). Corresponds to the retransmit_client_delete_dup_res and retransmit_client_delete_repl_write statistics.

client-udf: Number of retransmits that occurred during client initiated udf transactions (that were being duplicate resolved, and replica written respectively). Corresponds to the retransmit_client_udf_dup_res and retransmit_client_udf_repl_write statistics.

batch-sub: Number of retransmits that occurred during batch sub transactions (that were being duplicate resolved). Corresponds to the retransmit_batch_sub_dup_res statistic.

udf-sub: Number of retransmits that occurred during udf sub transactions of scan/query background udf jobs (that were being duplicate resolved, and replica written respectively). Corresponds to the retransmit_udf_sub_dup_res and retransmit_udf_sub_repl_write statistics.

nsup: Number of retransmits that occurred during nsup initiated delete transactions (that were being replica written). Corresponds to the retransmit_nsup_repl_write statistic.

{ns_name} retransmits: migration 0 all-read 0 all-write (0,1) all-delete (0,0) all-udf (0,0) all-batch-sub 0 udf-sub (0,0) ops-sub (0,0)

Severity:

INFO

Context:

info

Introduced:

4.5.1

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Retransmit statistics. Will only be displayed if any retransmit has taken place.

Parameters:

migration: Number of retransmits that occurred during migrations. Corresponds to the migrate_record_retransmits statistic.

all-read: Number of retransmits that occurred during read transactions (that were being duplicate resolved). Corresponds to the retransmit_all_read_dup_res statistic.

all-write: Number of retransmits that occurred during write transactions (that were being duplicate resolved, replica written respectively). Corresponds to the retransmit_all_write_dup_res and retransmit_all_write_repl_write statistics.

all-delete: Number of retransmits that occurred during delete transactions (that were being duplicate resolved and replica written respectively). Corresponds to the retransmit_all_delete_dup_res, and retransmit_all_delete_repl_write statistics.

all-udf: Number of retransmits that occurred during udf transactions (that were being duplicate resolved and replica written respectively). Corresponds to the retransmit_all_udf_dup_res and retransmit_all_udf_repl_write statistics.

all-batch-sub: Number of retransmits that occurred during batch sub-transactions (that were being duplicate resolved). Corresponds to the retransmit_all_batch_sub_dup_res statistic.

udf-sub: Number of retransmits that occurred during udf sub-transactions of scan/query background udf jobs (that were being duplicate resolved and replica written respectively). Corresponds to the retransmit_udf_sub_dup_res and retransmit_udf_sub_repl_write statistics.

ops-sub: Versions 4.7 and above only. Number of retransmits that occurred during ops sub-transactions of scan/query background ops jobs (that were being duplicate resolved and replica written respectively). Corresponds to the retransmit_ops_sub_dup_res and retransmit_ops_sub_repl_write statistics.

{ns_name} special-errors: key-busy 1234 record-too-big 5678

Severity:

INFO

Context:

info

Introduced:

3.16.0.1

Removed:

5.6

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Special errors statistics. Will only be displayed if any of those errors have occurred.

Parameters:

key-busy: Number of key busy errors. Corresponds to the fail_key_busy statistic. See knowledge-base article on Hot Key Error code 14.

record-too-big: Number of record too big errors. Corresponds to the fail_record_too_big statistic. See knowledge-base article on Record too Big.

{ns_name} special-errors: key-busy 1234 record-too-big 5678 lost-conflict (256,32)

Severity:

INFO

Context:

info

Introduced:

5.6

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. Special errors statistics. Will only be displayed if any of those errors have occurred.

Parameters:

key-busy: Number of key busy errors. Corresponds to the fail_key_busy statistic. See knowledge-base article on Hot Key Error code 14.

record-too-big: Number of record too big errors. Corresponds to the fail_record_too_big statistic. See knowledge-base article on Record too Big.

lost-conflict: Composed of the following metrics: fail_xdr_lost_conflict and fail_client_lost_conflict.

histogram dump: {ns-name}-{hist-name} (1344911766 total) msec (00: 1262539302) (01: 0049561831) (02: 0013431778) (03: 0007273116) (04: 0004299011) (05: 0003086466) (06: 0002182478) (07: 0001854797) (08: 0000312272) (09: 0000370715)

Severity:

INFO

Context:

info

Introduced:

3.9

Additional information

In the above example, 05:0003086466 implies 3,086,466 data points took between 16 and 32 msec. Additional histograms can be accessed by enabling microbenchmarks and/or storage-benchmarks statically or dynamically in the service context of the configuration. See the Monitoring latencies page for monitoring latencies and details about the histograms.

Occurs: Periodically printed to the logs (every 10 seconds by default).

Parameters:

histogram dump: Name of the histogram to follow for the {ns-name} namespace

total: Number of data points represented by this histogram (since the server started)

N: Number of data points within units (e.g. msec or bytes) greater than 2^(N-1) and less than 2^N for N>0, between 0 and 1 for N=0
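
To inspect the latest dump of a particular histogram, a grep along these lines can be used; the {test}-read histogram name and the log path are only placeholders for illustration:

$ grep "histogram dump: {test}-read" /var/log/aerospike/aerospike.log | tail -1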

xdr-dc dc2: nodes 8 latency-ms 19

Severity:

INFO

Context:

info

Introduced:

6.0

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each XDR destination cluster (or DC).

Parameters:

xdr-dc: Name of the XDR destination cluster.

nodes: See description of nodes metric.

latency-ms: See description of latency_ms metric.

{ns_name} xdr-dc dc2: lag 12 throughput 710 in-queue 250563 in-progress 81150 complete (1002215,0,0,0) retries (0,0,23) recoveries (2048,0) hot-keys 4655

Severity:

INFO

Context:

info

Introduced:

6.0

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each combination of namespace and XDR destination cluster (or DC).

Parameters:

{ns_name}: "ns_name" will be replaced by the name of a particular namespace.

xdr-dc: Name of the XDR destination cluster.

lag: See description of lag metric.

throughput: See description of throughput metric.

in-queue: See description of in_queue metric.

in-progress: See description of in_progress metric.

complete: Composed of the following metrics: success, abandoned, not_found, and filtered_out.

retries: Composed of the following metrics: retry_conn_reset, retry_dest, and retry_no_node.

recoveries: Composed of the following metrics: recoveries and recoveries_pending.

hot-keys: See description of hot_keys metric.

system memory: free 46749588kb (94 percent free)

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

free: Amount of free RAM in kilobytes

percent free: Percentage of all ram free (rounded to nearest percent)

ClusterSize 36 ::: objects 200435832

Severity:

INFO

Context:

info

Introduced:

3.7.0

Removed:

3.9

Additional information

Parameters:

ClusterSize: Number of nodes recognized by this node as being in the cluster

objects: Number of objects held by this node (includes both master and prole objects)

heartbeat_received: self 887075 : foreign 35456447

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

self: Number of heartbeats the current node has received from itself (should be 0 for mesh).

foreign: Number of heartbeats the current node has received from all other nodes combined.

{ns_name} migrations - remaining (352 tx, 107 rx), active (4 tx, 16 rx), 59.15% complete

Severity:

INFO

Context:

info

Introduced:

3.7.0

Removed:

3.8.4

Additional information

Occurs: Periodically displayed, every 10 seconds by default, for each namespace. When migrations have completed this line is reduced to {ns_name} migrations - complete.

Parameters:

{ns_name}: "ns_name" will be replaced by the name of a particular namespace.

remaining: Total number of receive (rx) and transmit (tx) partition migrations outstanding for this node.

active: Number of receive and transmit partition migrations currently in progress.

% complete: Percent of the total number of partition migrations scheduled for this rebalance that have already completed.

node id bb94a189d290c00

Severity:

INFO

Context:

info

Introduced:

3.7.1

Removed:

3.9

Additional information

Occurs: Every 30th statistics cycle.

Parameters:

node id: Displays the node id of the local node.

reads 1051,52 : writes 5415,23

Severity:

INFO

Context:

info

Introduced:

3.7.1

Removed:

3.9

Additional information

Occurs: Every 30th statistics cycle.

Parameters:

reads: Cumulative values for read (successes),(failures).

writes: Cumulative values for write (successes),(failures).

udf reads 79,2 : udf writes 2508,0 : udf deletes 102,0 : lua errors 1

Severity:

INFO

Context:

info

Introduced:

3.7.1

Removed:

3.9

Additional information

Occurs: Every 30th statistics cycle.

Parameters:

udf reads: Cumulative value for udf reads (successes),(failures).

udf writes: Cumulative value for udf writes (successes),(failures).

udf deletes: Cumulative value for udf deletes (successes),(failures).

lua errors: Cumulative value for lua errors.

index (new) batches 18,0 : direct (old) batches 0,0

Severity:

INFO

Context:

info

Introduced:

3.7.1

Removed:

3.9

Additional information

Occurs: Every 30th statistics cycle.

Parameters:

index (new) batches: Cumulative value for batch index request (successes),(failures)

direct (old) batches: Cumulative value for batch direct request (successes),(failures)

aggregation queries 6,0 : lookup queries 5,0

Severity:

INFO

Context:

info

Introduced:

3.7.1

Removed:

3.9

Additional information

Occurs: Every 30th statistics cycle.

Parameters:

aggregation queries: Cumulative value for aggregation query (successes),(failures)

lookup queries: Cumulative value for lookup query (successes),(failures)

proxies 0,0

Severity:

INFO

Context:

info

Introduced:

3.7.1

Removed:

3.9

Additional information

Occurs: Every 30th statistics cycle.

Parameters:

proxies: Cumulative value for proxy (successes),(failures).

{namespace} objects 2195 : sub-objects 0 : master objects 2195 : master sub-objects 0 : prole objects 0 : prole sub-objects 0

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

objects: Number of objects in {namespace} on the local node

sub-objects: Number of sub-objects in {namespace} on the local node

master objects: Number of master objects in {namespace} on the local node

master sub-objects: Number of master sub-objects in {namespace} on the local node

prole objects: Number of replica objects in {namespace} on the local node

prole sub-objects: Number of replica sub-objects in {namespace} on the local node

{namespace} memory bytes used 456939 (index 140480 : sindex 221513 : data 94946) : used pct 0.01

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

memory bytes used: Total number of bytes used in memory for {namespace} on the local node.

index: Number of bytes holding the primary index in system memory for {namespace} on the local node.

sindex: Number of bytes holding secondary indexes in process memory for {namespace} on the local node.

data: Number of bytes holding data in system memory for {namespace} on the local node. Only applicable when {namespace} is configured for data in memory.

used pct: Percentage of bytes used in memory for {namespace} on the local node.

{ns_name} disk bytes used 596736 : avail pct 99: cache-read pct 12.00

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

disk bytes used: Number of bytes used on disk for {namespace} on the local node.

avail pct: Minimum percentage of contiguous disk space in {namespace} on the local node.

cache-read pct: Percentage of reads from the post-write cache instead of disk. Only applicable when {namespace} is not configured for data in memory.

rec refs 201422013 ::: rec locks 0 ::: trees 0 ::: wr reqs 24 ::: mig tx 0 ::: mig rx 0

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

rec refs: Number of record references on node

rec locks: Number of records currently locked

trees: obsolete

wr reqs: Number of transactions currently waiting on other nodes

mig tx: Number of objects currently being transmitted by migrations

mig rx: Number of objects currently being received by migrations

basic scans 11,0 : aggregation scans 0,0 : udf background scans 5,0 :: active scans 0

Severity:

INFO

Context:

info

Introduced:

3.7.1

Removed:

3.9

Additional information

Occurs: Every 30th statistics cycle.

Parameters:

basic scans: Cumulative value for basic scan (successes),(failures).

aggregation scans: Cumulative value for aggregation scan (successes),(failures).

udf background scans: Cumulative value for udf background scan (successes),(failures).

active scans: Value for the number of active scans.

replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Partition state-transition errors. Should be zero.

trans_in_progress: wr 23 prox 0 wait 0 ::: q 1 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (17, 2699692, 2699675) : hb (2, 3, 1) : fab (30, 44, 14)

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

wr: Number of writes in progress

prox: Number of proxies in progress

wait: Number of waiting transactions

q: Number of transactions on the transaction queue

bq: Number of batch transactions on the batch transaction queue

iq: Number of info transactions on the info transaction queue

dq: Number of nsup transactions on the nsup transaction queue

fds - proto: (Number of currently open connections between this node and clients, Number of connections ever opened between this node and clients, Number of connections ever closed between this node and clients). Client connections are closed when reaped after being idle, when properly shut down by the client (a proper socket close), or on preliminary packet parsing errors (like unexpected headers, etc.); most of the parsing errors would also have a WARNING in the logs.

hb: (Number of presently open heartbeat connections (should be 0 for multicast), Total number of heartbeat connections ever opened, Total number of heartbeat connections ever closed)

fab: (Number of presently open fabric (intra-cluster) connections, Total number of fabric connections ever opened, Total number of fabric connections ever closed)
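
In the example line above, each currently-open count equals the connections ever opened minus the connections ever closed: proto 2699692 - 2699675 = 17, hb 3 - 1 = 2, and fab 44 - 14 = 30.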

heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

bt: Bad Type: Received heartbeat packet with an undefined message type.

bf: Bad Pulse FD: Received heartbeat packet on the wrong socket File Descriptor (FD.)

nt: No Type: Received a heartbeat packet without a message type (i.e., PULSE, INFO_REQUEST, INFO_REPLY.)

ni: No ID: Received a heartbeat packet without the message ID (i.e., missing protocol version.)

nn: No Node in Pulse: Received heartbeat PULSE packet without the node field set.

nnir: No Node in Info Request: Received heartbeat INFO_REQUEST packet without the node field set.

nal: No ANV Length: Received a heartbeat packet of type v2 or greater without the required Adjacent Nodes Vector (ANV) length.

sf1: Send Failed 1: Failed to send a heartbeat INFO_REQUEST packet to the remote node(s) via multicast.

sf2: Send Failed 2: Failed to send a heartbeat INFO_REQUEST packet to the remote node via mesh.

sf3: Send Failed 3: Failed to send a heartbeat INFO_REPLY packet to the remote node(s) via multicast.

sf4: Send Failed 4: Failed to send a heartbeat INFO_REPLY packet to the remote node via mesh.

sf5: Send Failed 5: Failed to send a heartbeat PULSE packet to the remote node(s) via multicast.

sf6: Send Failed 6: Failed to send a heartbeat PULSE packet to the remote node via mesh.

mrf: Missing Required Field: Received heartbeat INFO_REPLY packet without one or more required field (i.e., node, address, port.)

eh: Expire HB: Have not received a heartbeat from a remote node within the configured timeout interval.

efd: Expire Fabric Dead: Both heartbeat and fabric have not received packets from the remote node within the configured timeout interval.

efa: Expire Fabric Alive: Have not received a heartbeat from the remote node within the configured timeout interval, but fabric is still receiving packets from the remote node.

um: Unparsable msg: Received data on a heartbeat socket that could not be parsed into a heartbeat msg.

mcf: Mesh Connect Failure: Failed to get information about the connected mesh socket.

rc: Remote Close: A heartbeat socket was closed at the remote end.

tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 1 ssdw 0 rw 24

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

nsup: Partitions the namespace supervisor thread currently holds a reference to

scan: Partitions the scan threads currently hold a reference to

batch: Partitions the batch threads currently hold a reference to

dup: Partitions the duplicate resolution algorithm currently holds a reference to

wprocess: Partitions the write process algorithm currently holds a reference to. The write process handles the prole writes and prole acks.

migrx: Partitions the migration receive threads currently hold a reference to

migtx: Partitions the migration transmit threads currently hold a reference to

ssdr: Partitions the SSD read threads currently hold a reference to

ssdw: Partitions the SSD write threads currently hold a reference to

namespace NAMESPACE: disk inuse: 1458377696640 memory inuse: 10318259584 (bytes) sindex memory inuse: 0 (bytes) avail pct 5 cache-read pct 11.07

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

namespace: The namespace the following data pertains to

disk inuse: Amount of disk the namespace is using

memory inuse: Amount of memory the namespace is using

sindex memory inuse: Amount of memory used by secondary indexes

avail pct: This is the minimum between the amount of available memory and the amount of contiguous disk-space.

cache-read pct: Percentage of reads being read from the post-write cache rather than going to disk.

partitions: actual 89 sync 99 desync 0 zombie 0 wait 0 absent 3908

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Parameters:

actual: Number of master partitions owned by this node

sync: Number of non-master partitions owned by this node

absent: Number of partitions not owned by this node

desync: Number of partitions owned by this node, but currently without data. Only non-zero when cluster state is changing

zombie: Number of partitions with data, but not owned by this node. Only non-zero when cluster state is changing

wait: Should always be zero

histogram dump: {HISTOGRAM NAME} (1344911766 total) (00: 1262539302) (01: 0044998665) (02: 0013431778) (03: 0007273116) (04: 0004299011) (05: 0003086466) (06: 0002182478) (07: 0001854797) (08: 0000312272) (09: 0000370715) (10: 0000643337) (11: 0001045861) (12: 0001991430) (13: 0000882538)

Severity:

INFO

Context:

info

Removed:

3.9

Additional information

Additional histograms can be accessed by enabling microbenchmarks and/or storage-benchmarks statically or dynamically in the service context of the configuration.

Occurs: Periodically printed to the logs (every 10 seconds by default).

Parameters:

histogram dump: Name of the histogram to follow

total: Number of data points represented by this histogram

N: Number of data points greater than 2^N and less than 2^(N+1)

handle insert: binless pickle, dropping <Digest>:0x2b2c08f859bd4eb401982e038a2bdcae2b74c853

Severity:

WARNING

Context:

migrate

Additional information

Attempted to migrate a record that had no bins, so didn't insert it into the receiving partition.

Parameters:

digest: The 160-bit digest of the record, in hex.

missing acks from node BB9030011AC4202

Severity:

WARNING

Context:

migrate

Additional information

This means that a particular migration thread has begun to throttle because it has 16MB or more of un-acked records, and has remained in that state for at least 5 seconds. The message will continue to print every 5 seconds until the outstanding acks drop below the threshold. This happens when migration has been pushed higher than your network or machine/disks can sustain. To mitigate, decrease migrate-threads (down to a minimum of 1). This change will not take effect immediately; threads will terminate as the partition they are handling completes migration.

Parameters:

node: The destination node of the migrate thread.

migrate: record flatten failed ce8a775a68c93d49

Severity:

WARNING

Context:

migrate

Additional information

The storage for this namespace was full at the time of migration. Resolve by allocating more storage for the namespace, or by reducing the volume of data in the namespace.

Parameters:

record: Record identifier.

migrate: handle insert: got bad record

Severity:

WARNING

Context:

migrate

Additional information

Inbound record for migration is corrupt. Appears with the record too small 0 WARNING message and is documented in a knowledge-base article.

migrate: handle insert: got no record

Severity:

WARNING

Context:

migrate

Additional information

Appears on the destination (inbound) node with the warning message about handle insert: got bad record. Associated with the new unreadable digest warning for Server 6.0.

migrate: unreadable digest

Severity:

WARNING

Context:

migrate

Additional information

Appears on the source node when the record could not be read locally. Introduced with Server 6.0. Associated with the handle insert: got no record warning.

malloc

Severity:

CRITICAL

Context:

msg

Additional information

Indicates a shortage of memory. Make sure nodes have enough memory.

fail persistent memory delete

Severity:

CRITICAL

Context:

namespace

Additional information

Indicates an issue deleting PMEM memory on startup. Can result from incorrect permissions on the /mnt/pmem directory.

can't add SET (at sets limit)

Severity:

WARNING

Context:

namespace

Additional information

This warning indicates that the maximum number of sets per namespace has been breached. A namespace can hold a maximum of 1023 sets. Such an issue can cause migrations to get stuck and writes to fail. The article How to clear up set and bin names has details on how to address this situation.

Parameters:

set: Name of set that was being created in excess of the limit.

at set names limit, can't add set

Severity:

WARNING

Context:

namespace

Additional information

This warning indicates that the maximum number of sets per namespace has been breached. A namespace can hold a maximum of 1023 sets. Such an issue can cause migrations to get stuck and writes to fail. The article How to clear up set and bin names has details on how to address this situation.

ns can't attach persistent memory base block: block does not exist

Severity:

WARNING

Context:

namespace

Additional information

Indicates a missing shared memory block. This commonly happens when a node is rebooted, but shared memory blocks can be deleted for other reasons. The node will have to perform a cold start.

{ns_name} found no valid persistent memory blocks, will cold start

Severity:

INFO

Context:

namespace

Additional information

Indicates that the treex and base shared memory blocks are missing. This typically happens when a node is rebooted or after an ungraceful shutdown. The node will have to perform a cold start.

{ns_name} persisted arena stages

Severity:

INFO

Context:

namespace

Introduced:

4.6.0.2

Additional information

This message is one of a sequence of messages logged during Aerospike server shutdown of storage-engine device namespaces. The message signifies that the arena stages for the namespace have been persisted to storage. An unusual delay in the appearance of this message during shutdown might be due to index-type being configured as pmem.

{ns_name} persisted tree roots

Severity:

INFO

Context:

namespace

Introduced:

4.6.0.2

Additional information

This message is one of a sequence of messages logged during Aerospike server shutdown of storage-engine device namespaces. The message signifies that the namespace's common partition index tree information has been persisted to storage. An unusual delay in the appearance of this message during shutdown might be due to a high number of partition-tree-sprigs configured for the namespace.

{ns_name} persisted trusted base block

Severity:

INFO

Context:

namespace

Introduced:

4.6.0.2

Additional information

This message is one of a sequence of messages logged during Aerospike server shutdown of storage-engine device namespaces. The message signifies that the persistent memory base block for the namespace has been persisted to storage with "trusted" status. Note that "trusted" status is a necessary condition for a subsequent fast restart of the namespace.

fabric_connection_process_readable() recv_sz -1 msg_sz 0 errno 110 Connection timed out

Severity:

WARNING

Context:

network

Additional information

The above warning message indicates that a fabric connection timed out. In server version 5.6 and later, the log line includes the node ID, identifying which node's fabric connection timed out.

{namespace} would evict all 146897768 records eligible - not evicting!

Severity:

WARNING

Context:

nsup

Additional information

The namespace supervisor (nsup) configuration required it to evict all expirable records (records with a configured time-to-live or TTL). Most commonly seen when evict-tenths-pct is set to 1000 or greater, but could also happen when the distribution of TTLs is extremely skewed.

Parameters:

namespace: Namespace where this happened.

record count: Total number of evictable records (that is, records with a TTL) in the namespace.

{namespace} failed set evict-void-time 328967879

Severity:

WARNING

Context:

nsup

Additional information

When high-water-disk-pct, high-water-memory-pct, or mounts-high-water-pct is breached on any node and eviction starts, the timestamp before which records need to be evicted is propagated to all nodes through the System Meta Data (SMD) mechanism, to ensure that no orphaned replicas are left anywhere in the cluster. If not all nodes have acknowledged the message within 5 seconds, this message will be logged. Occasional instances of this message do not indicate a serious problem, as the timestamp will be picked up on the next namespace supervisor (nsup) cycle.

Parameters:

ns: The namespace where evictions were triggered.

evict void time: The timestamp, in seconds since the Aerospike Epoch of 2010-01-01T00:00:00, before which records should be evicted.

{bigdata} hwm breached but nothing to evict

Severity:

WARNING

Context:

nsup

Additional information

The amount of data in memory (high-water-memory-pct) or on disk (high-water-disk-pct, mounts-high-water-pct) has exceeded the limits set for the namespace, triggering evictions, but there is no data with a finite TTL to be evicted.

Parameters:

ns: Namespace where the high water mark was breached.

{NAMESPACE} failed to create evict-prep thread 5

Severity:

CRITICAL

Context:

nsup

Additional information

Indicates a shortage of memory. Make sure nodes have enough memory.

Parameters:

NAMESPACE: The namespace nsup was looking at when memory ran low.

thread: Thread identifier.

{ns-name} nsup-start

Severity:

INFO

Context:

nsup

Removed:

4.5.1

Additional information

Occurs: Information logged when the namespace supervisor (nsup) starts for a given namespace.

{ns-name} nsup-start: expire

Severity:

INFO

Context:

nsup

Introduced:

4.5.1

Removed:

4.6.0

Additional information

Occurs: Logged when the namespace supervisor (nsup) begins expiration processing for a namespace.

{ns-name} nsup-start: expire-threads 1

Severity:

INFO

Context:

nsup

Introduced:

4.6.0

Additional information

Occurs: Logged when the namespace supervisor (nsup) begins expiration processing for a namespace.

Parameters:

expire-threads: The number of threads to be used for the expiration processing cycle. Corresponds to the configured nsup-threads.

{ns-name} nsup-start: evict-ttl 2745 evict-void-time (287665530,287665930)

Severity:

INFO

Context:

nsup

Introduced:

4.5.1

Removed:

4.6.0

Additional information

Occurs: Logged when the namespace supervisor (nsup) begins eviction processing for a namespace.

Parameters:

evict-ttl: The specified eviction depth for the namespace, expressed as a time to live threshold in seconds, below which any eligible records will be evicted.

evict-void-time: The current effective eviction depth and the specified eviction depth for the namespace. Each is expressed as a void time, in seconds since 1 January 2010 UTC.

{ns-name} nsup-start: evict-threads 1 evict-ttl 2745 evict-void-time (287665530,287665930)

Severity:

INFO

Context:

nsup

Introduced:

4.6.0

Additional information

Occurs: Logged when the namespace supervisor (nsup) begins eviction processing for a namespace.

Parameters:

evict-threads: The number of threads to be used for the eviction processing cycle.

evict-ttl: The specified eviction depth for the namespace, expressed as a time to live threshold in seconds, below which any eligible records will be evicted.

evict-void-time: The current effective eviction depth and the specified eviction depth for the namespace. Each is expressed as a void time, in seconds since 1 January 2010 UTC.

{ns-name} nsup-done: non-expirable 42162 expired (576066,922) evicted (24000935,259985) evict-ttl 134000 total-ms 155

Severity:

INFO

Context:

nsup

Introduced:

4.5.1

Additional information

Occurs: Logged when the namespace supervisor (nsup) completes expiration or eviction processing for a namespace. See the most recently logged nsup-start entry for the namespace to determine which type of processing has just completed. Also note that expiration processing will never evict records, but eviction processing can expire records.

Parameters:

non-expirable: The number of records without a TTL. These records do not expire and will never be eligible for eviction.

expired: Number of records removed due to expiration; total since the node started and total for the current nsup cycle. In this example, 922 records expired in the most recent nsup cycle, and 576066 records have expired since the node was last started.

evicted: Number of records evicted (early-expired); total since the node started and total for the current nsup cycle. In this example, nsup evicted 259985 records in the most recent cycle, and 24000935 records since the node was last started.

evict-ttl: The high-end expiration-time of evicted (early-expired) records (in seconds).

total-ms: Duration of the just-completed nsup expiration or eviction processing cycle, in milliseconds. In this example, the processing cycle completed in 155 milliseconds.

{ns-name} nsup-done: master-objects (638101,45630) expired (576066,922) evicted (24000935,259985) evict-ttl 0 waits (0,0) total-ms 155

Severity:

INFO

Context:

nsup

Introduced:

3.14

Removed:

4.5.1

Additional information

Occurs: Information logged after the namespace supervisor (nsup) completed a run for a namespace.

Parameters:

master-objects: Number of records scanned in the most recent nsup cycle. In this example, nsup scanned 638101 records. The second number is the number of records without TTL (those will never expire and will not be eligible for eviction).

expired: Number of records removed due to expiration, total since the node started and for the current nsup cycle. In this example, 922 records expired in the most recent nsup cycle, and 576066 records have expired since the node was last started.

evicted: Number of records evicted (early-expired), total since the node started and for the current nsup cycle. In this example, nsup evicted 259985 records in the most recent cycle, and 24000935 records since the node was last started.

evict-ttl: The high-end expiration-time of evicted (early-expired) records (in seconds).

waits: Accumulated waiting time for different stages of deletes to finish, in milliseconds. In order:

  • n_general_waits: the number of milliseconds nsup slept during general expiration and eviction while waiting for the nsup-delete-queue to drop to 10,000 elements or less (throttling).
  • n_clear_waits: the number of milliseconds until the nsup-delete-queue has cleared, at the end of the cycle for the current namespace.

total-ms: Duration of the most recent nsup cycle, in milliseconds. In this example, the nsup cycle completed in 155 milliseconds.

{ns-name} Records: 638101, 0 0-vt, 922(576066) expired, 259985(24000935) evicted, 0(0) set deletes. Evict ttl: 0. Waits: 0,0,0. Total time: 155 ms

Severity:

INFO

Context:

nsup

Introduced:

3.8

Removed:

3.14

Additional information

Occurs: Information logged after the namespace supervisor (nsup) completed a run for a namespace. Values in parentheses indicate the cumulative count, instead of the current-cycle count.

Parameters:

Records: Number of records scanned in the most recent nsup cycle. In this example, nsup scanned 638101 records.

0-vt: Number of records without TTL.

expired: Number of records removed due to expiration. In this example, 922 records expired in the most recent nsup cycle, and 576066 records have expired since the node was last started.

evicted: Number of records early-expired. In this example, nsup evicted 259985 records in the most recent cycle, and 24000935 records since the node was last started.

set deletes: If a set-delete command was issued, number of records deleted.

Evict ttl: The high-end expiration-time of early-expired records (in seconds).

Waits: Accumulated waiting time for different stages of delete to finish, in milliseconds. In each cycle, nsup performs set-deletes before general expiration and eviction.

  • n_set_waits: The first wait is the number of milliseconds that nsup slept during set-deletes stage while waiting for the nsup-delete-queue to drop to 10,000 elements or less (Throttling).
  • n_clear_waits: The second wait is the number of milliseconds until the nsup-delete-queue cleared (including the previous namespace if applicable) before beginning general expiration and eviction (Minimize unnecessary eviction if deletes already pending). For the last namespace in the nsup cycle, this is reported on its own line, nsup clear waits: 1441
  • n_general_waits: The third wait is the number of milliseconds nsup slept during general expiration and eviction while waiting for the nsup-delete-queue to drop to 10,000 elements or less (Throttling).

Total time: Duration of the most recent nsup cycle, in milliseconds. In this example, the nsup cycle completed in 155 milliseconds.

{ns-name} Records: 37118670, 0 0-vt, 0(377102877) expired, 185677(145304222) evicted, 0(0) set deletes, 0(0) set evicted. Evict ttls: 34560,38880,0.118. Waits: 0,0,8743. Total time: 45467 ms

Severity:

INFO

Context:

nsup

Removed:

3.8

Additional information

Occurs: Information logged after the namespace supervisor (nsup) completed a run for a namespace. Values in parentheses indicate the cumulative count, instead of the current-cycle count.

Parameters:

Records: Number of records scanned in the most recent nsup cycle. In this example, nsup scanned 37118670 records.

0-vt: Number of records without TTL.

expired: Number of records removed due to expiration. In this example, no records expired in the most recent nsup cycle, and 377102877 records have expired since the node was last started.

evicted: Number of records early-expired. In this example, nsup evicted 185677 records in the most recent cycle, and 145304222 records since the node was last started.

set deletes: If a set-delete command was issued, number of records deleted.

set evicted: If a set-eviction watermark is set, number of records early-expired.

Evict ttls: The low-end and high-end expiration-time of early-expired records (in seconds) followed by the percentage of records evicted in the partially evicted bucket (within which evictions are random). In this example, the evicted records would have expired naturally within the next 34560 to 38880 seconds. Nsup evicted 0.118 percent of the records in the last (partial) bucket that it had to go through.

Waits: Accumulated waiting time for different stages of delete to finish, in microseconds. In each cycle, nsup performs set-deletes before general expiration and eviction.

  • n_set_waits: The first wait is the number of microseconds that nsup slept during set-deletes stage while waiting for the nsup-delete-queue to drop to 10,000 elements or less (Throttling).
  • n_clear_waits: The second wait is the number of microseconds until the nsup-delete-queue cleared (including the previous namespace if applicable) before beginning general expiration and eviction (Minimize unnecessary eviction if deletes already pending). For the last namespace in the nsup cycle, this is reported on its own line, nsup clear waits: 1441
  • n_general_waits: The third wait is the number of microseconds nsup slept during general expiration and eviction while waiting for the nsup-delete-queue to drop to 10,000 elements or less (Throttling).

Total time: Duration of the most recent nsup cycle, in milliseconds. In this example, the nsup cycle completed in 45467 milliseconds.

nsup clear waits: 1441

Severity:

INFO

Context:

nsup

Removed:

3.14

Additional information

Occurs: Information logged after the namespace supervisor (nsup) completed a full cycle (for all namespaces).

Parameters:

clear waits: Number of microseconds until the nsup-delete-queue cleared from the deletes in the last namespace in the cycle. This is also printed as part of each namespace's run, as the second number for Waits in the per-namespace nsup completion log entry.

{ns-name} sindex-gc start

Severity:

INFO

Context:

nsup

Introduced:

3.14.0

Removed:

4.6.0

Additional information

Occurs: Starting secondary index (sindex) garbage collection. Replaced by "sindex-gc-start" message in 4.6.0.

{ns-name} sindex-gc: Processed: 3133360101, found:365961945, deleted: 365952323: Total time: 62667962 ms

Severity:

INFO

Context:

nsup

Introduced:

3.14.0

Removed:

4.6.0

Additional information

Occurs: Secondary index (sindex) garbage collection cycle summary. Replaced by "sindex-gc-done" message in 4.6.0.

Parameters:

Processed: Count of sindex entries that have been checked. Corresponds to the sindex_gc_objects_validated statistic.

found: Count of sindex entries found eligible for garbage collection. Corresponds to the sindex_gc_garbage_found statistic.

deleted: Count of sindex entries deleted through garbage collection (may be lower than above number if those entries got deleted while the garbage collector was running, for example through a competing truncate command). Corresponds to the sindex_gc_garbage_cleaned statistic.

Total time: Duration of a cycle of sindex garbage collection in milliseconds.

{ns-name} breached eviction hwm (memory), memory sz:18043661211 (1235879488 + 40166892 + 16767614831) hwm:18038862643, index-device sz:0 hwm:0, disk sz:18024160016 hwm:64424509440

Severity:

WARNING

Context:

nsup

Additional information

Occurs: Checking memory or disk usage at start of nsup cycle and finding that the high-water mark has been breached

Parameters:

breached eviction hwm: memory or disk depending on which high water mark was breached

memory: Memory used in bytes (primary index + secondary indexes + data in memory), and the high-water mark in bytes (total available * high-water-memory-pct). For versions 5.6 and above, the set index memory used is added as well.

index-device: Used space in bytes on the device for index-type flash, and the high-water mark in bytes (total available * mounts-high-water-pct).

disk: Disk space used in bytes, and the high-water mark in bytes (total available * high-water-disk-pct).
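
The hwm values follow directly from the namespace's configured limits. As a rough worked check of the example line (the configuration values below are assumptions chosen for illustration, not taken from the log), a memory-size of 28G with high-water-memory-pct 60 yields the memory hwm shown:

$ echo $(( 28 * 1024 * 1024 * 1024 * 60 / 100 ))
18038862643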

{ns-name} breached stop-writes limit (memory), memory sz:129805368 (90093056 + 0 + 39712312) limit:128849018, disk avail-pct:100

Severity:

WARNING

Context:

nsup

Additional information

Occurs: Upon checking memory at start of nsup cycle and finding that a threshold triggering stop writes has been breached (either memory based on the stop-writes-pct configuration parameter or disk based on the min-avail-pct configuration parameter)

Parameters:

breached stop-writes limit: memory or device-avail-pct or memory & device-avail-pct depending on which threshold was breached

memory: Memory used in bytes, tracked under memory_used_bytes (primary index + secondary indexes + data in memory, plus set index bytes for versions 5.6 and above), and the stop-writes limit in bytes (total available * stop-writes-pct).

disk avail-pct: The available contiguous free space, tracked under device_available_pct

{ns-name} no records below eviction void-time 329702784 - threshold bucket 9998, width 3154 sec, count 4000000 > target 20000 (0.5 pct)

Severity:

WARNING

Context:

nsup

Additional information

Occurs: While checking the eviction histogram buckets from the bottom up, no records are found until a single bucket contains enough records to push the total over the limit for one eviction cycle.

Parameters:

void-time: Timestamp at the top of the bucket that breached the eviction limit

threshold bucket: Which bucket out of the evict-hist-buckets breached the eviction limit

width: Width of the bucket in seconds

count: Total records found in this bucket

target: Number of records allowed for this eviction cycle and the evict-tenths-pct used to calculate it
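
The target is essentially the number of eviction-eligible records multiplied by evict-tenths-pct (expressed in tenths of a percent). As a minimal sketch consistent with the example line (the eligible-record count of 4,000,000 is an assumption for illustration), an evict-tenths-pct of 5, i.e. 0.5 pct, yields the target of 20000:

$ echo $(( 4000000 * 5 / 1000 ))
20000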

failed OS_CHECK check - MESSAGE

Severity:

WARNING

Context:

os

Introduced:

5.7

Additional information

Indicates that a Linux best practice was violated at startup.

Parameters:

OS_CHECK: Name of the best-practices check that was violated.

MESSAGE: Description of how the best practice was violated.

{ns_name} fresh-partitions 240

Severity:

INFO

Context:

partition

Introduced:

3.12.1

Removed:

3.13

Additional information

Number of partitions that are created fresh (empty) because more nodes than the replication factor have left the cluster.

Occurs: When fresh partitions are introduced, typically during a split-brain situation.

Parameters:

fresh-partitions: The number of fresh (or empty) partitions that are introduced in the cluster.

{ns_name} 2 of 5 nodes are quiesced

Severity:

INFO

Context:

partition

Introduced:

4.3.1

Additional information

Number of nodes quiesced in the cluster (or sub cluster).

Occurs: When the cluster changes (node addition, removal, or network splits) or when the cluster receives a 'recluster' info command.

Parameters:

nodes participating: The number of nodes quiesced in this sub-cluster out of the total number of nodes observed in the sub-cluster (for strong-consistency namespaces, the total is the full roster size rather than the number of nodes observed in the sub-cluster).
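
For reference, a recluster can be requested with the 'recluster' info command mentioned above; a minimal sketch (output may vary by node):

$ asinfo -v "recluster:"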

{ns_name} rebalanced: expected-migrations (1215,1224,1215) fresh-partitions 397

Severity:

INFO

Context:

partition

Introduced:

3.13

Additional information

Number of partitions expected to migrate (transmitted, received, and signals) as well as other migration related statistics and fresh partition number.

Occurs: For non strong-consistency namespaces, when the cluster changes, because of node addition, removal or network splits.

Parameters:

expected-migrations: The number of partitions expected to migrate (Transmitted, Received, Signals) as part of this reclustering event. Those correspond to the migrate_tx_partitions_initial, migrate_rx_partitions_initial, and migrate_signals_remaining statistics respectively.

fresh-partitions: Number of partitions that are created fresh (empty) because more nodes than the replication factor have left the cluster.

{ns_name} rebalanced: regime 295 expected-migrations (826,826,826) expected-appeals 0 unavailable-partitions 425

Severity:

INFO

Context:

partition

Introduced:

4.0

Additional information

Number of partitions expected to migrate (transmitted, received, and signals) as well as other migration related statistics and partition availability details.

Occurs: For strong-consistency namespaces, when the cluster changes, because of node(s) leaving or joining the cluster (or network splits).

Parameters:

regime: This number increments every time there is a reclustering event. It is used in strong-consistency namespaces and is leveraged by the client libraries. For further details on regime, refer to the Strong Consistency architecture document.

expected-migrations: The number of partitions expected to migrate (Transmitted, Received, Signals) as part of this reclustering event. Those correspond to the migrate_tx_partitions_initial, migrate_rx_partitions_initial, and migrate_signals_remaining statistics respectively.

expected-appeals: The number of appeals expected as part of this reclustering event. Appeals occur after a node has been cold-started. The replication state of each record is lost on cold-start and all records must assume an unreplicated state. An appeal resolves replication state from the partition's acting master. These are important for performance; an unreplicated record will need to re-replicate to be read which adds latency. During a rolling cold-restart, an operator may want to wait for the appeal phase to complete after each restart to minimize the performance impact of the procedure. Corresponds to the appeals_tx_remaining statistic but only at the initial time of the reclustering event.

unavailable-partitions: The number of partitions that are unavailable because the roster is not complete and not all writes that have occurred to those partitions are present. Partitions remaining unavailable after the cluster is formed by the full roster will become dead and require the use of the revive command to make them available again, which could lead to inconsistencies, depending on what led to those partitions being dead. Revived nodes restore availability only when all nodes are trusted. Corresponds to the unavailable_partitions statistic.

{ns_name} rebalanced: regime 295 expected-migrations (826,826,826) expected-appeals 0 dead-partitions 425

Severity:

WARNING

Context:

partition

Introduced:

4.0

Additional information

Number of partitions expected to migrate (transmitted, received, and signals) as well as other migration related statistics and partition availability details.

Occurs: For strong-consistency namespaces, when the cluster reforms with all roster members but resulting in dead partitions present.

Parameters:

regime: This number increments every time there is a reclustering event. It is used in strong-consistency namespaces and is leveraged by the client libraries. For further details on regime, refer to the Strong Consistency architecture document.

expected-migrations: The number of partitions expected to migrate (Transmitted, Received, Signals) as part of this reclustering event. Those correspond to the migrate_tx_partitions_initial, migrate_rx_partitions_initial, and migrate_signals_remaining statistics respectively.

expected-appeals: The number of appeals expected as part of this reclustering event. Appeals occur after a node has been cold-started. The replication state of each record is lost on cold-start and all records must assume an unreplicated state. An appeal resolves replication state from the partition's acting master. These are important for performance; an unreplicated record will need to re-replicate to be read which adds latency. During a rolling cold-restart, an operator may want to wait for the appeal phase to complete after each restart to minimize the performance impact of the procedure. Corresponds to the appeals_tx_remaining statistic but only at the initial time of the reclustering event.

dead-partitions: The number of partitions that are dead. Corresponds to the dead_partitions statistic. The revive command is required to make such partitions available again, which could lead to inconsistencies, depending on what led to those partitions being dead.

{ns_name} 5 of 6 nodes participating - regime 221 -> 223

Severity:

INFO

Context:

partition

Introduced:

4.0

Additional information

Number of nodes participating in the cluster (or sub cluster) as well as the regime change.

Occurs: For strong-consistency namespaces, when the cluster changes, because of node(s) leaving or joining the cluster (or network splits).

Parameters:

nodes participating: The number of nodes participating in this cluster out of the total number of nodes for the full roster.

regime: This number increments every time there is a reclustering event. It is used in strong-consistency namespaces and is leveraged by the client libraries. For further details on regime, refer to the Strong Consistency architecture document.

protocol write fail: fd 123 sz 30 errno 32

Severity:

DEBUG

Context:

proto

Additional information

Indicates that the client closed the socket before the server could respond. The client may have timed out, may have too many open connections, or there may be network issues.

Parameters:

fd: File descriptor of the socket to the client.

sz: Size of the response that couldn't be sent.

errno: Error number returned by OS. 32, EPIPE, is common.

Ring buffer file /opt/aerospike/xdr/digestlog should be at least 18057 bytes Boot strap failed for digest log file (null)

Severity:

WARNING

Context:

rbuffer

Additional information

The minimum possible size of the XDR digestlog, as specified in the xdr-digestlog-path configuration parameter, is 18057 bytes. This is not a limitation that should ever come up in practice, because realistic sizes are in the tens or hundreds of GB.

unable to create digest log file /opt/aerospike/xdr/digestlog: No such file or directory

Severity:

WARNING

Context:

rbuffer

Additional information

asd tries to create the digest log at start time if it does not already exist, but may fail due to permission issues, a bad directory or path, etc.

Parameters:

log path: Path of the file that asd tried to create, from the configuration parameter xdr-digestlog-path

system error: The message from the OS giving the reason for failing to create the file

{bigdata} record replace: drives full

Severity:

WARNING

Context:

record

Additional information

No more space left in storage.

Parameters:

ns: Namespace being written to at the time storage filled up.

{AS_PARTICLE} map_subcontext_by_key() cannot create key with non-storage elements

Severity:

WARNING

Context:

record

Additional information

A map context cannot be created with elements that are not stored, such as a wildcard.

{AS_PARTICLE} cdt_process_state_context_eval() bin is empty and op has no create flags

Severity:

WARNING

Context:

record

Additional information

The map context does not exist.

{AS_PARTICLE} packed_list_get_remove_by_index_range() index 155 out of bounds for ele_count 155

Severity:

WARNING

Context:

record

Additional information

There was an attempt to remove an element at index 155 in a list of 155 elements. Since indexes start from 0, index 155 refers to the 156th element, which is out of bounds.

{namespace_name} record replace: failed write 1142f0217ababf9fda5b1a4de66e6e8d4e51765e

Severity:

DETAIL

Context:

record

Introduced:

5.2

Additional information

Most likely appearing as a result of exceeding the write-block-size. For more information, see the KB on write-block-size. The record's digest is the last item in the log entry. In order to determine what set is being written to, see the KB on How to return the set name of a record using its digest.

Parameters:

{namespace_name}: Namespace being written to.

write_master: disallowed ttl with nsup-period 0

Severity:

WARNING

Context:

rw

Additional information

When expirations in a namespace are disabled by setting nsup-period to 0 (which is the default in versions 4.9+), records with a TTL other than 0 may not be written to that namespace. This is to avoid confusion as to whether the records should be subject to expiration (or eviction). Simply set nsup-period to a value other than 0 to allow records with a non-zero TTL to be written. If there is a need to allow records with a non-zero TTL without having the nsup thread running, it is possible to set allow-ttl-without-nsup to true, but this is absolutely not recommended as it would prevent those records from being properly deleted upon expiration.
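
As a hedged sketch of re-enabling expirations dynamically (the namespace name test and the 120-second period are assumptions for illustration only):

$ asinfo -v "set-config:context=namespace;id=test;nsup-period=120"
ok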

{bigdata}: write_master: drives full

Severity:

WARNING

Context:

rw

Additional information

No more space left in storage.

Parameters:

ns: Namespace being written to at the time storage filled up.

dup-res ack: no digest

Severity:

WARNING

Context:

rw

Additional information

During a downgrade from a 4.5.3+ server to an earlier version, this protocol-related warning may appear temporarily on a node running the earlier version. The warning is harmless; it results from the older node briefly receiving 4.5.3+ protocol fabric messages from newer nodes, and it will cease after the next rebalance.

got rw msg with unrecognized op 8

Severity:

WARNING

Context:

rw

Additional information

During a downgrade from a 4.5.3+ server to an earlier version, this protocol-related warning may appear temporarily on a node running the earlier version. The warning is harmless; it results from the older node briefly receiving 4.5.3+ protocol fabric messages from newer nodes, and it will cease after the next rebalance.

{ns} can't get stored key <Digest>:0x5230acd92762fa6f827e902d58199dd7b928479c

Severity:

WARNING

Context:

rw

Additional information

This message indicates that the index has the stored-key flag set for a record, but the record in storage does not contain the key. Normally this would never happen, but there was a bug in some older versions whereby this mismatch could occur when a node containing old data was cold-started and brought back into the cluster with records that had been deleted but not durably deleted. This has been corrected in versions from 4.4.0.8 onward (and 4.3.1.8 onward in the 4.3.x branch) so should no longer be a concern. Enterprise Edition Licensees can contact Aerospike Support for further guidance when encountering this error.

key mismatch - end of universe?

Severity:

WARNING

Context:

rw

Additional information

This message results from a KEY_MISMATCH error. It indicates that, for an update, delete, or read request against a record that has its key stored, the incoming key does not match the existing stored key. In theory this would indicate a RIPEMD-160 key collision, which is of course not likely (for details refer to the paper on Collision Resistance of RIPEMD-160). This message would therefore occur in the case of a key/hash mismatch on the application side or some message-level corruption.

{namespace} drop while replicating

Severity:

WARNING

Context:

rw

Additional information

This message indicates that a record was dropped while replicating, for example when a truncation is run and records are still being replicated when the truncation removes the master record. This can also happen when non-durable deletes are allowed (strong-consistency-allow-expunge) on a strong-consistency enabled namespace. This will increment the client_write_error metric.

WARNING (rw): (write.c:926) write_master: null/empty set name not allowed for namespace {namespace}

Severity:

WARNING

Context:

rw

Additional information

The write fails because the set name is null or empty and the configuration parameter disallow-null-setname is true.

{bar} write_master: record too big <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13

Severity:

DETAIL

Context:

rw

Introduced:

3.16

Additional information

Appears with the WARNING message about failed as_storage_record_write() for exceeding the write-block-size. For more information, see the KB on FAQ - Write Block Size. In order to determine what set is being written to, see the KB on How to return the set name of a record using its digest.

Parameters:

{ns}: Namespace being written to

<Digest>: Digest of the record that was rejected

{bar} write_master: failed as_storage_record_write() <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13

Severity:

WARNING

Context:

rw

Introduced:

3.16

Removed:

5.2

Additional information

Most likely appearing as a result of exceeding the write-block-size. For more information, see the KB on write-block-size. Setting the rw and drv_ssd contexts to detail logging will provide the accompanying explanatory log messages. Refer to the log-set and Changing Log Levels documentation pages for how to dynamically change log levels. In order to determine what set is being written to, see the KB on How to return the set name of a record using its digest.
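
For example, the relevant contexts can be switched to detail dynamically and reverted to info afterwards (sink id 0 is assumed here):

$ asinfo -v "log-set:id=0;rw=detail"
ok
$ asinfo -v "log-set:id=0;drv_ssd=detail"
ok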

Parameters:

{ns}: Namespace being written to

<Digest>: Digest of the record that was rejected

{namespace_name} write_master: failed as_storage_record_write() 1142f0217ababf9fda5b1a4de66e6e8d4e51765e

Severity:

DETAIL

Context:

rw

Introduced:

5.2

Additional information

Most likely appearing as a result of exceeding the write-block-size. For more information, see the KB on write-block-size. The record's digest is the last item in the log entry. In order to determine what set is being written to, see the KB on How to return the set name of a record using its digest.

Parameters:

{namespace_name}: Namespace being written to.

{ns_name} client 10.0.3.182:51160 write <Digest>:0x8df238affec6f8e3a2c22d6c54c91c5bc4f3ff81

Severity:

DETAIL

Context:

rw-client

Introduced:

3.16.0.1

Additional information

Provides details on the originating client's IP address, the transaction type and the digest of the record being accessed.

Occurs: One line per client transaction that gets as far as successfully reserving a partition. Requires the log level to be set to detail for the rw-client context: asinfo -v "log-set:id=0;rw-client=detail"

Parameters:

client: The originating client's IP address.

transaction <Digest>: The transaction type (read/write/delete/udf) as well as the digest.

basic scan job 283603086331273222 failed to start (4)

Severity:

WARNING

Context:

scan

Additional information

A scan job was submitted with maxRetries greater than 0 and failed or timed out on one or more nodes; the retry then conflicts with the still-running scan that has the same transaction ID and exits with error code 4 (parameter error). See Why do I get a warning - “job with trid X already active” when issuing a scan? for more details.

Parameters:

trid: trid of the scan job being retried

scan msg from 10.11.12.13 has unrecognized set name_of_the_set

Severity:

WARNING

Context:

scan

Additional information

A scan specifying an unknown set was received. This generally is just a user error (mistyped the set name).

Parameters:

address: Node that the scan message came from.

setname: Set name that doesn't exist on this node.

send error - fd 646 sz 1049036 rv 521107

Severity:

WARNING

Context:

scan

Additional information

A scan is trying to send data back to the client, but it cannot send the full data set. This is likely due to an aborted client-side scan. The scan will need to be re-run.

Parameters:

fd: File descriptor of the socket to the client.

sz: Size of the response attempted.

rv: Size that was successfully sent.

error sending to 10.0.0.1:45678 - fd 646 sz 1049036 Connection timed out

Severity:

WARNING

Context:

scan

Additional information

A scan is trying to send data back to the client, but it cannot send the full data set. This is likely due to an aborted client-side scan. The scan will need to be re-run.

Parameters:

client: Client address.

fd: File descriptor of the socket to the client.

sz: Size of the response attempted.

error: Error message returned by OS.

not starting scan 261941093 because rchash_put() failed with error -4

Severity:

WARNING

Context:

scan

Additional information

The server rejects the scan transaction because a scan with the same ID is already processing. Wait for that scan to complete or abort it before issuing the scan transaction again.

Parameters:

scan: ID of the scan.

errno: Error code.

starting basic scan job 8104671463142312256 {namespace:set} rps 0 sample-pct 100 socket-timeout 30000 from 172.22.xx.yy:52842

Severity:

INFO

Context:

scan

Introduced:

4.7.0.2

Removed:

4.9

Additional information

A basic scan job is initiated on specified namespace and set.

Parameters:

scan job: ID of the scan

rps: Configured records-per-second rate of the scan. Configured in the scan policy. If not specified, the cluster maximum background-scan-max-rps takes effect.

sample-pct: Percentage of records to return from the scan (if specified). Default of 100 percent. Configured in the scan policy.

metadata-only: Present only if the scan is configured to be a metadata-only scan.

fail-on-cluster-change: Present only if the scan policy (or AQL) is configured to halt the scan if there is a cluster change (node(s) leaving or joining the cluster). See the knowledge-base article on AEROSPIKE_ERR_CLUSTER_CHANGE on scanning records for details.

socket-timeout: Configured socket timeout. Configured in the client policy. Default of 30 seconds if not specified.

client: Client IP and port

starting basic scan job 8104671463142312256 {namespace:set} rps 0 sample-max 100 socket-timeout 30000 from 172.22.xx.yy:52842

Severity:

DEBUG

Context:

scan

Introduced:

4.9

Additional information

A basic scan job is initiated on specified namespace and set.

Parameters:

scan job: ID of the scan

rps: Configured records-per-second rate of the scan. Configured in the scan policy. If not specified, the cluster maximum background-scan-max-rps takes effect.

sample-max: Maximum number of records to return from the scan (if specified). Configured in the scan policy.

metadata-only: Present only if the scan is configured to be a metadata-only scan.

fail-on-cluster-change: Present only if the scan policy is configured to halt the scan if there is a cluster change (node(s) leaving or joining the cluster). See the knowledge-base article on AEROSPIKE_ERR_CLUSTER_CHANGE on scanning records for details.

socket-timeout: Configured socket timeout. Configured in the client policy. Default of 30 seconds if not specified.

client: Client IP and port

fd 361 send failed, errno 113

Severity:

WARNING

Context:

security

Additional information

Tried to send the client a message indicating security is not supported on this CE server, but failed due to a socket issue.

Parameters:

fd: ID of the file descriptor the message was sent on.

errno: Linux error code for the problem, usually something like 113 ("No route to host") or 110 ("Connection timed out").

role violation | authenticated user: sally | action: delete | detail: {test|setB} [D|ee50d7c1d0f427ed5c41ef8a18efd85412b973ff]

Severity:

INFO

Context:

security

Introduced:

3.7.0.1

Additional information

Provides details on the role violation, including transaction type, namespace, set and relevant record's digest.

Occurs: Occurs when a role violation happens, if audit logging is configured to report violations. Refer to the Security Configuration paragraph for details.

Parameters:

authenticated user: The authenticated user violating the role's permissions.

action: The transaction type (read/write/delete/udf) or user or data related operation.

detail: The namespace, set and digest of the record involved in the role violation.

login - internal user credential mismatch

Severity:

WARNING

Context:

security

Introduced:

4.1.0.1

Additional information

Provides a warning that a login failed using a valid internal user that is found in the Access Control List with an incorrect password.

Occurs: Occurs when a failed login is attempted using an internal user that exists in the Access Control List but where an incorrect password has been supplied.

login - internal user not found

Severity:

WARNING

Context:

security

Introduced:

4.1.0.1

Additional information

Provides a warning that a login was attempted and the specified user does not exist in the Access Control List.

Occurs: Occurs when a login is attempted and the specified user does not exist in the Access Control List.

permitted | authenticated user: admin | action: create user | detail: user=bruce;roles=read-write

Severity:

INFO

Context:

security

Introduced:

3.7.0.1

Additional information

Provides details on the operation performed, including potentially transaction type, namespace, set and relevant record's digest.

Occurs: Occurs when permitted transaction happen under an authenticated user (in this case a user-admin related operation). Refer to the Security Configuration paragraph for details.

Parameters:

authenticated user: The authenticated user performing the operation.

action: The transaction type (read/write/delete/udf) or user or data related operation.

detail: The details of the operation, either for a user or admin related operation or namespace, set and digest of the record involved if a single record transaction.

login - internal user not found

Severity:

WARNING

Context:

security

Introduced:

4.1.0.1

Additional information

Provides a warning that a login failed with an internal user that is not found in the Access Control List.

Occurs: Occurs when a failed login is attempted with an internal user that is not found in the Access Control List. To log the details of the internal user that is not found, the report-authentication & report-violation configuration parameters must be set to true. Refer to the Security Configuration paragraph for details.

login - internal user using ldap

Severity:

WARNING

Context:

security

Introduced:

4.6.0.2

Additional information

Provides a warning that a login failed with an internal user that is not found in the Access Control List.

Occurs: Occurs when a login attempt fails with an internal user that is not found in the Access Control List (ACL). There are some conditions when the warning message is presented to the user while incorporating the use of an ACL.

One condition is when an encrypted 'external' password (a clear password sent encrypted) is used for LDAP but a stored hashed 'internal' password also exists for that user.

A second condition is when an incompatible Aerospike Client is attempting to connect to an Aerospike Server 4.6 or newer. Refer to the following Knowledge Base article for details to ensure you are using a compatible Aerospike Client version.

Another condition is when Aerospike clusters utilizing Cross-Datacenter Replication (XDR) are running with conflicting Aerospike Enterprise Edition Server versions. When running XDR and an ACL, Enterprise Edition server versions 4.1.0.1 to 4.3.0.6 are incompatible with version 4.6 or newer: those versions cannot ship to server versions 4.6.0.2 or newer. The simplest workaround is to avoid using the incompatible versions 4.1.0.1 to 4.3.0.6. Refer to the following Knowledge Base article for further details.

This warning message typically begins to appear in the logs when upgrading a cluster.

authentication failed (user) | client: 11.22.33.44:57754 | authenticated user: <none> | action: login | detail: user=baduser

Severity:

INFO

Context:

security

Additional information

Provides details on the failed authentication, including the client IP address and port, the authenticated user, the action, and the user specified.

Occurs: Occurs when a failed login is attempted with an internal user that is not found in the Access Control List, if audit logging is configured to report those. Refer to the Security Configuration paragraph for details. The report-authentication & report-violation configuration parameters must be set to true to enable this log message output.

Parameters:

authentication failed (user): Indicates that authentication of the specified user failed.

client: The client IP Address and port.

authenticated user: The authenticated user, in this example: none.

action: The action being performed, in this case login.

detail: The username involved in the failed authentication.

refusing client connection - proto-fd-max 50000

Severity:

WARNING

Context:

service

Additional information

The server has reached the configured maximum for incoming file descriptors (proto-fd-max). This corresponds directly to incoming client connections. Connections above the maximum will be refused.

Parameters:

proto-fd-max: Currently configured value for proto-fd-max
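
If the configured limit is genuinely too low for the workload, proto-fd-max can typically be raised dynamically, provided the operating system's file descriptor limits also allow it; a minimal sketch (the value 60000 is an assumption for illustration):

$ asinfo -v "set-config:context=service;proto-fd-max=60000"
ok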

Sindex-ticker: ns=ns-name si=<all> obj-scanned=500000 si-mem-used=47913 progress= 2% est-time=2336995 ms

Severity:

INFO

Context:

sindex

Additional information

Occurs: Information logged at startup when secondary indices are being rebuilt.

Parameters:

ns: Namespace name.

si: Secondary indexes being rebuilt.

obj-scanned: Number of objects scanned.

si-mem-used: Memory used by the secondary indices.

progress: Progress in percent.

est-time: Estimated remaining time, in milliseconds, for the secondary indices to be fully rebuilt.

{ns-name} sindex-gc-start

Severity:

INFO

Context:

sindex

Introduced:

4.6.0

Additional information

Occurs: Starting secondary index (sindex) garbage collection.

{ns-name} sindex-gc-done: processed 3133360101 found 365961945 deleted 365952323 total-ms 62667962

Severity:

INFO

Context:

sindex

Introduced:

4.6.0

Removed:

5.7.0

Additional information

Occurs: Secondary index (sindex) garbage collection cycle summary. Replaced by a modified sindex-gc-done message in server version 5.7.

Parameters:

processed: Count of sindex entries that have been checked. Corresponds to the sindex_gc_objects_validated statistic.

found: Count of sindex entries found eligible for garbage collection. Corresponds to the sindex_gc_garbage_found statistic.

deleted: Count of sindex entries deleted through garbage collection (may be lower than above number if those entries got deleted while the garbage collector was running, for example through a competing truncate command). Corresponds to the sindex_gc_garbage_cleaned statistic.

total-ms: Duration of the sindex garbage collection cycle in milliseconds.

{ns-name} sindex-gc-done: cleaned (40000,40000) total-ms 23

Severity:

INFO

Context:

sindex

Introduced:

5.7.0

Additional information

Occurs: Secondary index (sindex) garbage collection cycle summary.

Parameters:

cleaned: Count of sindex entries that have been cleaned (cumulative, current). First value corresponds to the sindex_gc_cleaned statistic. Second value corresponds to the number of sindex entries cleaned in the current round.

total-ms: Duration of the sindex garbage collection cycle in milliseconds.

QTR Put in hash failed with error -4.

Severity:

WARNING

Context:

sindex

Introduced:

4.6.0

Additional information

Occurs: During secondary index query execution when two queries have been run with the same transaction ID. This error should be corrected within the application by setting distinct or random transaction IDs. See the QTR Put in Hash Failed article as well.

Queuing namespace {ns-name} for sindex population by device scan

Severity:

INFO

Context:

sindex

Introduced:

5.3.0

Additional information

Occurs: During startup, when the namespace's devices are about to be scanned in order to populate secondary indexes. This happens when sindex-startup-device-scan is true.

Queuing namespace {ns-name} for sindex population by index scan

Severity:

INFO

Context:

sindex

Introduced:

5.3.0

Additional information

Occurs: During startup, when the namespace's primary index is about to be scanned in order to populate secondary indexes. This happens when sindex-startup-device-scan is false.

failed to allocate a System Metadata cmd event

Severity:

CRITICAL

Context:

smd

Additional information

Indicates a shortage of memory. Make sure nodes have enough memory.

Error while connecting: 113 (No route to host)

Severity:

WARNING

Context:

socket

Additional information

Failed to make a connection to a peer node for sending heartbeat messages.

Parameters:

errno: Linux error code returned by the OS.

error message: Message corresponding to errno.

Error while creating socket for 10.168.10.1:3002: 24 (Too many open files)

Severity:

WARNING

Context:

socket

Additional information

Too many file descriptors are in use. This can lead to an assertion that would cause the node to abort.

Refer to the Increase the Maximum Number of Open Files article.

epoll_create() failed: 24 (Too many open files)

Severity:

CRITICAL

Context:

socket

Additional information

Occurs when the server has hit the system configured file descriptor limit, which leads to an assertion causing the node to abort.

Refer to the Increase the Maximum Number of Open Files article.

too many addresses for interface <interface> - truncating <IP address>

Severity:

WARNING

Context:

socket

Additional information

Too many IP addresses associated with a network interface. Limit is 20 addresses per interface.

bind: socket in use, waiting (port:3001)

Severity:

WARNING

Context:

socket

Additional information

Some other process is already using that port and asd cannot listen on it. Use lsof -i :3001 or netstat -plant | grep :3001 to find out which process is causing the conflict.
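
For example, either command can show the process currently bound to port 3001 (output will vary by system):

$ lsof -i :3001
$ netstat -plant | grep :3001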

Parameters:

port: Port number that asd needs to listen on.

Error while connecting socket to 10.100.0.101:3002

Severity:

WARNING

Context:

socket

Additional information

The node is unable to connect to a configured peer node. The peer may not be up, or there may be connectivity issues. If the node had been permanently removed, refer to the documentation on Removing a Node.

Parameters:

address:port: Address and port that could not be connected to.

{ns_name} partitions shut down

Severity:

INFO

Context:

storage

Introduced:

4.6.0.2

Additional information

This message is one of a sequence of messages logged during Aerospike server shutdown of storage-engine device namespaces. The message signifies that all of the namespace's partitions and index trees have been locked, so that no records are accessible.

compression is not available for storage-engine memory

Severity:

WARNING

Context:

storage

Introduced:

4.9.0

Additional information

This message means that an attempt was made to configure compression on a namespace with storage-engine memory. This is not supported.

{ns_name} storage devices flushed

Severity:

INFO

Context:

storage

Introduced:

4.6.0.2

Additional information

This message is one of a sequence of messages logged during Aerospike server shutdown of storage-engine device namespaces. The message signifies that the data in write buffers for the namespace's devices has been successfully flushed to those devices.

{ns_name} storage-engine memory - nothing to do

Severity:

INFO

Context:

storage

Introduced:

4.6.0.2

Additional information

This message is logged during Aerospike server shutdown of storage-engine memory namespaces. The message simply notes that there are no storage shutdown tasks needed for the namespace.

initiating storage shutdown ...

Severity:

INFO

Context:

storage

Introduced:

3.0

Removed:

4.6.0.2

Additional information

This message is a part of the logs that indicate that a shutdown of the Aerospike process was initiated.

flushing data to storage ...

Severity:

INFO

Context:

storage

Introduced:

3.0

Removed:

4.6.0.2

Additional information

This message is a part of the logs that indicate that a shutdown of the Aerospike process was initiated. This message signifies that the data in the write memory buffers is being flushed to persistence. This includes arena stages, persisted roots, and base blocks.

completed flushing to storage

Severity:

INFO

Context:

storage

Introduced:

3.0

Removed:

4.6.0.2

Additional information

This message is a part of the logs that indicate that a shutdown of the Aerospike process was initiated. This message signifies that the data in the write memory buffers has been successfully flushed to persistence.

SSL_accept with 127.0.0.1:34100 failed: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca

Severity:

WARNING

Context:

tls

Additional information

The client sent a proper client certificate for mutual authentication, but sent a CA certificate that either doesn't match the client certificate, or doesn't match the server CA certificate.

Parameters:

IP:port: Source IP address and port of the client connection.

SSL_accept with 127.0.0.1:34094 failed: error:14089086:SSL routines:ssl3_get_client_certificate:certificate verify failed

Severity:

WARNING

Context:

tls

Additional information

The client sent a proper client certificate and CA certificate for mutual authentication, but sent a TLS name that doesn't match the server's TLS name as set in tls-authenticate-client and the server's certificate.

Parameters:

IP:port: Source IP address and port of the client connection.

SSL_accept with 127.0.0.1:34086 failed: error:140890C7:SSL routines:ssl3_get_client_certificate:peer did not return a certificate

Severity:

WARNING

Context:

tls

Additional information

The client sent a proper CA certificate for mutual authentication that matches the server CA certificate, but didn't send a client certificate.

Parameters:

IP:port: Source IP address and port of the client connection.

SSL_accept I/O unexpected EOF with 127.0.0.1:46630

Severity:

WARNING

Context:

tls

Additional information

The server is attempting to use a connection that has been closed by the client. This can happen if a client times out a transaction, for example if the server is too slow to respond or if there are network disruptions.

Parameters:

IP:port: Source IP address and port of the client connection.

SSL_read I/O unexpected EOF with 127.0.0.1:46491

Severity:

WARNING

Context:

tls

Additional information

The server is attempting to use a connection that has been closed by the client. This can happen if a client times out a transaction, for example if the server is too slow to respond or if there are network disruptions.

Parameters:

IP:port: Source IP address and port of the client connection.

TLS verify result: unable to get local issuer certificate

Severity:

WARNING

Context:

tls

Additional information

A CA or intermediate CA is not trusted by the server. This may be caused by a self-signed certificate or CA that is not yet added to the server truststore.

{ns-name|set-name} got command to truncate to now (226886718769)

Severity:

INFO

Context:

truncate

Introduced:

3.12

Additional information

Occurs: Truncate command received. Will only appear on the node to which the info command was issued. The command is distributed to other nodes via system metadata (SMD), and only the truncating/starting/restarting/truncated/done log entries will appear on those nodes.

Parameters:

(timestamp): The last-update-time (LUT) cutoff, in milliseconds since the Citrusleaf epoch (00:00:00 UTC on 1 Jan 2010).
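
As a minimal sketch (assuming GNU date is available), the example value can be converted to a calendar date by dividing by 1000 (dropping the sub-second part) and adding the Unix timestamp of the Citrusleaf epoch, 1262304000:

$ date -u -d @$(( 1262304000 + 226886718769 / 1000 ))

which resolves to 2017-03-11 00:05:18 UTC.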

{ns-name|set-name} truncating to 226886718769

Severity:

INFO

Context:

truncate

Introduced:

3.12

Additional information

Occurs: Truncate command received. Will appear on all the nodes after a truncate command is issued.

Parameters:

timestamp: The last-update-time (LUT) cutoff, in milliseconds since the Citrusleaf epoch (00:00:00 UTC on 1 Jan 2010).

{ns-name} starting truncate

Severity:

INFO

Context:

truncate

Introduced:

3.12

Additional information

Occurs: Truncate command being processed for the namespace.

{ns-name} truncated records (10,50)

Severity:

INFO

Context:

truncate

Introduced:

3.12

Additional information

Occurs: Truncate command being processed for the namespace.

Parameters:

(current,total): Current truncation count (10 in this example) followed by the total number of records that have been deleted by truncation since the server started (50 in this example). These counts are only kept at the namespace level.

{ns-name} done truncate

Severity:

INFO

Context:

truncate

Introduced:

3.12

Additional information

Occurs: Truncate command completed.

{ns-name|set-name} undoing truncate - was to 226886718769

Severity:

INFO

Context:

truncate

Introduced:

4.3.1.11, 4.4.0.11, 4.5.0.6, 4.5.1.5

Additional information

Occurs: Truncate command undone.

{ns-name} flagging truncate to restart

Severity:

INFO

Context:

truncate

Introduced:

3.12

Additional information

Occurs: This log line indicates that another truncation pass through the namespace will be required once the current pass has completed. Truncation can act on more than one set at a time; therefore, if a truncate command is received for a set in a namespace that is already going through truncation (for a different set, for example), a subsequent pass is required. For example, consider a scenario where a truncation against set s2 is ongoing when a truncate command against set s4 is issued. From the moment this log message appears (when the command to truncate s4 is issued), both s2 and s4 are truncated together until the end of the current pass. Truncation then restarts another pass through the namespace, so that the rest of s4 gets truncated.

{ns-name|set-name} tombstone covers no records on this node

Severity:

INFO

Context:

truncate

Introduced:

3.12

Additional information

Occurs: This log line occurs at cold start-up and lists all set truncations found in the truncation SMD (System Meta Data) file that are not covering any records. The internal mechanism is that all set-truncation tombstones are marked as cenotaphs before records are read from the drives to build the index. The term tombstone here refers to such a truncation-related entry, not to be confused with record-level tombstones. These tombstones are only cycled through once for a namespace during a cold start. If the drives are all fresh, there will be a cenotaph log message for every set truncation in SMD. After the drives are read (which could potentially restore records from truncated sets, thus stripping a set-covering tombstone of its cenotaph status), all the set-truncation tombstones that are still cenotaphs (not covering any records) are listed.

rejecting client transaction - initial partition balance unresolved

Severity:

WARNING

Context:

tsvc

Additional information

When using some older Aerospike Client Libraries, there is a very small window where a node joins a cluster and other nodes begin to advertise it before it has finished creating its partition table; a client may pick up the advertised service and make a request during that window. This message happens only during that window and should resolve on its own very quickly.

transaction is neither read nor write - unexpected

Severity:

WARNING

Context:

tsvc

Additional information

An invalid request has been received, either a read/write request where the corresponding bit is not set, or an operate() command with no operations. The error code -4 (FAIL_PARAMETER) is returned to the client if there is one, but this message can also be caused by non-Aerospike traffic reaching the service port.

large number of Lua function arguments (22)

Severity:

WARNING

Context:

udf

Additional information

As per the Known Limitations page, calling a Lua function with a large number of arguments can cause instability in the Lua Runtime Engine. Although this problem is only known to become acute at around 50 arguments, that is not a sharp cutoff and lower values may still result in issues with execution of the UDF.

Parameters:

arg count: Number of arguments in the UDF call

drives full, record will not be updated

Severity:

WARNING

Context:

udf

Removed:

5.1.0

Additional information

No more space left in storage (happened while executing a UDF).

record has too many bins (513) for UDF processing

Severity:

WARNING

Context:

udf

Removed:

5.1

Additional information

As per the UDF known limitations page, records with more than 512 bins cannot be read or written by a UDF. Such records can exist in Aerospike, however.

Parameters:

bins: How many bins the record has.

UDF bin limit (512) exceeded (bin activity_map)

Severity:

WARNING

Context:

udf

Removed:

5.1

Additional information

As per the UDF known limitations page, records with more than 512 bins cannot be read or written by a UDF.

Parameters:

bin: Name of the 513th bin.

bin limit of 512 for UDF exceeded: 511 bins in use, 1 bins free, 3 new bins needed

Severity:

WARNING

Context:

udf

Removed:

5.1

Additional information

A UDF operation that adds bins to a record would result in more than 512 bins in the record. As per the UDF known limitations page, records with more than 512 bins cannot be read or written by a UDF.

Parameters:

in-use bins: Number of bins already in the record.

free bins: 512 - (# of in-use bins)

new bins: Number of new bins that would need to be added for this operation to succeed.

bin limit of 512 for UDF exceeded: 512 bins in use, 0 bins free, >4 new bins needed

Severity:

WARNING

Context:

udf

Removed:

5.1

Additional information

A UDF operation is trying to add bins to a record that already has 512 bins. As per the UDF known limitations page, records with more than 512 bins cannot be read or written by a UDF.

Parameters:

new bins: Number of new bins that would need to be added for this operation to succeed.

exceeded UDF max bins 512

Severity:

WARNING

Context:

udf

Introduced:

5.1

Additional information

UDF is attempting to set a bin value, which would result in writing a record with more than 512 bins. As per the UDF known limitations page, records with more than 512 bins cannot be written by a UDF.

too many bins for UDF

Severity:

WARNING

Context:

udf

Introduced:

5.1

Additional information

UDF is attempting to set a bin value, but has already set 512 bin values. As per the UDF known limitations page, records with more than 512 bins cannot be written by a UDF.

too many bins (513) for UDF

Severity:

WARNING

Context:

udf

Introduced:

5.1

Additional information

UDF is attempting to use a record (with server version 5.1 or older) or access the bins of a record (with server version 5.2+) that has more than 512 bins. As per the UDF known limitations page, records with more than 512 bins cannot be accessed by a UDF; the exception, with server version 5.2+, is a read-only UDF that accesses only the record's metadata.

Parameters:

bins: Number of bins in the record.

udf_aerospike_rec_update: failure executing record updates (-3)

Severity:

WARNING

Context:

udf

Removed:

5.1.0

Additional information

UDF could not update a record, so the update was rolled back. See earlier messages in the log for further details.

Parameters:

error code: Undocumented error code.

UDF timed out 734 ms ago

Severity:

WARNING

Context:

udf

Additional information

An individual UDF transaction exceeded an internal timeout interval. The timeout is checked at regular intervals, after fixed batches of Lua instructions.

Parameters:

elapsed time: Milliseconds since the timeout period expired. Refer to the transaction-max-ms configuration for further details on when transactions can time out on the server.
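Since transaction-max-ms is a dynamic namespace-level configuration parameter, it can be inspected and, where appropriate, raised without a restart. A minimal sketch, assuming a namespace named test (raising the timeout only masks a slow UDF, so profiling the UDF itself is usually the better first step):

$ asinfo -v "get-config:context=namespace;id=test" | tr ';' '\n' | grep transaction-max-ms
transaction-max-ms=1000
$ asinfo -v "set-config:context=namespace;id=test;transaction-max-ms=2000"
ok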

{namespace_name} failed write 1142f0217ababf9fda5b1a4de66e6e8d4e51765e

Severity:

DETAIL

Context:

udf

Introduced:

5.2

Additional information

Most likely appearing as a result of exceeding the write-block-size with a UDF write. For more information, see the KB on write-block-size. The record's digest is the last item in the log entry. In order to determine what set is being written to, see the KB on How to return the set name of a record using its digest.

Parameters:

{ns}: Namespace being written to.
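To check whether the record produced by the UDF could be exceeding the configured block size, the namespace's write-block-size can be read with get-config. A minimal sketch, assuming a namespace named test using the default 1 MiB setting:

$ asinfo -v "get-config:context=namespace;id=test" | tr ';' '\n' | grep write-block-size
write-block-size=1048576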

udf applied with forbidden policy

Severity:

WARNING

Context:

udf

Introduced:

5.2

Additional information

A result of the client submitting a record UDF request with one or more forbidden policies indicated. A parameter error (code 4) is returned to the client in this situation. The forbidden policies are:
Generation
Generation gt
Create only
Update only
Create or replace
Replace only
Note that any of these policies can be enforced within the UDF itself via Aerospike's Lua API.
In server versions prior to 5.2, the policy flags are ignored.

FAILURE when calling llist add /opt/aerospike/sys/udf/lua/ldt/lib_llist.lua:4013: 1402:LDT-Unique Key or Value Violation

Severity:

WARNING

Context:

udf

Removed:

3.15.0.1

Additional information

Elements within a list must have unique keys. This warning hints that lists with duplicate keys are being sent.

[INFO] TYPE ObjectValue(string) TYPE SearchKey(number)

Severity:

WARNING

Context:

udf

Introduced:

3.3.19

Removed:

3.14.1

Additional information

A list must use the same data type for all of its keys; the first element sets the type for the rest. Once a string has been inserted as a key in the list, only strings can be inserted as keys afterwards.

Parameters:

existing type: Datatype established for keys in this list by the first element.

new type: Datatype of the invalid new element's key.

[WARNING]<lib_llist_2014_10_07.A:getKeyValue()> LLIST requires a KeyFunction for Objects

Severity:

INFO

Context:

udf

Introduced:

3.3.19

Removed:

3.14.1

Additional information

When inserting into a large list, a key function needs to be defined (to be used as the key for the element being inserted). This warning occurs when no such key function was provided. (For Large Map, a key field or key function must be provided.)

[ERROR]<lib_llist_2014_09_04.G:localWrite()>TopRec Update Error rc(-1)

Severity:

WARNING

Context:

udf

Introduced:

3.3.19

Removed:

3.14.1

Additional information

By default, for large lists, if the number of elements is below 100 they are stored in a regular bin. If the number of elements passed is less than 100 but their combined size is above the write-block-size (128K for the nsCostBasis namespace in this example), the data will not fit and this error is triggered. This behavior is configurable at the UDF module level.

{NAMESPACE} failed to create set

Severity:

CRITICAL

Context:

xdr

Additional information

A record belonging to a previously-unknown set was received via XDR, but the limit on the number of set names in the namespace has already been reached locally, so the new set could not be created. If this happens while Aerospike is starting up, the message may be followed by an abort with SIGUSR1.

Parameters:

namespace: The namespace where the set could not be created.

XDR digestlog cannot keep up with writes. Dropping record.

Severity:

WARNING

Context:

xdr

Removed:

5.0

Additional information

Aerospike is writing data to the XDR digestlog faster than the underlying disk can handle. If using raw-disk backed XDR storage, consider switching to file-backed XDR storage for the xdr-digestlog-path parameter, which takes advantage of filesystem caching for reads and writes. Otherwise, you may need to use a faster disk or join multiple disks using RAID-0 to allow faster reads and writes to the XDR digestlog. Also ensure that dmesg is checked for disk failures and that a SMART disk test is performed.
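As a sketch of the disk checks mentioned above (the device name /dev/sdb is only an example), the kernel log and the drive's SMART health can be reviewed as follows:

$ dmesg -T | grep -iE "error|fail" | tail
$ sudo smartctl -H /dev/sdb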

Digest Log Write Failed !!! ... Critical error

Severity:

WARNING

Context:

xdr

Additional information

Occurs: XDR digestlog has grown larger than its partition. See Digestlog partition out of space for more information.

WARNING (xdr-client): (ship.c:527) DC <DCNAME> receive error [2] on <xx.xx.xx.xx:3000>

Severity:

WARNING

Context:

xdr

Introduced:

5.0

Additional information

Occurs: On its own (i.e., if there is no other associated warning such as a bad protocol error), this warning is benign. It can also happen if there is a connection reset while trying to read from the socket; a potential cause is a node being restarted on the remote side.

summary: throughput 3722 inflight 164 dlog-outstanding 100 dlog-delta-per-sec -10.0

Severity:

INFO

Context:

xdr

Introduced:

3.9

Additional information

Parameters:

throughput: The current throughput, shipping to destination cluster(s). When shipping to multiple clusters the throughput will represent the combined throughput to all destination clusters. Corresponds to the xdr_throughput statistic.

inflight: The number of records that are in flight, meaning they have been sent to the destination cluster(s) but a response has not yet been received. Corresponds to the xdr_ship_inflight_objects statistic.

dlog-outstanding: The number of record digests yet to be processed in the digest log. Corresponds to the xdr_ship_outstanding_objects statistic.

dlog-delta-per-sec: The variation of the dlog-outstanding normalized on a per second basis. Gives an idea whether the number of entries in the digestlog is increasing or decreasing over time and at what pace.
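The same values can be read on demand from the node's statistics instead of waiting for the next log interval; a sketch using the statistic names listed above:

$ asinfo -v "statistics" -l | grep -E "^xdr_throughput|^xdr_ship_inflight_objects|^xdr_ship_outstanding_objects"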

detail: sh 5588691 ul 12 lg 11162298 rlg 54 rlgi 0 rlgo 54 lproc 11162198 rproc 45 lkdproc 0 errcl 54 errsrv 0 hkskip 6303 hkf 6299 flat 0

Severity:

INFO

Context:

xdr

Introduced:

3.9

Additional information

Parameters:

sh: The cumulative number of records that this node has attempted to ship since it started, across all datacenters. If a record is shipped to 3 different datacenters, this number increments by 3. Corresponds to the sum of the xdr_ship_success, xdr_ship_source_error and xdr_ship_destination_error statistics.

ul: The number of record digests that have been written to the node but not yet logged to the digestlog (unlogged).

lg: The number of record digests that have been logged. This includes both master and replica records, but a node only ships records for which it owns the master partition; it processes records belonging to its replica partitions only when a neighboring source node goes down.

rlg: Relogged digests. The number of record digests that have been relogged on this node due to temporary failures when attempting to ship. Corresponds to the dlog_relogged statistic.

rlgi: Relogged incoming digests. The number of record digests that other nodes sent to this node (typically prole-side relog or partition ownership change). Corresponds to the xdr_relogged_incoming statistic.

rlgo: Relogged outgoing digests. The number of record digest log entries that were sent to other nodes (typically prole-side relog or partition ownership change). Corresponds to the xdr_relogged_outgoing statistic.

lproc: The number of record digests that have been processed locally. A processed digest does not necessarily imply a shipped record (for example, replica digests don't get shipped unless a source node is down, and hotkeys don't necessarily have all their updates shipped). Corresponds to the dlog_processed_main statistic.

rproc: The number of replica record digests that have been processed by this node. A node will process records belonging to its replica partitions only when a neighboring source node goes down. Corresponds to the dlog_processed_replica statistic.

lkdproc: The number of record digests that have been processed as part of a link down session. A link down session is spawned when an entire destination cluster is down or unreachable. Corresponds to the dlog_processed_link_down statistic.

errcl: The number of errors encountered when attempting to ship due to the embedded client, for example if the local XDR embedded client is having issues or delays establishing connections. Corresponds to the xdr_ship_source_error statistic.

errsrv: The number of errors encountered when attempting to ship due to the destination cluster, for example if the destination cluster is temporarily overloaded. Corresponds to the xdr_ship_destination_error statistic.

hkskip: Hotkeys skipped. The number of record digests that were skipped because an entry already exists in the reader thread's cache (meaning a version of this record was just shipped). Corresponds to the xdr_hotkey_skip statistic.

hkf: Hotkeys fetched. The number of record digests that were actually fetched and shipped because their cache entries had expired and were dirty. Corresponds to the xdr_hotkey_fetch statistic.

flat: The average time in milliseconds to fetch records locally (an exponential moving average, 95/5). Corresponds to the xdr_read_latency_avg statistic.
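To see how these counters evolve over time on a node, the periodic detail lines can be pulled straight from the log (the log path below is only an example):

$ grep "detail: sh " /var/log/aerospike/aerospike.log | tail -3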

WARNING (xdr): (dc.c:2356) {namespaceName} DC DC1 abandon result -4

Severity:

WARNING

Context:

xdr

Introduced:

5.0.0

Additional information

Occurs: In strong-consistency enabled namespaces, if XDR finds a record that has not been replicated, re-replication from master to replica is triggered and XDR prints this warning. XDR will attempt to ship that record again. Refer to the XDR Delays article for details on how XDR handles records in strong-consistency enabled namespaces, and to the XDR 5.0 Error Codes article for details on other error codes.

INFO (info): (dc.c:1024) xdr-dc DC1: lag 0 throughput 0 latency-ms 0 in-queue 0 outstanding 0 complete (3000,0,0,0) retries (0,0) recoveries (4096,0) hot-keys 0

Severity:

INFO

Context:

xdr

Introduced:

5.0.0

Removed:

5.1.0

Additional information

Parameters:

lag: See description of lag metric.

throughput: See description of throughput metric.

latency-ms: See description of latency_ms metric.

in-queue: See description of in_queue metric.

in-progress: See description of in_progress metric.

complete: Composed of the following metrics:

retries: Composed of the following metrics:

recoveries: Composed of the following metrics:

hot-keys: See description of hot_keys metric.

INFO (info): (dc.c:1353) xdr-dc dc2: lag 12 throughput 710 latency-ms 19 in-queue 250563 in-progress 81150 complete (1002215,0,0,0) retries (0,0,23) recoveries (2048,0) hot-keys 4655

Severity:

INFO

Context:

xdr

Introduced:

5.1.0

Removed:

5.3.0

Additional information

Parameters:

lag: See description of lag metric.

throughput: See description of throughput metric.

latency-ms: See description of latency_ms metric.

in-queue: See description of in_queue metric.

in-progress: See description of in_progress metric.

complete: Composed of the following metrics:

retries: Composed of the following metrics:

recoveries: Composed of the following metrics:

hot-keys: See description of hot_keys metric.

INFO (info): (dc.c:1353) xdr-dc dc2: nodes 8 lag 12 throughput 710 latency-ms 19 in-queue 250563 in-progress 81150 complete (1002215,0,0,0) retries (0,0,23) recoveries (2048,0) hot-keys 4655

Severity:

INFO

Context:

xdr

Introduced:

5.3.0

Removed:

6.0

Additional information

When multiple namespaces are shipped to a DC, the values in this log line are aggregated as follows: nodes refers to the DC level alone, lag is the maximum across namespaces, latency-ms is the average across namespaces, and everything else is the sum across namespaces.

Parameters:

nodes: See description of nodes metric.

lag: See description of lag metric.

throughput: See description of throughput metric.

latency-ms: See description of latency_ms metric.

in-queue: See description of in_queue metric.

in-progress: See description of in_progress metric.

complete: Composed of the following metrics:

retries: Composed of the following metrics:

recoveries: Composed of the following metrics:

hot-keys: See description of hot_keys metric.
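On these server versions, the per-DC values summarized in this line can also be queried directly with the XDR get-stats info command; a sketch, assuming a DC named dc2 and a namespace named test (check the asinfo manual for the exact form supported by your version):

$ asinfo -v "get-stats:context=xdr;dc=dc2"
$ asinfo -v "get-stats:context=xdr;dc=dc2;namespace=test"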

[DC_NAME]: dc-state CLUSTER_UP timelag-sec 2 lst 1468006386894 mlst 1468006389647 (2016-07-08 19:33:09.647 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 0 rsas-ms 0.004 rsas-pct 0.0 con 384 errcl 0 errsrv 0 sz 6

Severity:

INFO

Context:

xdr

Introduced:

3.9

Additional information

Occurs: Every 1 minute, for each configured destination cluster (or DC).

Parameters:

[DC_NAME]: Name and status of the DC. Here are the different statuses: CLUSTER_INACTIVE, CLUSTER_UP, CLUSTER_DOWN, CLUSTER_WINDOW_SHIP. Corresponds to the dc_state statistic.

timelag-sec: The lag in seconds, computed as the difference between the current time and the timestamp of the record that was last successfully shipped. This gives a sense of how far the destination cluster lags behind the source cluster. It does not correspond to the time it will take the source cluster to 'catch up', nor does it necessarily relate to the number of outstanding digests to be processed. Corresponds to the dc_timelag statistic.

lst: The overall last ship time for the node (the minimum of all last ship times on this node).

mlst: The main last ship time (the last ship time of the dlogreader).

fnlst: The failed node last ship time (the minimum of the last ship times of all failed node shippers running on this node).

wslst: The window shipper last ship time (the minimum of the last ship times of all window shippers running on this node).

shlat-ms: Corresponds to the xdr_ship_latency_avg statistic.

rsas-ms: Average sleep time for each write to the DC for the purpose of throttling. Corresponds to the dc_ship_idle_avg statistic. (Stands for remote ship average sleep ms).

rsas-pct: Percentage of throttled writes to the DC. Corresponds to the dc_ship_idle_avg_pct statistic. (Stands for remote ship average sleep pct).

con: Number of open connections to the DC. If the DC accepts pipelined writes, there will be 64 connections per destination node. Only available as of version 3.11.1.1. Corresponds to the dc_open_conn statistic.

errcl: Number of client layer errors while shipping records for this DC. Errors include timeout, bad network fd, etc. Only available as of version 3.11.1.1. Corresponds to the dc_ship_source_error statistic.

errsrv: Number of errors from the remote cluster(s) while shipping records for this DC. Errors include out-of-space, key-busy, etc. Only available as of version 3.11.1.1. Corresponds to the dc_ship_destination_error statistic.

sz: The cluster size of the destination DC. Only available as of version 3.11.1.1. Corresponds to the dc_size statistic.
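On servers from this era (XDR prior to 5.0), the per-DC statistics named above can be fetched with the dc/<DC_NAME> info command, assuming that command is available on your version; a sketch for a DC named DC1:

$ asinfo -v "dc/DC1" -l | grep -E "dc_state|dc_timelag|dc_open_conn|dc_ship_source_error|dc_ship_destination_error|dc_size"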

dlog-q: capacity 64 used-elements 1 read-offset 0 write-offset 1

Severity:

DEBUG

Context:

xdr

Introduced:

3.9

Additional information

Provides status information on the dlog-q. The dlog-q is the in-memory digest log queue: digests of records that have been written get put on this in-memory queue, and the dlogwriter picks them up from there and puts them in the on-disk digest log. See below for details on the four values.

Occurs: Every 1 minute.

Parameters:

capacity: The size of the queue.

used-elements: The number of elements in the queue.

read-offset: The read pointer of the queue.

write-offset: The write pointer of the queue. In general, the number of elements in the queue is the difference between the write pointer and the read pointer, i.e., the number of elements that have been written to the queue but haven't yet been read.
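Because this line is logged at DEBUG severity, it only appears after the xdr context's verbosity has been raised on the relevant sink; a sketch, dropping back to info afterwards since debug output on this context can be voluminous:

$ asinfo -v "log-set:id=0;xdr=debug"
ok
$ asinfo -v "log-set:id=0;xdr=info"
ok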

dlog: used global lastshiptime 1490623097271 (2017-03-27 13:58:17 GMT) and reclaimed 0 records. dlog-free-pct=93

Severity:

INFO

Context:

xdr

Introduced:

3.9

Removed:

3.12.1

Additional information

Provides digest log (dlog) related information.

Occurs: Every 1 minute.

Parameters:

used global lastshiptime: The minimum last ship time across all nodes in the cluster. Corresponds to the xdr_min_lastshipinfo statistic. This is used to determine up to what point slots in the digest log can be reclaimed, by keeping track of the oldest last ship time across all nodes in the cluster. Introduced in version 3.10.0.3.

reclaimed: Indicates how many digests were safely 'removed' from the digestlog. As shipping proceeds successfully and records are shipped, digests that are definitely no longer needed can have their space in the digest log reclaimed. A link down (destination cluster down or unreachable) is an example of a situation where digest log space cannot be reclaimed.

dlog-free-pct: Percentage of the digest log free and available for use. Corresponds to the dlog_free_pct statistic. Introduced in version 3.12.1.

dlog: free-pct 93 reclaimed 2456 glst 1490623097271 (2017-03-27 13:58:17 GMT)

Severity:

INFO

Context:

xdr

Introduced:

3.12.1

Additional information

Provides digest log (dlog) related information.

Occurs: Every 1 minute.

Parameters:

free-pct: Percentage of the digest log free and available for use. Corresponds to the dlog_free_pct statistic.

reclaimed: Indicates how many digests were safely 'removed' from the digestlog. As shipping proceeds successfully and records are shipped, digests that are definitely no longer needed can have their space in the digest log reclaimed. A link down (destination cluster down or unreachable) is an example of a situation where digest log space cannot be reclaimed.

glst: The minimum last ship time across all nodes in the cluster. Corresponds to the xdr_global_lastshiptime statistic. This is used to determine up to what point slots in the digest log can be reclaimed, by keeping track of the oldest last ship time across all nodes in the cluster.
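The same digest log health figures are exposed as statistics, which is often easier to monitor than parsing the log; a sketch:

$ asinfo -v "statistics" -l | grep "^dlog_"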

Failed to seek during reclaim. Leaving sptr as is

Severity:

INFO

Context:

xdr

Additional information

Benign message if the node on which this is logged did not receive new writes into the digest log. A background process running every minute tries to reclaim digest log space based on a timestamp. It starts sampling the log by looking at the timestamp of the last record written to the digestlog; if it doesn't find a single record, it bails out early, which prints this message.

'XXXX' cluster does not support pipelining

Severity:

WARNING

Context:

xdr

Additional information

This message usually indicates that one of the destinations is running an older version of XDR (pre-3.8), or that there is a misconfiguration of the kernel settings /proc/sys/net/core/wmem_max and /proc/sys/net/core/rmem_max between the source and destination clusters. For more information see xdr-showing-buffer-limit-err.
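To rule out the buffer mismatch described above, compare the two kernel settings on the source and destination nodes; they can be read directly from /proc and, if needed, changed with sysctl (the value below is only a placeholder, not a recommendation):

$ cat /proc/sys/net/core/wmem_max /proc/sys/net/core/rmem_max
$ sudo sysctl -w net.core.wmem_max=15728640
$ sudo sysctl -w net.core.rmem_max=15728640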

throughput 3722 : inflight 164 : dlog outstanding 100 (-10.0/s)

Severity:

INFO

Context:

xdr

Introduced:

3.8.1

Removed:

3.9

Additional information

Parameters:

throughput: The current throughput, shipping to destination cluster(s). When shipping to multiple clusters the throughput will represent the combined throughput to all destination clusters. Corresponds to the cur_throughput statistic.

inflight: The number of records that are in flight, meaning they have been sent to the destination cluster(s) but a response has not yet been received. Corresponds to the stat_recs_inflight statistic.

dlog outstanding: The number of record digests yet to be processed in the digest log. In parentheses, the average change normalized to digests per second (over the 10-second interval separating these log lines). Corresponds to the stat_recs_outstanding statistic.

sh 5588691 : ul 12 : lg 11162298 : rlg 54 : lproc 11162198 : rproc 45 : lkdproc 0 : errcl 54 : errsrv 0 : hkskip 6303 6299 : flat 0

Severity:

INFO

Context:

xdr

Introduced:

3.8.1

Removed:

3.9

Additional information

Parameters:

sh: The cumulative number of records that have been shipped since this node started. Corresponds to the stat_recs_shipped statistic.

ul: The number of record digests that have been written to the node but not yet logged to the digestlog (unlogged).

lg: The number of record digests that have been logged. This includes both master and replica records, but a node only ships records for which it owns the master partition; it processes records belonging to its replica partitions only when a neighboring source node goes down.

rlg: The number of records that have been relogged on this node due to temporary failures when attempting to ship. Corresponds to the stat_recs_relogged statistic.

lproc: The number of record digests that have been processed locally. A processed digest does not necessarily imply a shipped record (for example, replica digests don't get shipped unless a source node is down, and hotkeys don't necessarily have all their updates shipped). Corresponds to the stat_recs_logged statistic.

rproc: The number of replica record digests that have been processed by this node. A node will process records belonging to its replica partitions only when a neighboring source node goes down. Corresponds to the stat_recs_replprocessed statistic.

lkdproc: The number of record digests that have been processed as part of a link down session. A link down session is spawned when an entire destination cluster is down or unreachable. Corresponds to the stat_recs_linkdown_processed statistic.

errcl: The number of errors encountered when attempting to ship due to the embedded client, for example if the local XDR embedded client is having issues or delays establishing connections. Corresponds to the err_ship_client statistic.

errsrv: The number of errors encountered when attempting to ship due to the destination cluster, for example if the destination cluster is temporarily overloaded. Corresponds to the err_ship_server statistic.

hkskip: There are two numbers for tracking hotkey-related processing. The first is the number of record digests that were skipped because an entry already exists in the reader thread's cache (meaning a version of this record was just shipped). The second is the number of record digests that were actually shipped because their cache entries had expired and were dirty. Corresponds to the noship_recs_hotkey and noship_recs_hotkey_timeout statistics.

flat: The average time in milliseconds to fetch records locally (an exponential moving average, 95/5). Corresponds to the local_recs_fetch_avg_latency statistic.

[DC_NAME] CLUSTER_UP : timelag 1 secs : lst 1460929221893 (2016-04-17 21:40:21.893 GMT) : mlst 1460929223000 (2016-04-17 21:40:23.000 GMT) : fnlst 0 (-) : wslst 0 (-) : shlat 0 ms

Severity:

INFO

Context:

xdr

Introduced:

3.8.1

Removed:

3.9

Additional information

Occurs: Every 1 minute, for each configured destination cluster (or DC).

Parameters:

[DC_NAME]: Name and status of the DC. Here are the different statuses: CLUSTER_INACTIVE, CLUSTER_UP, CLUSTER_DOWN, CLUSTER_WINDOW_SHIP. Corresponds to the dc_state statistic.

timelag: The lag in seconds, computed as the difference between the current time and the timestamp of the record that was last successfully shipped. This gives a sense of how far the destination cluster lags behind the source cluster. It does not correspond to the time it will take the source cluster to 'catch up', nor does it necessarily relate to the number of outstanding digests to be processed. Corresponds to the xdr_timelag statistic.

lst: The overall last ship time for the node (the minimum of all last ship times on this node).

mlst: The main last ship time (the last ship time of the dlogreader).

fnlst: The failed node last ship time (the minimum of the last ship times of all failed node shippers running on this node).

wslst: The window shipper last ship time (the minimum of the last ship times of all window shippers running on this node).

shlat: Corresponds to the latency_avg_ship statistic.

logq : (128 1 0 1)

Severity:

INFO

Context:

xdr

Introduced:

3.8.1

Removed:

3.9

Additional information

Provides status information on the logq. The logq is the in-memory digest log queue: digests of records that have been written get put on this in-memory queue, and the dlogwriter picks them up from there and puts them in the on-disk digest log. See below for details on the four numbers.

Occurs: Every 1 minute.

Parameters:

1st number: The size of the queue.

2nd number: The number of elements in the queue.

3rd number: The read pointer of the queue.

4th number: The write pointer of the queue. In general, the number of elements in the queue is the difference between the write pointer and the read pointer, i.e., the number of elements that have been written to the queue but haven't yet been read.

Reclaimed 469400 records space in digest log...

Severity:

INFO

Context:

xdr

Introduced:

3.8.1

Removed:

3.9

Additional information

Indicates how many digests were safely 'removed' from the digestlog. As shipping proceeds successfully and records are shipped, digests that are definitely no longer needed can have their space in the digest log reclaimed. A link down (destination cluster down or unreachable) is an example of a situation where digest log space cannot be reclaimed.

Occurs: Every 1 minute.