Aerospike Unique Data Agent

The Aerospike Unique Data Agent (uda) tracks the unique data usage of an Aerospike cluster. The agent runs as a service that monitors the entire cluster: it polls the cluster for the memory and disk usage statistics relevant to your license agreement and stores them for later processing. uda integrates with asadm, and a custom client can also query it through the REST API.

To display the help documentation, use the -u or --help flag.

> uda --help
An agent for monitoring and querying the unique data usage of your
Aerospike database. Start the agent with the 'start' command. The agent
when started puts entries in a store file. The agent listens on the --agent-port
for requests to query and filter unique data entries. For convenience, there is
an additional 'log' command to query and filter the entries file
created by the agent offline. Additionally, the 'data' command allows you to display your
current unique data usage.

Usage:
  uda [command]

Available Commands:
  data        Gets the current data point. Does not log an entry.
  help        Help about any command
  log         For offline processing of the uda.store file.
  start       Start the agent.

Flags:
      --config string     Config file (default is /etc/aerospike/astools.conf)
  -u, --help              Display help information
      --instance string   For support of the aerospike tools toml schema. Sections with the
                          instance are read. e.g in the case where instance 'a' is specified
                          sections 'cluster_a', 'uda_a' are read.
  -v, --version           version for uda

Use "uda [command] --help" for more information about a command.

Install

The Aerospike Unique Data Agent is provided by Aerospike Tools version 7.1.1 and later, which is bundled with Aerospike Server packages version 6.0.0.5 and later.

  1. Edit the config file located at /etc/aerospike/astools.conf to connect to the Aerospike Database. Refer to Configuration and Aerospike Tools Configuration for more details.

  2. Enable the agent

    • For systemd supporting systems

      systemctl enable uda.service
    • Other systems (docker)

      ./usr/bin/uda --config ./etc/aerospike/astools.conf
  3. Check that the agent started successfully

    journalctl -u uda.service -f

    or

    systemctl status uda.service
  4. To try out the API and see the API documentation, run the following command and navigate to the URL it prints:

journalctl -u uda.service | grep API
Jun 29 14:09:05 ubuntu unique-data[1116531]: time="2021-06-29T14:09:05-07:00" level=info msg="API documentation can be found at http://192.168.1.1:8080/v1/swagger/index.html"

Configuration

Refer to the uda start --help output below for a list of parameters that may be required to run uda within your cluster. You can configure uda using tools configuration files. See Aerospike Tools Configuration for more details.

Extra features available to the uda:

  • To keep sensitive information out of command history, create a file containing a single line with the value of the configuration parameter. Then edit the configuration file and provide file:<filename> as the parameter value, where <filename> is the name of the file you created. This works for any configuration option.

  • To read the value from an environment variable, use env:<variable-name> instead.

  • To read a base64-encoded value from an environment variable, use env-b64:<variable-name>.

  • To provide a base64-encoded value directly, use b64:<base64-encoded-value>.
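The prefixes above amount to a small value-resolution scheme. The following Python sketch shows one way a client-side tool could interpret such values. The function name `resolve_value` is hypothetical, and details such as reading only the first line of the file are assumptions; uda's own parsing may differ.

```python
import base64
import os


def resolve_value(raw: str) -> str:
    """Resolve a config value that may use a file:, env:, env-b64:, or b64: prefix.

    Hypothetical helper for illustration only; not uda's implementation.
    """
    if raw.startswith("file:"):
        # Assumption: the file holds the value on its first (and only) line.
        with open(raw[len("file:"):], encoding="utf-8") as fh:
            return fh.readline().rstrip("\r\n")
    if raw.startswith("env-b64:"):
        # Environment variable that holds a base64-encoded value.
        encoded = os.environ[raw[len("env-b64:"):]]
        return base64.b64decode(encoded).decode("utf-8")
    if raw.startswith("env:"):
        # Environment variable that holds the value in clear text.
        return os.environ[raw[len("env:"):]]
    if raw.startswith("b64:"):
        # Inline base64-encoded value.
        return base64.b64decode(raw[len("b64:"):]).decode("utf-8")
    # No recognized prefix: treat the string as the literal value.
    return raw
```

For example, with the environment variable `AS_PASSWORD` set, `resolve_value("env:AS_PASSWORD")` returns its contents, while a string without a prefix is returned unchanged.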

> uda start --help
Starts the agent, stores entries in the provided --store-file
(default: /var/log/aerospike/uda.store), and listens on --agent-port (default: 8080) for
requests to query entries.

Usage:
  uda start [flags]

Flags:
  -a, --agent-port int   Port number for agent to listen on. (default 8080)
      --auth INTERNAL,EXTERNAL,PKI   The authentication mode used by the server. INTERNAL uses
          standard user/pass. EXTERNAL uses external methods (like LDAP) which are
          configured on the server. EXTERNAL requires TLS. PKI allows TLS
          authentication and authorization based on a certificate. No user name needs to
          be configured. (default INTERNAL)
  -h, --host host[:tls-name][:port][,...]   The aerospike host. (default 127.0.0.1)
  -P, --password "env-b64:<env-var>,b64:<b64-pass>,file:<pass-file>,<clear-pass>"   The aerospike password to use to connect to the aerospike
          cluster.
  -p, --port int   The default aerospike port. (default 3000)
  -f, --store-file string   Specify custom log file. (default "/var/log/aerospike/uda.store")
      --tls-cafile env-b64:<cert>,b64:<cert>,<cert-file-name>   The CA for the agent.
      --tls-certfile env-b64:<cert>,b64:<cert>,<cert-file-name>   The certificate file of the agent for mutual TLS authentication.
      --tls-enable   Enable TLS authentication. If false, other tls options are
          ignored.
      --tls-keyfile env-b64:<cert>,b64:<cert>,<cert-file-name>   The key file of the agent for mutual TLS authentication.
      --tls-keyfile-password "env-b64:<env-var>,b64:<b64-pass>,file:<pass-file>,<clear-pass>"   The password used to decrypt the key-file if encrypted.
      --tls-name string   The server TLS context to use to authenticate the connection.
      --tls-protocols "[[+][-]all] [[+][-]TLSv1] [[+][-]TLSv1.1] [[+][-]TLSv1.2]"   Set the TLS protocol selection criteria. This format is the same
          as Apache's SSLProtocol documented at
          https://httpd.apache.org/docs/current/mod/mod_ssl.html#sslprotocol. (default TLSV1.2)
  -U, --user string   The aerospike user to use to connect to the aerospike cluster.

Global Flags:
      --config-file string   Config file (default is /etc/aerospike/astools.conf)
  -u, --help   Display help information
      --instance string   For support of the aerospike tools toml schema. Sections with the
          instance are read. e.g in the case where instance 'a' is specified
          sections 'cluster_a', 'uda_a' are read.

Unique Data Calculation

Server 7.0 and later

Server 7.0 unified the different storage engine types, which simplifies the unique data calculation.

Total Unique Usage Bytes = 0
DISCOUNT_PER_RECORD = 39

For each namespace:
NS Replication Factor = The replication factor for the namespace
NS Master Objects = sum(Master objects for each node in this namespace)
NS Data Bytes = sum("data_used_bytes" / "data_compression_ratio") for each node in the namespace

Total Unique Usage Bytes = Total Unique Usage Bytes + (NS Data Bytes / NS Replication Factor) - (DISCOUNT_PER_RECORD * NS Master Objects)
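The calculation above can be sketched in Python as follows. The input shape (a list of namespace dicts, each with a `replication_factor` and a list of per-node statistics) and the function name are assumptions made for this illustration; only the statistic names and the formula come from the description above.

```python
DISCOUNT_PER_RECORD = 39  # bytes, as stated for server 7.0 and later


def unique_data_bytes_v7(namespaces):
    """Sum unique data bytes over all namespaces (server 7.0 and later).

    `namespaces` is a hypothetical input shape: a list of dicts with
    "replication_factor" and "nodes", where each node dict carries
    "master_objects", "data_used_bytes", and "data_compression_ratio".
    """
    total = 0.0
    for ns in namespaces:
        nodes = ns["nodes"]
        master_objects = sum(n["master_objects"] for n in nodes)
        # Undo compression per node before summing.
        data_bytes = sum(
            n["data_used_bytes"] / n["data_compression_ratio"] for n in nodes
        )
        total += (
            data_bytes / ns["replication_factor"]
            - DISCOUNT_PER_RECORD * master_objects
        )
    return total
```

As a worked example, a namespace with replication factor 2 and two nodes, each holding 1,000,000 uncompressed bytes and 5,000 master objects, contributes 2,000,000 / 2 - 39 * 10,000 = 610,000 bytes.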

Server 6.4 and earlier

Only 1 of the following should be greater than zero: NS Memory Bytes, NS Device Bytes, NS Pmem Bytes

Total Unique Usage Bytes = 0
DISCOUNT_PER_RECORD = 35 bytes for servers 5.7 and earlier, 39 for servers 6.0 and later.

For each namespace:
NS Replication Factor = The replication factor for the namespace
NS Master Objects = sum(Master objects for each node in this namespace)
NS Device Bytes = sum("device_used_bytes" / "device_compression_ratio") for each node in the namespace
NS Pmem Bytes = sum("pmem_used_bytes" / "pmem_compression_ratio") for each node in the namespace

If NS Device Bytes is 0 and NS Pmem Bytes is 0 then
NS Memory Bytes = sum("memory_used_data_bytes" + "memory_used_index_bytes") for each node in the namespace
Else
NS Memory Bytes = 0

Total Unique Usage Bytes = Total Unique Usage Bytes + ((NS Memory Bytes + NS Device Bytes + NS Pmem Bytes) / NS Replication Factor) - (DISCOUNT_PER_RECORD * NS Master Objects)
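A Python sketch of the same steps for server 6.4 and earlier is shown below. The input shape, the function names, and the handling of missing statistics (treated as 0 bytes with a compression ratio of 1.0) are assumptions for illustration; the formula itself follows the description above.

```python
def discount_per_record(server_version):
    """Return the per-record discount in bytes for a (major, minor) tuple."""
    # 35 bytes for servers 5.7 and earlier, 39 bytes for 6.0 and later.
    return 35 if server_version < (6, 0) else 39


def unique_data_bytes_legacy(namespaces, server_version):
    """Sum unique data bytes over all namespaces (server 6.4 and earlier).

    `namespaces` is a hypothetical input shape: a list of dicts with
    "replication_factor" and "nodes" (per-node statistic dicts).
    """
    discount = discount_per_record(server_version)
    total = 0.0
    for ns in namespaces:
        nodes = ns["nodes"]
        master_objects = sum(n["master_objects"] for n in nodes)
        # Assumption: a missing statistic means 0 bytes and a ratio of 1.0.
        device = sum(
            n.get("device_used_bytes", 0) / n.get("device_compression_ratio", 1.0)
            for n in nodes
        )
        pmem = sum(
            n.get("pmem_used_bytes", 0) / n.get("pmem_compression_ratio", 1.0)
            for n in nodes
        )
        if device == 0 and pmem == 0:
            # In-memory namespace: count data and index memory.
            memory = sum(
                n.get("memory_used_data_bytes", 0) + n.get("memory_used_index_bytes", 0)
                for n in nodes
            )
        else:
            memory = 0
        total += (
            (memory + device + pmem) / ns["replication_factor"]
            - discount * master_objects
        )
    return total
```

For a single-node in-memory namespace on server 5.7 with 1,000 data bytes, 640 index bytes, 10 master objects, and replication factor 1, the result is 1,640 - 35 * 10 = 1,290 bytes.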

Storing Entries

The agent writes one entry to the store file every hour, on the hour. A custom store file path can be provided using --store-file or -f; otherwise, the default /var/log/aerospike/uda.store is used. Entries are also printed to stderr using the logger.

Entries

There are two types of data entries, "info" and "error", determined by the "level" field.
Each entry type has the same fields for simplicity. An "info" entry has all fields filled in except "errors". An "error" entry has an "errors" list with length greater than zero and may not have accurate values for some or all of the other fields, depending on where in the logging process the error occurred. Each request to retrieve data entries receives zero or more JSON objects of the form:

{
  "level": "info",
  "cluster_name": "null",
  "cluster_generation": 1,
  "cluster_stable": true,
  "node_count": 5,
  "hours_since_start": 15,
  "time": "2021-06-19T15:51:00.020869089-07:00",
  "master_objects": 58754,
  "unique_data_bytes": 927463,
  "namespaces": {
    "<ns1>": {
      "master_objects": 13507,
      "unique_data_bytes": 453211
    },
    . . .
  },
  "errors": []
}
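Because both entry types share the same fields, a client can separate them by the "level" field alone. The following Python sketch assumes the entries have already been collected into a JSON array; the function name `split_entries` is hypothetical.

```python
import json


def split_entries(raw_json):
    """Split a JSON array of entries into (info, error) lists by "level"."""
    info, errors = [], []
    for entry in json.loads(raw_json):
        if entry.get("level") == "error":
            # Other fields of an error entry may be inaccurate; keep it apart.
            errors.append(entry)
        else:
            info.append(entry)
    return info, errors
```

Keeping "error" entries apart avoids mixing their possibly inaccurate `unique_data_bytes` values into later calculations.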

API

Get

v1/entries{?key=key&val=value}

Gets all entries. The optional key and val parameters filter entries by the value of a field, for example returning only entries with "level"="error" or "cluster_name"="null".

Response: 200

{
  "entries": [
    {
      "cluster_name": "null",
      "cluster_generation": 1,
      "node_count": 1,
      "hours_since_start": 0,
      "time": "2021-06-19T15:51:00.020869089-07:00",
      "level": "info",
      "master_objects": 0,
      "unique_data_bytes": 0,
      "errors": []
    },
    . . .
  ]
}
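A custom client could call this endpoint with nothing beyond the Python standard library, as sketched below. The helper names and the base URL in the usage note are assumptions; the path and the key and val parameters are taken from the endpoint description above.

```python
import json
import urllib.parse
import urllib.request


def entries_url(base_url, key=None, val=None):
    """Build the v1/entries URL, adding the key/val filter when both are given."""
    url = base_url.rstrip("/") + "/v1/entries"
    if key is not None and val is not None:
        url += "?" + urllib.parse.urlencode({"key": key, "val": val})
    return url


def fetch_entries(base_url, key=None, val=None, timeout=10):
    """Request entries from a running agent and return the "entries" list."""
    with urllib.request.urlopen(entries_url(base_url, key, val), timeout=timeout) as resp:
        return json.load(resp)["entries"]
```

With an agent listening on the default port of the local host, `fetch_entries("http://127.0.0.1:8080", key="level", val="error")` would request only the "error" entries.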

v1/entries/range/index{?start=start-index&end=end-index&key=key&val=value}

Filters the entries by their index. The first entry has index 0 and the last entry has index (number of entries - 1). Both start and end are optional and default to the first and last entry, respectively. Entries are always sorted by insertion order, so a lower index is earlier in time than a higher index. Key-value filtering is applied after the index filtering.

Response: 200

{
  "entries": [
    {
      "cluster_name": "null",
      "cluster_generation": 1,
      "node_count": 1,
      "hours_since_start": 0,
      "time": "2021-06-19T15:51:00.020869089-07:00",
      "level": "info",
      "master_objects": 0,
      "unique_data_bytes": 0,
      "errors": []
    },
    . . .
  ]
}
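The order of operations (index range first, then key-value filtering) can be modeled locally, for example when post-processing entries that were already downloaded. This Python sketch is not the agent's code. It assumes that end is inclusive, which the description suggests but does not state, and it compares values as strings because query parameters arrive as text.

```python
def filter_by_index(entries, start=None, end=None, key=None, val=None):
    """Apply an index range, then an optional key/value filter, to a list of entries."""
    first = 0 if start is None else start
    last = len(entries) - 1 if end is None else end
    # Assumption: `end` is inclusive, so the slice runs to last + 1.
    selected = entries[first:last + 1]
    if key is not None:
        # Compare as strings, since a query parameter value is always text.
        selected = [e for e in selected if str(e.get(key)) == str(val)]
    return selected
```

Note that the key-value filter does not change which indexes were selected: an entry outside the range is never returned, even if it matches the filter.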

v1/entries/range/time{?start=start-time&end=end-time&key=key&val=value}

Filters the entries by their datetime. The values for start-time and end-time should be in RFC 3339 format (a profile of the ISO 8601 extended format). Both start and end are optional and default to the first and last entry, respectively. Entries are always sorted by insertion order, so the response list is in temporal order. Key-value filtering is applied after the datetime filtering.

Response: 200

{
  "entries": [
    {
      "cluster_name": "null",
      "cluster_generation": 1,
      "node_count": 1,
      "hours_since_start": 0,
      "time": "2021-06-19T15:51:00.020869089-07:00",
      "level": "info",
      "master_objects": 0,
      "unique_data_bytes": 0,
      "errors": []
    },
    . . .
  ]
}
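RFC 3339 timestamps contain characters such as ':' and '+' that should be percent-encoded in a query string. The Python sketch below builds a request URL from timezone-aware datetime objects; the helper name `time_range_url` is hypothetical, and the start and end parameter names are taken from the endpoint description above.

```python
from datetime import datetime, timedelta, timezone
import urllib.parse


def time_range_url(base_url, start=None, end=None):
    """Build a v1/entries/range/time URL from timezone-aware datetimes."""
    params = {}
    for name, moment in (("start", start), ("end", end)):
        if moment is None:
            continue  # an omitted bound defaults to the first or last entry
        if moment.utcoffset() is None:
            raise ValueError(f"{name} must be timezone-aware for RFC 3339 output")
        # isoformat() on an aware datetime yields e.g. 2021-06-19T15:00:00-07:00.
        params[name] = moment.isoformat()
    url = base_url.rstrip("/") + "/v1/entries/range/time"
    if params:
        # urlencode percent-encodes ':' and '+' ('+' would otherwise read as a space).
        url += "?" + urllib.parse.urlencode(params)
    return url
```

For example, passing `start=datetime(2021, 6, 19, 15, 0, tzinfo=timezone(timedelta(hours=-7)))` produces a URL whose query string is `start=2021-06-19T15%3A00%3A00-07%3A00`.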

v1/health

A health check endpoint that will return metrics related to the health of the service. Response: 200

{
  "health": {
    "total_missed_since_start": 1234,
    "recent_missed_since_start": 0,
    "hours_since_start": 1484
  }
}

v1/ping

An endpoint to check whether a connection can be established with the service. Response:

200: string("ping")

Integration with Asadm

Asadm can connect to the agent when creating a collectinfo archive and when running the summary command. In asadm 2.5 and later, these commands accept --agent-host and --agent-port flags that specify how to connect to the agent. With these flags, asadm displays additional statistics on unique data usage, namely the minimum, maximum, and average data usage. The Latest metric shows the last data usage measured by the agent, so the Latest value may be up to an hour old. Asadm 2.8 and later can also display namespace-level license usage, include entries recorded while the cluster was reportedly unstable in the summary aggregation (--agent-unstable flag), and include the uda.store file in the collectinfo archive (--agent-raw-store flag).

Admin> summary --agent-host <agent-host> --agent-port <agent-port>
~~~~~~~~~~~~~~~~~~~~~~~~~~Cluster Summary~~~~~~~~~~~~~~~~~~~~~~~~~~
Migrations                |False
Server Version            |E-6.0.0.1
OS Version                |Ubuntu 20.04.3 LTS (5.4.0-121-generic)
Cluster Size              |1
Devices Total             |0
Devices Per-Node          |0
Devices Equal Across Nodes|True
Memory Total              |8.000 GB
Memory Used               |8.812 MB
Memory Used %             |0.11
Memory Avail              |7.991 GB
Memory Avail%             |99.89
License Usage Latest      |0.000 B
License Usage Latest Time |2022-07-13T15:00:00-07:00
License Usage Min         |0.000 B
License Usage Max         |11.711 MB
License Usage Avg         |4.888 MB
Active                    |0
Total                     |2
Active Features           |SIndex
Number of rows: 20

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Summary~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|~~~~Drives~~~~|~~~~~~~Memory~~~~~~~|Replication| Master|~~~~~~~~~~~~~~~~~~~~~~~~~License Usage~~~~~~~~~~~~~~~~~~~~~~~~~
         |Total|Per-Node|   Total|Used|Avail%|    Factors|Objects|  Latest|              Latest Time|     Min|     Max|       Avg
         |     |        |        |   %|      |           |       |        |                         |        |        |
bar      |    0|       0|4.000 GB| 0.0| 100.0|          2|0.000  |0.000 B |2022-07-13T15:00:00-07:00|0.000 B |6.104 MB|918.963 KB
test     |    0|       0|4.000 GB|0.22| 99.78|          2|0.000  |0.000 B |2022-07-13T15:00:00-07:00|0.000 B |5.607 MB|  3.990 MB
Number of rows: 2
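
The Latest, Min, Max, and Avg figures in the summary can be approximated from the agent's entries with a few lines of Python. This sketch is not asadm's implementation, and its aggregation rules may differ. It assumes that only "info" entries are counted and that entries with "cluster_stable" set to false are skipped unless explicitly included, mirroring the intent of the --agent-unstable flag.

```python
def summarize_usage(entries, include_unstable=False):
    """Return latest/min/max/avg of "unique_data_bytes", or None if nothing qualifies."""
    usable = [
        e for e in entries
        if e.get("level") == "info"
        # Assumption: an entry without "cluster_stable" is treated as stable.
        and (include_unstable or e.get("cluster_stable", True))
    ]
    if not usable:
        return None
    values = [e["unique_data_bytes"] for e in usable]
    return {
        "latest": values[-1],  # entries are in insertion (temporal) order
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }
```

Given entries with usage values 0 and 10 from a stable cluster followed by 20 from an unstable one, the default call reports an average of 5.0 and a latest value of 10, while `include_unstable=True` reports an average of 10.0 and a latest value of 20.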