Backing up and Restoring Data with asbackup and asrestore

Aerospike replicates data among the nodes of a cluster and across datacenters. Even so, standard operational practice is to back up your data so that it can be recovered after a catastrophic failure. Most organizations have a schedule and procedures for backup.

The Aerospike tools for backing up and restoring are asbackup and asrestore, which are part of the aerospike-tools package and are included with the Aerospike Enterprise Edition. Source code for these tools is available on GitHub.

Backing up

The discussion here deals with only the most essential backup command and some common variations. Some frequently asked questions are also covered in knowledge-base article FAQ Asbackup.

Here are some considerations before you back up.

Namespace, nodes, and backup location

  • Determine the namespace you want to back up.
  • Decide whether you want to back up to a directory of individual backup files or to a single file.

What is backed up

By default all of the following data are backed up:

  • Keys
    • Key metadata: digest, TTL, generation count, and key.
    • Regular bins: string, integer, boolean, and binary.
    • Collection data type (CDT) bins: list and map.
    • GeoJSON data type bins
    • HyperLogLog data type bins
  • Secondary index definitions
  • User-Defined Function (UDF) modules

For the exact backup file format, see the file format specification.

Cluster configurations not important for backup/restore

Backup and restore are cluster-configuration-agnostic. A backup can be restored to a cluster of any size and configuration. Restored data is evenly distributed among cluster nodes, regardless of that cluster's configuration.

Estimating disk space for the backup

Be sure you have enough room on disk to store the backup data.

For an estimate of the size of a single key, use the --estimate option. As shown in the following example, this option reads 10,000 keys from the specified namespace and prints the average size of the sampled keys.

asbackup --namespace namespaceName --estimate

Multiply the displayed estimated key size by the number of keys in the namespace and add 10% of the result for overhead and indexes:

Formula to calculate approximate disk space for backup
Estimated average key size from asbackup --estimate
× Number of keys in namespace
+ 10% of that product
= Approximate disk space needed for the backup
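
As a worked instance of this calculation, the following shell arithmetic uses made-up figures: an average key size of 512 bytes (the kind of value --estimate might report) and 20 million keys. Both numbers, and the variable names, are assumptions for illustration only.

```shell
# Hypothetical inputs: replace with the values for your own namespace.
avg_key_bytes=512        # average key size reported by asbackup --estimate
num_keys=20000000        # number of keys in the namespace

raw_bytes=$((avg_key_bytes * num_keys))   # size of the key data alone
overhead_bytes=$((raw_bytes / 10))        # 10% of the result for overhead and indexes
total_bytes=$((raw_bytes + overhead_bytes))

# Convert to MiB for readability (integer division rounds down).
echo "Approximate backup size: $((total_bytes / 1024 / 1024)) MiB"
```

With these sample figures the estimate comes to 11,264,000,000 bytes, which the last line prints as 10742 MiB.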

asbackup command basics and useful variations

The most basic syntax of asbackup is as follows.

asbackup --host nodeIpAddressOrName --namespace namespaceName --directory pathToDirectoryForBackupFiles

where options and arguments are as follows:

  • --host nodeIpAddressOrName specifies the IP address or hostname of any node in the cluster to back up.
  • --namespace namespaceName is the name of the namespace to back up. asbackup backs up a single namespace at a time.
  • --directory pathToDirectoryForBackupFiles is the directory to which the backup data is written. Data is stored in multiple files with the .asb file extension. By default, each backup file is limited to 250 MiB. When this limit is reached, asbackup creates a new file.

Backing up to a single file

You can back up a namespace to a single file, rather than a directory:

$ asbackup --host nodeIpAddressOrName --namespace namespaceName --output-file nameOfBackupFile

Incremental backup

For Aerospike Server and tools version 3.12 and later, you can perform incremental backups with the following options, where the argument YYYY-MM-DD_HH:MM:SS is a timestamp:

  • --modified-after YYYY-MM-DD_HH:MM:SS backs up keys last modified after the given timestamp.
  • --modified-before YYYY-MM-DD_HH:MM:SS backs up keys last modified before the given timestamp.

You may also back up by partition to do incremental backups. See the documentation on partition lists.

Writing to stdout and piping

Instead of --output-file or --directory, use - to write the backup data to stdout. This is useful for pipes. The following example writes backup data to stdout with - and pipes to the gzip command to compress the output to the file:

$ asbackup --host nodeIpAddress --namespace namespaceName - | gzip > nameOfBackupFile.gz

Note that the gzip utility is single-threaded. This may cause single CPU core saturation and produce a bottleneck. To take advantage of multi-core archive utilities, consider using xz instead.

Also note that a newer way to compress backup file data is the --compress runtime option. See compression and encryption.

Compression and Encryption

As of version 3.6.1, you may compress and/or encrypt backup file data as it is being written to the backup file with --compress and --encrypt. Each is enabled by passing the corresponding option followed by the algorithm to be used.

Here are the available compression algorithms:

Algorithm   Description
zstd        Zstd compression, from Facebook's libzstd

Example:

$ asbackup --host nodeIpAddress --namespace namespaceName --compress zstd

As of version 3.11.0, you may also specify the compression level to be used by zstd via the --compression-level option. The levels supported are integers described by zstd. For more information see the zstd manual. The default value is decided by zstdlib's ZSTD_CLEVEL_DEFAULT which is currently 3.

Example:

$ asbackup --host nodeIpAddress --namespace namespaceName --compress zstd --compression-level 3

Here are the available encryption algorithms:

Algorithm   Description
aes128      AES 128-bit key-digest encryption. Data is encrypted with the CTR128 algorithm, using a key generated from the SHA256 hash of the encryption key.
aes256      AES 256-bit key-digest encryption. Works the same way as aes128, but uses a 256-bit digest of the key and AES256 as the base encryption algorithm.

For encryption, a private key must also be provided. The two ways of providing encryption keys are through an encryption key file in PEM format (with --encryption-key-file), or a base-64 encoded key passed in through an environment variable (with --encryption-key-env).

Examples:

With an encryption key file:

$ asbackup --host nodeIpAddress --namespace namespaceName --encrypt aes128 --encryption-key-file /path/to/key.pem

With an environment variable:

$ export PRIVATE_KEY='dGVzdCBrZXk='
$ asbackup --host nodeIpAddress --namespace namespaceName --encrypt aes256 --encryption-key-env PRIVATE_KEY
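
The key value in the example above is only a placeholder. One possible way to produce a random base-64 encoded value for the environment variable is sketched below. The 32-byte length is an arbitrary choice made for this sketch, not a documented requirement, and PRIVATE_KEY is simply the variable name reused from the example.

```shell
# Generate 32 random bytes and base-64 encode them (44 characters, one line).
# Assumption: 32 bytes is an illustrative key length, not a documented minimum.
PRIVATE_KEY=$(head -c 32 /dev/urandom | base64)
export PRIVATE_KEY

# Sanity check: the value should decode back to 32 bytes.
printf '%s' "$PRIVATE_KEY" | base64 -d | wc -c
```

Keep a copy of the key somewhere safe, because the same key is needed again when the backup is restored.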

Note that when restoring compressed/encrypted backup files, the exact same compression/encryption flags must be provided to asrestore.

Safety of backup files

Be sure to store the backed-up data in a safe location, in case it is needed.

Other asbackup options and command help

asbackup has further options, covering the tasks listed below, that you might want to investigate. For more detail, type asbackup --help or see these asbackup command-line options.

  • Backing up specific nodes or connecting to a port other than the default 3000.
  • Securing connections via username/password or TLS certificates or both.
  • Backing up specific bins.
  • Backing up specific sets.
  • Using configuration files to help automate backups.

Backup resumption

Starting with asbackup 3.9.0, if a backup job is interrupted (even by killing it with Ctrl-C) or fails for any reason other than a failure to write to disk, the backup state is saved to a .state file. To resume the backup, pass the path to this .state file to the --continue flag. When continuing a backup, you must use all of the same command-line arguments, except --remove-files.

Restoring

The discussion deals with only the most essential restore commands and some common variations. Some frequently asked questions are also covered in knowledge-base article FAQ Asrestore.

Prerequisites and notes for restore

asrestore can restore only backups from Aerospike Server and tools version 3.0 or later. To restore a backup from earlier releases, contact Aerospike Support.

The TTL of restored keys is preserved, but the last-update-time is reset to the current time and the generation count is also reset.

asrestore command basics and useful variations

The most basic syntax of asrestore is as follows.

asrestore --host nodeIpAddressOrName --directory pathToDirectoryOfBackupFiles

where options and arguments are as follows:

  • --host nodeIpAddressOrName specifies the IP address or hostname of a node in the cluster to restore to.
  • --directory pathToDirectoryOfBackupFiles is the directory that contains the backup files.

Restoring from a single backup file

If you backed up to a single file, use the following syntax to restore from it:

asrestore --host nodeIpAddressOrName --input-file pathToBackupFile

Restoring to a different namespace

By default, data is restored to the namespace it was backed up from. Use the --namespace option to restore to a different namespace. You must specify the old and new namespace names, separated by a comma.

$ asrestore --host nodeIpAddressOrName --directory pathToDirectoryOfBackupFiles --namespace oldNamespaceName,newNamespaceName

Dealing with existing data: the write policy

The target namespace might already contain records with the same keys as those in the backup. The write policy of asrestore, that is, its logic for handling existing records, is as follows:

  1. If the record from the backup is expired (based on its TTL value), the backup record is ignored.
  2. If the record does not exist in the namespace, the backup record is added to the namespace.
  3. If an older version of the record (that is, with a lower generation count) already exists in the namespace, the backup record is restored. If you want asrestore to ignore this condition, you can specify this option:
    • --unique: asrestore does not touch any existing records, regardless of generation counts.
  4. If a newer version of the record (that is, with the same or a higher generation count) already exists in the namespace, the backup record is ignored. If you want asrestore to ignore this condition, you can specify this option:
    • --no-generation: asrestore overwrites any existing records, regardless of generation count.
  5. If the record in the namespace contains bins that are not present in the backup, those bins in the namespace are preserved. If you want asrestore to ignore this condition, you can specify this option:
    • --replace: When restoring a record from the backup, asrestore does not preserve namespace bins that are not present in the backup.
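
The record-level part of this policy can be modeled as a small decision function. The sketch below is only a reading of the rules listed above, not asrestore's actual implementation. The function name restore_decision and its arguments are hypothetical, and the bin-level behavior controlled by --replace is not modeled.

```shell
# restore_decision MODE EXPIRED BACKUP_GEN [EXISTING_GEN]
#   MODE         default | unique | no-generation
#   EXPIRED      yes | no  (has the backup record passed its TTL?)
#   BACKUP_GEN   generation count of the record in the backup
#   EXISTING_GEN generation count already in the namespace; omit if absent
# Prints "write" if the backup record would be restored, "skip" otherwise.
restore_decision() {
  mode=$1
  expired=$2
  backup_gen=$3
  existing_gen=${4:-}

  # Expired backup records are always ignored.
  if [ "$expired" = yes ]; then echo skip; return; fi
  # Records missing from the namespace are always added.
  if [ -z "$existing_gen" ]; then echo write; return; fi

  case $mode in
    unique)        echo skip ;;    # --unique: never touch existing records
    no-generation) echo write ;;   # --no-generation: always overwrite
    *)
      if [ "$existing_gen" -lt "$backup_gen" ]; then
        echo write                 # existing copy is older
      else
        echo skip                  # existing copy is the same or newer
      fi ;;
  esac
}

restore_decision default no 5 3         # existing copy is older
restore_decision default no 5 5         # same generation count
restore_decision unique no 5 3          # --unique given
restore_decision no-generation no 5 9   # --no-generation given
```

The four sample calls print write, skip, skip, and write, in that order.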

Reading from stdin, piping, and uncompressing

Instead of --input-file or --directory, use - with standard Unix pipes to read the backup data from stdin.

The following 3 usage examples uncompress a gzip file and then pipe the data to asrestore with the - option to read from stdin:

gunzip -c nameOfBackupFile.gz | asrestore --host nodeIpAddressOrName -
zcat nameOfBackupFile.gz | asrestore --host nodeIpAddressOrName -
cat nameOfBackupFile.gz | gzip -d | asrestore --host nodeIpAddressOrName -

This example cats a single, uncompressed backup file and pipes the data to asrestore's stdin with the - option:

cat pathOfSingleBackupFile | asrestore --host nodeIpAddress -  

Other asrestore options and command-line help

asrestore has further options, covering the tasks listed below, that you might want to investigate. For more detail, type asrestore --usage or see these asrestore command-line options.

  • Restoring to specific nodes or connecting to a port other than the default 3000.
  • Securing connections via username/password or TLS certificates or both.
  • Restoring specific bins.
  • Restoring specific sets.
  • Using configuration files to help automate restores.

Transaction retries

  • Failed Record Uploads: If a transaction fails it is retried according to --max-retries and --retry-scale-factor. By default these are 5 and 15000us respectively. An exponential backoff strategy is followed where the delay is retry-scale-factor * 2 ** (retry_attempts - 1), or 0 on the first try. If --max-retries is exceeded the transaction is counted as a failure in the info level log output. Note: --retry-delay and --sleep-between-retries are deprecated in favor of --retry-scale-factor.
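
To make the backoff formula concrete, the sketch below prints the delay before each attempt using the default values. It assumes that retry_attempts is 0 for the first try and 1 for the first retry, which is one reading of the formula above. The function and variable names are illustrative only.

```shell
retry_scale_factor_us=15000   # default --retry-scale-factor, in microseconds
max_retries=5                 # default --max-retries

# Delay before a given attempt: 0 on the first try, otherwise
# retry-scale-factor * 2 ** (retry_attempts - 1).
backoff_delay_us() {
  retry_attempts=$1           # assumed: 0 = first try, 1 = first retry, ...
  if [ "$retry_attempts" -le 0 ]; then
    echo 0
  else
    # 1 << (n - 1) equals 2 ** (n - 1) in portable shell arithmetic.
    echo $((retry_scale_factor_us * (1 << (retry_attempts - 1))))
  fi
}

attempt=0
while [ "$attempt" -le "$max_retries" ]; do
  echo "attempt $attempt: wait $(backoff_delay_us "$attempt") us"
  attempt=$((attempt + 1))
done
```

Under these assumptions the waits are 0, 15000, 30000, 60000, 120000, and 240000 microseconds.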

Possible error or informational messages from asrestore

  • Record exists: When the --unique option is used, this informational message is displayed.
  • Generation mismatch: The backup copy and existing copy of a key do not match, and so the key is not restored. You can override this behavior with the --no-generation option.
  • Invalid username or password: The wrong username or password was specified on the command line.