Backing up and Restoring Data with asbackup and asrestore
Aerospike replicates data among the nodes of a cluster and across datacenters. While this provides strong data resilience, we recommend using asbackup to make regular backups of your data. The asrestore utility restores backups made with asbackup. Both utilities are part of the aerospike-tools package, and are included with the Aerospike Enterprise Edition.
Making backups
This section describes the most essential backup commands and some common variations. Some frequently asked questions are also covered in the Knowledge Base article asbackup FAQ.
Consider the following before designing your backup plan.
Namespace, nodes, and backup location
- Determine the namespace you want to back up.
- Decide whether you want to back up to a directory with individual backup files, or back up to a single file.
What is backed up
By default all of the following data is backed up:
- Keys.
- Key metadata: digest, TTL, generation count, and key.
- Regular bins: string, integer, boolean, and binary.
- Collection data type (CDT) bins: list and map.
- GeoJSON data type bins.
- HyperLogLog data type bins.
- Secondary index definitions.
- User-Defined Function (UDF) modules.
For the exact backup file format, see the file format specification in the Backup File Format repository on GitHub.
Restore to any cluster
Backup and restore are cluster-configuration-agnostic. A backup can be restored to a cluster of any size and configuration. Restored data is evenly distributed among cluster nodes, regardless of cluster configuration.
Estimating disk space for the backup
For an estimate, use the --estimate option of asbackup. As shown in the following example, this option reads 10,000 records from the specified namespace and prints the average size of the sampled records:
asbackup --namespace NAME --estimate
Multiply the displayed estimated record size by the number of records in the namespace, and add 10% of the result for overhead and indexes:
Formula to calculate approximate disk space for backup |
---|
(Estimated average record size from asbackup --estimate × Number of records in namespace) + 10% of that product = approximate disk space needed for backup |
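The calculation above can be sketched in a few lines of Python. The function name and the sample numbers are hypothetical; the average record size is whatever asbackup --estimate prints, and the record count has to come from your own cluster statistics.

```python
import math

def estimate_backup_bytes(avg_record_bytes, record_count, overhead_percent=10):
    """Approximate disk space for a backup, in bytes.

    avg_record_bytes: average record size reported by `asbackup --estimate`
    record_count:     number of records in the namespace
    overhead_percent: allowance for overhead and indexes (10% per the text)
    """
    base = avg_record_bytes * record_count
    # Add the overhead to the product, then round up to a whole byte.
    return math.ceil(base * (100 + overhead_percent) / 100)

# Hypothetical namespace: 1,000,000 records averaging 1,000 bytes each.
needed = estimate_backup_bytes(1000, 1_000_000)
print(f"{needed / 1024**3:.2f} GiB")
```

Treat the result as a planning figure only, because the estimate is based on a sample of records.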
For more information about the resources required for backup and restore, see asbackup and asrestore resource usage.
asbackup command basics and useful variations
The following example shows the basic syntax of asbackup:
asbackup --host HOST --namespace NAME --directory DIRECTORY
- --host HOST specifies the IP address or hostname of any node in the cluster to back up.
- --namespace NAME is the name of the namespace to back up. asbackup backs up a single namespace at a time.
- --directory DIRECTORY is the name of the directory where the backed-up data is written. Data is stored in multiple files with the .asb file extension. By default, each backup file is limited to 250 MiB. When this limit is reached, asbackup creates a new file.
Backing up to a single file
You can back up the cluster to a single file, rather than a directory:
asbackup --host HOST --namespace NAME --output-file FILENAME
Incremental backup
Use the following options to make incremental backups. The argument YYYY-MM-DD_HH:MM:SS is the time stamp:
- --modified-after YYYY-MM-DD_HH:MM:SS backs up keys time-stamped after the argument.
- --modified-before YYYY-MM-DD_HH:MM:SS backs up keys time-stamped before the argument.
You may also back up partitions to create incremental backups. Refer to partition lists.
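As an illustration, the following Python sketch builds the argument list for an incremental backup from datetime values. It assumes the time stamp uses a zero-padded numeric month and a 24-hour clock; the function name and the sample date are hypothetical, and the sketch only prints the command instead of running it.

```python
from datetime import datetime

# Assumed time stamp layout: numeric month, 24-hour clock.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H:%M:%S"

def incremental_backup_args(host, namespace, directory, after, before=None):
    """Return an asbackup argument list limited to a modification window."""
    args = [
        "asbackup",
        "--host", host,
        "--namespace", namespace,
        "--directory", directory,
        "--modified-after", after.strftime(TIMESTAMP_FORMAT),
    ]
    if before is not None:
        args += ["--modified-before", before.strftime(TIMESTAMP_FORMAT)]
    return args

# Hypothetical time of the previous backup.
last_backup = datetime(2024, 1, 15, 8, 30, 0)
cmd = incremental_backup_args("HOST", "NAME", "DIRECTORY", last_backup)
print(" ".join(cmd))
```

Recording the start time of each run and passing it as --modified-after on the next run is one way to chain incremental backups.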
Backing up individual hosts
Use --node-list NODE1:PORT,NODE2:PORT to back up data on specific hosts. Backups are then executed on a per-partition basis. PORT is the Aerospike service port, 3000 by default. The --node-list flag is particularly useful when running multiple asbackup processes, for example one per Aerospike host.
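A minimal sketch of the one-process-per-host idea, in Python: it generates one asbackup command per node, each with its own output directory. The helper name, the host addresses, and the per-host directory layout are assumptions for illustration; only the flags come from the text above.

```python
def per_host_backup_commands(hosts, namespace, base_directory, port=3000):
    """Build one asbackup argument list per host, using --node-list."""
    commands = []
    for host in hosts:
        commands.append([
            "asbackup",
            "--node-list", f"{host}:{port}",
            "--namespace", namespace,
            # Hypothetical layout: a separate directory per host.
            "--directory", f"{base_directory}/{host}",
        ])
    return commands

# Hypothetical two-node cluster.
for cmd in per_host_backup_commands(["10.0.0.1", "10.0.0.2"], "NAME", "/backups"):
    print(" ".join(cmd))
```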
Throttling
If data can be retrieved from the database faster than it can be written, it may be necessary to throttle the retrieval rate. Use the --nice RATE flag to restrict the rate at which data is written. The rate is specified in MB/s.
Writing to stdout and piping
Instead of a file name, pass - to --output-file to write the backup data to stdout. This is useful for pipes. The following example writes backup data to stdout with -, and pipes the output to gzip to create a compressed file:
asbackup --host HOST --namespace NAME --output-file - | gzip > FILENAME.GZ
Note that the gzip utility is single-threaded. This may saturate a single CPU core and create a bottleneck. To take advantage of multi-core archive utilities, consider using xz instead.
A newer method of compressing backup file data is to use the --compress runtime option. Refer to Compression and encryption.
Compression and encryption
You may compress and encrypt backup file data as it is being written to the backup file with --compress and --encrypt. Each is enabled by passing the corresponding option followed by your chosen algorithm.
There is one available compression algorithm:
Algorithm | Description |
---|---|
zstd | Zstd compression, from the Facebook libzstd repository on GitHub. |
For example:
asbackup --host HOST --namespace NAME --compress zstd
You may also specify the compression level to be used by zstd via the --compression-level option.
The supported levels are the integers described by zstd. For more information, see the zstd manual.
Set the default compression level with the ZSTD_CLEVEL_DEFAULT parameter.
For example:
asbackup --host HOST --namespace NAME --compress zstd --compression-level 3
These are the available encryption algorithms:
Algorithm | Description |
---|---|
aes128 | AES 128-bit key-digest encryption, which uses the CTR128 algorithm to encrypt data. The SHA256 hash of the encryption key is used to generate the key used by CTR128. |
aes256 | AES 256-bit key-digest encryption. Works the same way as aes128, but uses a 256-bit digest of the key for encryption and AES256 as the base encryption algorithm. |
For encryption, you must provide a private key. The private encryption key may be in PEM format (with --encryption-key-file), or a base64-encoded key passed in through an environment variable (with --encryption-key-env).
For example, using an encryption key file:
asbackup --host HOST --namespace NAME --encrypt aes128 --encryption-key-file KEY.PEM
Using an environment variable:
export PRIVATE_KEY='PRIVATE KEY'
asbackup --host HOST --namespace NAME --encrypt aes256 --encryption-key-env PRIVATE_KEY
Replace 'PRIVATE KEY' with the contents of your private key file, between the header and footer. In the following example the key starts with b3Blb and ends with eNfNpA=:
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAACmFlczI1Ni1jdHIAAAAGYmNyeXB0AAAAGAAAABDWTq8LwB
zXg7xnGj4VNY3GAAAAEAAAAAEAAAAzAAAAC3NzaC1lZDI1NTE5AAAAIHuu8YsX03XGjJ1L
YFbehI4Ha7g8EVybKB3dAAPt/iFq3u9eNfNpA=
-----END OPENSSH PRIVATE KEY-----
Note that when restoring compressed or encrypted backup files, the same compression and encryption flags must be provided to asrestore.
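To illustrate what "between the header and footer" means for the environment-variable method, here is a small Python sketch that extracts the base64 body from PEM text. The function name is hypothetical, and joining the body lines into a single string is an assumption of this sketch; the text does not say whether line breaks must be removed.

```python
def pem_body(pem_text):
    """Return the base64 body of a PEM block, without header and footer."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    # Header and footer lines start with five dashes; drop them and blanks.
    body = [line for line in lines if line and not line.startswith("-----")]
    # Assumption: the body lines are joined into one string.
    return "".join(body)

# The example key shown above.
example = """-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAACmFlczI1Ni1jdHIAAAAGYmNyeXB0AAAAGAAAABDWTq8LwB
zXg7xnGj4VNY3GAAAAEAAAAAEAAAAzAAAAC3NzaC1lZDI1NTE5AAAAIHuu8YsX03XGjJ1L
YFbehI4Ha7g8EVybKB3dAAPt/iFq3u9eNfNpA=
-----END OPENSSH PRIVATE KEY-----"""

key = pem_body(example)
print(key[:5], key[-7:])
```

The resulting string is the value you would export in the variable named by --encryption-key-env.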
Safety of backup files
It is a best practice to store backup files offsite in a secure location.
Other asbackup options and command help
asbackup has additional options that you might want to investigate. For more detail, type asbackup --help, or refer to asbackup command-line options.
- Backing up specific nodes, or connecting to a port other than the default 3000.
- Securing connections via username/password, or TLS certificates, or both.
- Backing up specific bins.
- Backing up specific sets.
- Using configuration files to automate backups.
Backup resumption
If a backup job is interrupted, for example if you stop the backup with Ctrl-C, or if it fails for any reason other than a failure to write to the disk, the backup state is saved to a .state file. Pass the path to this .state file to the --continue flag to resume the backup. When continuing a backup, you must use all of the same command-line arguments, except --remove-files.
Restoring from backup
This section describes the most essential restore commands and some common variations. Some frequently asked questions are covered in the Knowledge Base article asrestore FAQ.
Prerequisites and notes for restoring from backup
asrestore can restore only backups from Aerospike Server and tools version 3.0 or later. To restore a backup from earlier releases, contact Aerospike Support.
The TTL of restored keys is preserved, but the last-update-time is reset to the time of the restore and the generation count is also reset.
asrestore command basics and useful variations
The following example shows the basic syntax of asrestore:
asrestore --host HOST --directory DIRECTORY
- --host HOST specifies the IP address or hostname of a node in the cluster to restore to.
- --directory DIRECTORY is the name of the directory containing the backup files.
Restoring from a single backup file
If you backed up to a single file, use the following syntax to restore from it:
asrestore --host HOST --input-file FILENAME
Restoring to a different namespace
By default, data is restored to its original namespace. Use the --namespace option to restore to a different namespace. You must specify the old and new namespace names, separated by a comma:
asrestore --host HOST --directory DIRECTORY --namespace OLD-NAMESPACE,NEW-NAMESPACE
Write policy for duplicate key IDs
The target namespace might already contain keys with the same IDs as the backup you are restoring. The logic of the write policy for managing existing keys is as follows:
- If the record from the backup is expired, based on its TTL value, the backup record is ignored.
- If the record does not exist in the namespace, the backup record is added to the namespace.
- If an older version of the record (that is, with a lower generation count) already exists in the namespace, the backup record is restored. To override this behavior, specify --unique: asrestore does not touch any existing records, regardless of generation counts.
- If a newer version of the record (that is, with a higher or same generation count) already exists in the namespace, the backup record is ignored. To override this behavior, specify --no-generation: asrestore overwrites any existing records, regardless of generation count.
- If the record in the namespace contains bins that are not present in the backup, those bins in the namespace are preserved. To override this behavior, specify --replace: when restoring a record from the backup, asrestore does not preserve namespace bins that are not present in the backup.
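The rules above can be modeled as a small decision function. This is a Python sketch of the logic as described in this section, not the asrestore implementation; the function names and return values are hypothetical, and the order in which --unique and --no-generation are checked when both are given is an arbitrary choice of the sketch.

```python
def restore_decision(backup_gen, backup_expired=False, existing_gen=None,
                     unique=False, no_generation=False):
    """Decide what happens to one backup record, per the rules above.

    existing_gen is None when the record is not in the namespace.
    Returns "ignore", "add", or "overwrite".
    """
    if backup_expired:
        return "ignore"            # expired backup records are skipped
    if existing_gen is None:
        return "add"               # no existing record: always added
    if unique:
        return "ignore"            # --unique: never touch existing records
    if no_generation:
        return "overwrite"         # --no-generation: always overwrite
    # Default: restore only over an older (lower-generation) record.
    return "overwrite" if existing_gen < backup_gen else "ignore"

def merged_bins(existing_bins, backup_bins, replace=False):
    """Bins of the record after a restore that overwrites it."""
    if replace:
        return dict(backup_bins)   # --replace: existing-only bins are dropped
    return {**existing_bins, **backup_bins}  # default: existing-only bins kept

print(restore_decision(backup_gen=3, existing_gen=5))
print(merged_bins({"a": 1, "b": 2}, {"b": 20}))
```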
Reading from stdin, piping, and uncompressing
Instead of a file name, pass - to --input-file (-i) and use standard Unix pipes to read the backup data from stdin.
The following three usage examples uncompress a gzip file and then pipe the data to asrestore, which reads from stdin because of the - argument:
gunzip -c BACKUP-FILE.GZ | asrestore --host HOST -i -
zcat BACKUP-FILE.GZ | asrestore --host HOST -i -
cat BACKUP-FILE.GZ | gzip -d | asrestore --host HOST -i -
This example reads a single uncompressed backup file with cat and pipes the data to asrestore with the dash (-) argument:
cat BACKUP-FILE | asrestore --host HOST -i -
Other asrestore options and command-line help
asrestore includes options that you may find useful. For more detail, type asrestore --usage, or see these asrestore command-line options.
- Restoring to specific nodes or connecting to a port other than the default 3000.
- Securing connections via username/password or TLS certificates or both.
- Restoring specific bins.
- Restoring specific sets.
- Using configuration files to help automate restores.
Transaction retries
- Failed record uploads: If a transaction fails, it is retried according to --max-retries and --retry-scale-factor. By default these are 5 and 150ms respectively. An exponential backoff strategy is followed, where the delay is retry-scale-factor * 2 ** (retry_attempts - 1), or 0 on the first try. If --max-retries is exceeded, the transaction is counted as a failure in the info-level log output. Note: --retry-delay and --sleep-between-retries are deprecated in favor of --retry-scale-factor.
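The backoff formula can be tabulated with a short Python sketch. It assumes that retry_attempts is 0 for the initial attempt and 1 for the first retry, which is one reading of "0 on the first try"; the function name is hypothetical and the defaults are the ones stated above.

```python
def retry_delay_ms(retry_attempts, scale_factor_ms=150):
    """Delay before an attempt: scale * 2 ** (retry_attempts - 1), or 0 at first."""
    if retry_attempts <= 0:
        return 0  # initial attempt: no delay
    return scale_factor_ms * 2 ** (retry_attempts - 1)

# Delays for the initial attempt plus the default 5 retries.
MAX_RETRIES = 5
delays = [retry_delay_ms(n) for n in range(MAX_RETRIES + 1)]
print(delays)
print(sum(delays), "ms total wait before the transaction is counted as failed")
```

Because the delay doubles on each retry, raising --max-retries by one roughly doubles the worst-case wait per failed transaction.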
Possible error or informational messages from asrestore
- Record exists: When the --unique option is used, this informational message is displayed.
- Generation mismatch: The backup copy and existing copy of a key do not match, and so the key is not restored. You can override this behavior with the --no-generation option.
- Invalid username or password: The wrong username or password was specified on the command line.