Aerospike Loader (asloader)

The Aerospike Loader (asloader) migrates data from another database to Aerospike. You provide .DSV data files and an Aerospike schema file in JSON format. asloader parses the .DSV files and loads the data into the Aerospike cluster according to your schema.
Prerequisites
- Java 1.8 or later
- Maven 3.0 or later
Installation
asloader is available:

- As a jar file from https://github.com/aerospike/aerospike-loader/releases.
- As source code on GitHub. To install, use the following command-line instructions:
git clone https://github.com/aerospike/aerospike-loader.git
cd aerospike-loader
./build
For releases prior to Aerospike Tools version 6.2, asloader is available as part of the Aerospike Tools package.
Dependencies
The following dependencies are downloaded automatically:
- Aerospike Java client 6.1.6 or greater
- Apache Commons CLI 1.2
- Log4j 2.17.1
- JUnit 4.4
- Json-simple 1.1.1
Usage
If you downloaded the jar file from the releases page, use:
$ java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad <options> <data file name(s)/directory>
If you downloaded the source, use the run_loader script along with options and data files:
$ ./run_loader <options> <data file name(s)/directory>
The data file argument can be either a space-delimited list of files or the name of a directory containing data files. See the "Data Files" section for more details.
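As an illustration, a comma-delimited data file might look like the following. The column names and values here are hypothetical; the actual delimiter and column layout are defined by your JSON config file.

```
user_id,name,age
u001,Alice,34
u002,Bob,28
```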
Options:
-h,--hosts <arg> List of seed hosts (default: localhost)
-p,--port <arg> Server port (default: 3000)
-U,--user <arg> User name
-P,--password <arg> Password
-n,--namespace <arg> Namespace (default: test)
-c,--config <arg> Column definition file in JSON format
-g,--max-throughput <arg> Set a target maximum transactions per second for the loader (default: 0, don't limit TPS)
-T,--transaction-timeout <arg> Transaction timeout in milliseconds for writes (default: no timeout)
-e,--expiration-time <arg> Expiration time of a record in seconds (default: never expire)
-tz,--timezone <arg> Time zone of the source from which the data dump was taken (default: local time zone)
-ec,--abort-Error-Count <arg> Abort when the error count exceeds this value (default: 0, don't abort)
-wa,--write-Action <arg> Write action if the key already exists (default: update)
-tls,--tls-enable Use TLS/SSL sockets (default: false)
-tlsLoginOnly Use TLS/SSL sockets on node login only
-tp,--tls-protocols Allowed TLS protocols, separated by commas. Values: TLSv1, TLSv1.1, TLSv1.2 (default: TLSv1.2)
-tlsCiphers,--tls-cipher-suite Allowed TLS cipher suites: cipher names defined by the JVM, separated by commas (default: null, use the JVM's default cipher list)
-tr,--tls-revoke Revoke certificates identified by their serial numbers, separated by commas (default: null, do not revoke certificates)
-uk,--send-user-key Send the user-defined key in addition to the hash digest to store on the server (default: the user key is not sent, to reduce metadata overhead)
-um,--unorderedMaps If this flag is present, write all maps as unordered maps
-u,--usage Print usage.
-v,--verbose Verbose mode for debug logging (default: INFO)
-V,--version Print version
For more details, refer to Options.
Threads
There are two types of threads:
- Reader threads, which read the .DSV data files. The number of reader threads is the lower of the number of CPUs and the number of files in the directory.
- Writer threads, which write to the cluster. The number of writer threads is the number of CPUs * 5 (5 is the scale factor). For example, on an 8-CPU machine loading a directory of 3 files, asloader uses 3 reader threads and 40 writer threads.
Sample usage of all options:
run_loader -h nodex -p 3000 -n test -T 3000 -e 2592000 -ec 100 -tz PST -wa update -c ~/pathto/config.json datafiles/
Where:
Server IP: nodex (-h)
Port: 3000 (-p)
Namespace: test (-n)
Write Operation Timeout (in milliseconds): 3000 (-T)
Record Expiration: 2592000 (-e)
Write Error Threshold: 100 (-ec)
Timezone: PST (-tz)
Write Action: update (-wa)
Data Mapping: ~/pathto/config.json (-c)
Data Files: datafiles/
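For reference, a column-definition file like the one passed with -c might look like the following minimal sketch. The column, set, and bin names here are hypothetical, and the exact field names should be verified against the schema files shipped in the repository's example directory:

```json
{
  "version": "2.0",
  "dsv_config": {
    "delimiter": ",",
    "n_columns_datafile": 3,
    "header_exist": true
  },
  "mappings": [
    {
      "key": {"column_name": "user_id", "type": "string"},
      "set": "users",
      "bin_list": [
        {"name": "name", "value": {"column_name": "name", "type": "string"}},
        {"name": "age",  "value": {"column_name": "age", "type": "integer"}}
      ]
    }
  ]
}
```

With a mapping like this, each row of the data file becomes one record keyed by the user_id column, with the remaining columns stored as bins.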
Demo example
The example directory contains two files: alldatatype.json and alldatatype.dsv. Run the following command to load data from the data file alldatatype.dsv:

run_loader -h localhost -c example/alldatatype.json example/alldatatype.dsv
For additional use cases, refer to the Examples.