Aerospike Loader Examples
This page contains usage examples for the asloader tool.
Prerequisites
- A running Aerospike instance.
- The asloader tool. If you've installed Aerospike Tools version 6.1.2 or earlier, asloader is included. Otherwise, you can download an asloader jar file from its [GitHub repository](https://github.com/aerospike/aerospike-loader/releases).
- A Java runtime such as OpenJDK.
Usage
asloader requires two files: a data file and a configuration file. In the following
examples, the data file is called data.dsv and the configuration file is called config.json.
Example 1: data with a header line
You can load delimited data files either with or without a header line containing column names. You can also choose, for each line of data, which set its record is added to by including the set name as one of the line's columns.
The following sample data file includes a header line with column names. The 4th column contains a string which specifies the set to which the record should be added.
user_location, user_id, last_visited, set_name, age, user_name, user_name_blob, user_rating
IND, userid1, 04/1/2014, facebook, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, twitter, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, twitter, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, facebook, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, twitter, 37, X10, 583130, 9.3
The following sample configuration file specifies a comma delimiter for the data file, as well as 8 columns of data and a header line containing field names.
{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 8, "header_exist": true},
"mappings": [
{
"key": {"column_name":"user_id", "type": "string"},
"set": { "column_name":"set_name", "type": "string"},
"bin_list": [
{"name": "age",
"value": { "column_name": "age", "type" : "integer"}
},
{"name": "location",
"value": { "column_name": "user_location", "type" : "string"}
},
{"name": "name",
"value": { "column_name": "user_name", "type" : "string"}
},
{"name": "name_blob",
"value": { "column_name": "user_name_blob", "type" : "blob", "dst_type" : "blob", "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_name": "last_visited", "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_name": "user_rating", "type" : "float"}
}
]
}
]
}
Config file details
- delimiter specifies the separator character for the data file.
- n_columns_datafile specifies the number of columns in the data file.
- header_exist specifies whether a header line is present in the data file.
- key specifies the field in each line of data which should be used as the record's key.
- set specifies the set to which new records should be added.
- bin_list contains an array of bin mappings. Each bin mapping has two entries: name, which is the Aerospike bin name, and value, which is the bin content mapping. If a column has no mapping in the config file, that column is skipped while loading.
  - Either column_name or column_position can be used to specify the column.
  - Native data types integer and string are stored as-is.
  - Data types other than native types include the additional fields dst_type and encoding.
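Because the config refers to columns by name and declares a column count, a quick consistency check before loading can catch typos early. The following Python sketch is not part of asloader; the helper name check_config_against_header is hypothetical, and it assumes that whitespace around header names is not significant. It uses a trimmed copy of the config above.

```python
import json

# Trimmed copy of the Example 1 config: the key, the set, and two of the bins.
CONFIG_TEXT = """
{
  "version": "2.0",
  "dsv_config": {"delimiter": ",", "n_columns_datafile": 8, "header_exist": true},
  "mappings": [
    {
      "key": {"column_name": "user_id", "type": "string"},
      "set": {"column_name": "set_name", "type": "string"},
      "bin_list": [
        {"name": "age", "value": {"column_name": "age", "type": "integer"}},
        {"name": "rating", "value": {"column_name": "user_rating", "type": "float"}}
      ]
    }
  ]
}
"""

HEADER = ("user_location, user_id, last_visited, set_name, "
          "age, user_name, user_name_blob, user_rating")


def check_config_against_header(config, header_line):
    """Return a list of problems; an empty list means config and header agree."""
    dsv = config["dsv_config"]
    # Assumption: whitespace around column names is not significant.
    columns = [name.strip() for name in header_line.split(dsv["delimiter"])]
    problems = []
    if len(columns) != dsv["n_columns_datafile"]:
        problems.append(
            f"header has {len(columns)} columns, "
            f"config expects {dsv['n_columns_datafile']}"
        )
    for mapping in config["mappings"]:
        specs = [mapping["key"], mapping["set"]]
        specs += [entry["value"] for entry in mapping["bin_list"]]
        for spec in specs:
            # Static values are plain strings and position-based specs carry
            # no column name, so only name-based specs are checked.
            if not isinstance(spec, dict) or "column_name" not in spec:
                continue
            name = spec["column_name"]
            # "system_time" is used in Example 4 for the current time,
            # so it is not expected to appear in the header.
            if name != "system_time" and name not in columns:
                problems.append(f"unknown column name: {name}")
    return problems


config = json.loads(CONFIG_TEXT)
print(check_config_against_header(config, HEADER))  # prints []
```

An empty list means every column_name in the config appears in the header and the column count matches.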
Run the command
To try out the above setup, copy the data into a file called data.dsv, copy the configuration
into a file called config.json, and run the following command:
java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad data.dsv -h [aerospike-host-ip] -c config.json -n test
Replace [aerospike-host-ip] with the IP address of your Aerospike server. See the
Options page for additional command-line options.
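If you launch the loader from a script, it can help to build the command as an argument list. The following Python sketch only assembles the command using the options shown on this page (-h, -n, -c); the function name build_asloader_command, the jar path, and the host address are placeholders. Unlike a shell, an argument list does not expand the * wildcard in the jar name, so pass the real jar path.

```python
import shlex


def build_asloader_command(jar_path, host, namespace, config_path, data_paths):
    """Assemble the asloader invocation as an argument list."""
    return [
        "java", "-cp", jar_path, "com.aerospike.load.AerospikeLoad",
        "-h", host,
        "-n", namespace,
        "-c", config_path,
        *data_paths,
    ]


cmd = build_asloader_command(
    jar_path="/path/to/aerospike-load-jar-with-dependencies.jar",  # placeholder
    host="127.0.0.1",                                              # placeholder
    namespace="test",
    config_path="config.json",
    data_paths=["data.dsv"],
)

# Show the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

To actually run it, pass cmd to subprocess.run(cmd, check=True) once the jar path and host are correct for your environment.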
Example 2: data without a header line
The following example uses a data file with no header line. When your
data file doesn't have a header line, always use column_position for bin
mapping.
Data file content
IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
Config file content
{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},
"mappings": [
{
"key": {"column_position": 2, "type": "string"},
"set": "my_set",
"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
}
]
}
]
}
Config file details
- There is no header information in the data file, so header_exist is false.
- Each bin mapping is specified with column_position.
- The set parameter is static. All new records are written to the set my_set.
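To make the position-based mapping concrete, the following Python sketch applies a trimmed version of the mapping above to the first line of the data file. It illustrates the idea only and is not how asloader is implemented; map_row and convert are hypothetical helpers, and the timestamp bin is left out for brevity. As the config implies, column_position counts from 1.

```python
def convert(raw, spec):
    """Convert one raw field according to the 'type' in its mapping."""
    kind = spec["type"]
    if kind == "integer":
        return int(raw)
    if kind == "float":
        return float(raw)
    if kind == "blob" and spec.get("encoding") == "hex":
        return bytes.fromhex(raw)
    return raw  # strings are kept as-is


def map_row(line, delimiter, mapping):
    """Return (set name, key, bins) for one line of the data file."""
    # Assumption: whitespace around each field is not significant.
    fields = [field.strip() for field in line.split(delimiter)]

    def pick(spec):
        # column_position is 1-based, Python lists are 0-based.
        return convert(fields[spec["column_position"] - 1], spec)

    bins = {entry["name"]: pick(entry["value"]) for entry in mapping["bin_list"]}
    return mapping["set"], pick(mapping["key"]), bins


mapping = {
    "key": {"column_position": 2, "type": "string"},
    "set": "my_set",
    "bin_list": [
        {"name": "age", "value": {"column_position": 4, "type": "integer"}},
        {"name": "location", "value": {"column_position": 1, "type": "string"}},
        {"name": "name", "value": {"column_position": 5, "type": "string"}},
        {"name": "name_blob",
         "value": {"column_position": 6, "type": "blob", "encoding": "hex"}},
        {"name": "rating", "value": {"column_position": 7, "type": "float"}},
    ],
}

set_name, key, bins = map_row(
    "IND, userid1, 04/1/2014, 20, X20, 583230, 8.1", ",", mapping
)
print(set_name, key, bins)
```

Note that the hex string 583230 decodes to the bytes X20, which matches the user name in the same row.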
Example 3: add a static value
To add a static value to every new record created from a data file, add a new bin
to the bin mapping in the config file. In the following example config file, the
value "my_value" is added to a bin called static_value in every new record created.
{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},
"mappings": [
{
"key": {"column_position": 2, "type": "string"},
"set": "my_set",
"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
},
{"name": "static_value",
"value": "my_value"
}
]
}
]
}
For demonstration purposes, you can use the following data file with the above config file:
IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
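If you keep a base config and want to derive a variant with a static bin, the edit can be scripted. The following Python sketch assumes the config has already been loaded into a dictionary; add_static_bin is a hypothetical helper, and it adds the bin to every mapping in the file, which may not be what you want if your config has several mappings.

```python
import copy
import json


def add_static_bin(config, bin_name, value):
    """Return a copy of the config with a static bin in each mapping."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for mapping in updated["mappings"]:
        mapping["bin_list"].append({"name": bin_name, "value": value})
    return updated


# Trimmed version of the config above, with a single column-based bin.
base_config = {
    "version": "2.0",
    "dsv_config": {"delimiter": ",", "n_columns_datafile": 7, "header_exist": False},
    "mappings": [
        {
            "key": {"column_position": 2, "type": "string"},
            "set": "my_set",
            "bin_list": [
                {"name": "age", "value": {"column_position": 4, "type": "integer"}}
            ],
        }
    ],
}

with_static = add_static_bin(base_config, "static_value", "my_value")
print(json.dumps(with_static, indent=2))
```

The printed JSON can be saved as config.json and passed to asloader with the -c option.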
Example 4: add a timestamp
To add a timestamp with the current system time to every new record created from a
data file, add a new bin to the bin mapping in the config file. In the following example config
file, a new bin called write_time, containing the current Unix time, is added to every new record.
{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},
"mappings": [
{
"key": {"column_position": 2, "type": "string"},
"set": "my_set",
"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding": "MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
},
{
"name": "write_time",
"value": { "column_name": "system_time", "type" : "timestamp", "encoding": "MM/dd/yy HH:mm:ss", "dst_type": "integer"}
}
]
}
]
}
For demonstration purposes, you can use the following data file with the above config file:
IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
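Both recent_visit and write_time use "dst_type": "integer", so the stored value is a number rather than a date string. The following Python sketch shows what such a conversion looks like for the dates in the data file. It rests on two assumptions that this page does not state: that the integer counts seconds since the epoch, and that dates are interpreted as UTC. It also uses Python's own format codes, which differ from the pattern strings in the config.

```python
from datetime import datetime, timezone


def to_epoch_seconds(text, fmt="%m/%d/%Y"):
    """Parse a date string and return seconds since 1970-01-01 UTC.

    Assumptions (not confirmed by this page): the value is in seconds
    and the date is interpreted as midnight UTC.
    """
    parsed = datetime.strptime(text.strip(), fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


# Dates taken from the third column of the sample data file.
for value in ["04/1/2014", "01/2/2014"]:
    print(value, "->", to_epoch_seconds(value))

# A "current time" value, comparable to what a write_time bin would hold
# under the same assumptions.
print("now ->", int(datetime.now(timezone.utc).timestamp()))
```

If your loader runs in a different time zone or stores a different unit, the stored integers will differ from these values accordingly.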
Data file options
- You can load data from several data files by specifying them in the asloader command:
java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c /path/to/config.json data1.dsv data2.dsv data3.dsv
- You can specify a directory of data files:
java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c /path/to/config.json data/
For more details on the various command line options, refer to Options.