
Aerospike Loader Examples

This page contains usage examples for the asloader tool.

Prerequisites

Usage

asloader requires two files: a data file and a configuration file. In the following examples, the data file is called data.dsv and the configuration file is called config.json.

Example: data with a header line

You can load delimited data files with or without a header line of column names. You can also choose the set for each record individually by including the set name as one of the columns in the data file.

The following sample data file includes a header line with column names. The 4th column contains a string which specifies the set to which the record will be added.

user_location, user_id, last_visited, set_name, age, user_name, user_name_blob, user_rating
IND, userid1, 04/1/2014, facebook, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, twitter, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, twitter, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, facebook, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, twitter, 37, X10, 583130, 9.3

The following sample configuration file specifies a comma delimiter for the data file, as well as 8 columns of data and a header line containing field names.

{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 8, "header_exist": true},

"mappings": [
{
"key": {"column_name":"user_id", "type": "string"},

"set": { "column_name":"set_name", "type": "string"},

"bin_list": [
{"name": "age",
"value": { "column_name": "age", "type" : "integer"}
},
{"name": "location",
"value": { "column_name": "user_location", "type" : "string"}
},
{"name": "name",
"value": { "column_name": "user_name", "type" : "string"}
},
{"name": "name_blob",
"value": { "column_name": "user_name_blob", "type" : "blob", "dst_type" : "blob", "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_name": "last_visited", "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_name": "user_rating", "type" : "float"}
}
]
}
]
}

Config file details

  • delimiter specifies the separator character for the data file.
  • n_columns_datafile specifies the number of columns in the data file.
  • header_exist specifies whether a header line is present in the data file.
  • key specifies the field in each line of data which should be used as the record's key.
  • set specifies the set to which new records should be added.
  • bin_list contains an array of bin mappings. Each bin mapping has two entries: name, which is the Aerospike bin name, and value, which is the bin content mapping. If a column has no mapping in the config file, that column is skipped during loading.
    • Either column_name or column_position can be used to specify the column.
    • Native data types integer and string are stored as-is.
    • Data types other than native types include the additional fields dst_type and encoding.
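
As a rough mental model of the mapping described above, the following Python sketch applies a trimmed-down version of the config to one line of the sample data. It is not asloader's implementation: the function names convert and apply_mapping are hypothetical, trimming whitespace around fields is an assumption, and only the string, integer, float, and hex-encoded blob types are handled. Note that 583230 is the hex encoding of the ASCII text X20, so the blob column in the sample data carries the same bytes as the user_name column.

```python
import json

# Trimmed-down version of the config above; unmapped columns are skipped.
CONFIG = json.loads("""
{
  "dsv_config": {"delimiter": ",", "n_columns_datafile": 8, "header_exist": true},
  "mappings": [{
    "key": {"column_name": "user_id", "type": "string"},
    "set": {"column_name": "set_name", "type": "string"},
    "bin_list": [
      {"name": "age", "value": {"column_name": "age", "type": "integer"}},
      {"name": "name", "value": {"column_name": "user_name", "type": "string"}},
      {"name": "name_blob", "value": {"column_name": "user_name_blob", "type": "blob",
                                      "dst_type": "blob", "encoding": "hex"}},
      {"name": "rating", "value": {"column_name": "user_rating", "type": "float"}}
    ]
  }]
}
""")

CASTS = {"string": str, "integer": int, "float": float}


def convert(text, spec):
    """Convert one field according to its mapping (hypothetical helper)."""
    if spec["type"] == "blob" and spec.get("encoding") == "hex":
        return bytes.fromhex(text)
    return CASTS[spec["type"]](text)


def apply_mapping(header_line, data_line, config):
    """Return (key, set name, bins) for one line of data."""
    dsv = config["dsv_config"]
    names = [c.strip() for c in header_line.split(dsv["delimiter"])]
    fields = [c.strip() for c in data_line.split(dsv["delimiter"])]
    if len(fields) != dsv["n_columns_datafile"]:
        raise ValueError("unexpected number of columns")
    row = dict(zip(names, fields))
    mapping = config["mappings"][0]

    def resolve(spec):
        return convert(row[spec["column_name"]], spec)

    bins = {b["name"]: resolve(b["value"]) for b in mapping["bin_list"]}
    return resolve(mapping["key"]), resolve(mapping["set"]), bins


HEADER = ("user_location, user_id, last_visited, set_name, age, "
          "user_name, user_name_blob, user_rating")
LINE = "IND, userid1, 04/1/2014, facebook, 20, X20, 583230, 8.1"

key, set_name, bins = apply_mapping(HEADER, LINE, CONFIG)
print(key, set_name, bins)
```

The user_location and last_visited columns do not appear in the result because this trimmed config has no bin mapping for them.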

Run the command

To try out the above setup, copy the data into a file called data.dsv, copy the configuration into a file called config.json, and run the following command:

java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad data.dsv -h [aerospike-host-ip] -c config.json -n test

Replace [aerospike-host-ip] with the IP address of your Aerospike server. See the Options page for additional command-line options.

Example: JSON data

The scores column in this example is type "json". JSON data maps to corresponding Aerospike CDTs. The "scores" bin will be written to Aerospike as a map with elements "high_score" and the nested list "others".

note

The delimiter used in this file is *, because the , character is reserved for the JSON data.

The following sample data file includes a header line with column names. The 4th column contains a string which specifies the set to which the record will be added, and the 9th column, scores, contains the JSON data.

user_location* user_id* last_visited* set_name* age* user_name* user_name_blob* user_rating* scores
IND* userid1* 04/1/2014* facebook* 20* X20* 583230* 8.1* {"high_score": 26, "others": [12, 8, 20]}
USA* userid2* 03/18/2014* twitter* 27* X2* 5832* 6.4* {"high_score": 18, "others": [11, 8, 9]}
UK* userid3* 01/9/2014* twitter* 21* X3* 5833* 4.3* {"high_score": 30, "others": [10, 18, 21]}
UK* userid4* 01/2/2014* facebook* 16* X9* 5839* 5.9* {"high_score": 27, "others": [9, 8, 13]}
IND* userid5* 08/20/2014* twitter* 37* X10* 583130* 9.3* {"high_score": 14, "others": [7, 4, 12]}

The following sample configuration file specifies a '*' delimiter for the data file, as well as 9 columns of data and a header line containing field names.

{
"version" : "2.0",
"dsv_config": { "delimiter": "*" , "n_columns_datafile": 9, "header_exist": true},

"mappings": [
{
"key": {"column_name":"user_id", "type": "string"},

"set": { "column_name":"set_name", "type": "string"},

"bin_list": [
{"name": "age",
"value": { "column_name": "age", "type" : "integer"}
},
{"name": "location",
"value": { "column_name": "user_location", "type" : "string"}
},
{"name": "name",
"value": { "column_name": "user_name", "type" : "string"}
},
{"name": "name_blob",
"value": { "column_name": "user_name_blob", "type" : "blob", "dst_type" : "blob", "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_name": "last_visited", "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_name": "user_rating", "type" : "float"}
},
{
"name": "scores",
"value": { "column_name": "scores", "type" : "json"}
}
]
}
]
}
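
To see why the delimiter matters here, the following Python sketch splits one line of the sample data on * and parses the last field as JSON. It only illustrates the parsing idea and is not how asloader itself is implemented; trimming whitespace around fields is an assumption.

```python
import json

LINE = ('IND* userid1* 04/1/2014* facebook* 20* X20* 583230* 8.1* '
        '{"high_score": 26, "others": [12, 8, 20]}')

# Splitting on "*" keeps the JSON document intact as the 9th field.
fields = [f.strip() for f in LINE.split("*")]
scores = json.loads(fields[8])
print(len(fields), scores["high_score"], scores["others"])

# Splitting on "," would cut through the JSON document instead.
comma_fields = LINE.split(",")
print(len(comma_fields))
```

The parsed value is a dictionary containing a nested list, which corresponds to the map and list structure described for the scores bin.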

Example: data without a header line

The following example uses a data file with no header information in the first line. When your data file doesn't have a header line, you should always use column_position for bin mapping.

Data file content

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3

Config file content

{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},

"mappings": [
{
"key": {"column_position": 2, "type": "string"},

"set": "my_set",

"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
}
]
}
]
}

Config file details

  • There is no header information in the data file, so header_exist is false.
  • Each bin mapping is specified with column_position.
  • The set parameter is static. All new records are written to the set my_set.
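
The positions in the config above count from 1: position 2 is the user ID and position 1 is the location. The following Python sketch shows that lookup for one line of the sample data. The helper name column is hypothetical, and trimming whitespace around fields is an assumption.

```python
LINE = "USA, userid2, 03/18/2014, 27, X2, 5832, 6.4"
fields = [f.strip() for f in LINE.split(",")]


def column(position):
    """Return the field at a 1-based column_position (hypothetical helper)."""
    return fields[position - 1]


record = {
    "key": column(2),
    "set": "my_set",  # static set name, the same for every record
    "bins": {
        "age": int(column(4)),
        "location": column(1),
        "name": column(5),
        "rating": float(column(7)),
    },
}
print(record)
```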

Example: add a static value

To add a static value to every new record created from a data file, add a new bin to the bin mapping in the config file. In the following example config file, the value "my_value" is added to a bin called static_value in every new record created.

{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},

"mappings": [
{
"key": {"column_position": 2, "type": "string"},

"set": "my_set",

"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
},
{"name": "static_value",
"value": "my_value"
}
]
}
]
}

For demonstration purposes, you can use the following data file with the above config file:

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
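
The difference between a column mapping and a static value can be pictured as follows: when value is an object, the content comes from the data file; when value is a plain string, the same string is used for every record. This Python sketch is only an illustration with a hypothetical resolve helper, and it does not perform type conversion.

```python
FIELDS = ["IND", "userid1", "04/1/2014", "20", "X20", "583230", "8.1"]


def resolve(value, fields):
    """Look up a column for object mappings; pass static strings through."""
    if isinstance(value, dict):
        return fields[value["column_position"] - 1]
    return value


bin_list = [
    {"name": "location", "value": {"column_position": 1, "type": "string"}},
    {"name": "static_value", "value": "my_value"},
]

bins = {b["name"]: resolve(b["value"], FIELDS) for b in bin_list}
print(bins)
```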

Example: add a timestamp

To add a timestamp with the current system time to every new record created from a data file, add a new bin to the bin mapping in the config file. In the following example config file, a new bin called write_time is added to every new record. Its value mapping uses the column name system_time, which refers to the current system time rather than to a column in the data file, and the timestamp is stored as an integer.

{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},

"mappings": [
{
"key": {"column_position": 2, "type": "string"},

"set": "my_set",

"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding": "MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
},
{
"name": "write_time",
"value": { "column_name": "system_time", "type" : "timestamp", "encoding": "MM/dd/yy HH:mm:ss", "dst_type": "integer"}
}
]
}
]
}

For demonstration purposes, you can use the following data file with the above config file:

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
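
To get a feel for what a timestamp stored as an integer looks like, the following Python sketch converts a last_visited value to an epoch-based integer and takes the current time the same way. It makes several assumptions that the text above does not confirm: the unit is seconds, dates are interpreted as UTC, and the Python format string %m/%d/%Y stands in for the Java-style pattern in the config. The helper name to_epoch_seconds is hypothetical.

```python
import time
from datetime import datetime, timezone


def to_epoch_seconds(text, fmt="%m/%d/%Y"):
    """Parse a date string and return whole seconds since the Unix epoch (UTC assumed)."""
    parsed = datetime.strptime(text.strip(), fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


recent_visit = to_epoch_seconds("04/1/2014")  # value taken from the data file
write_time = int(time.time())                 # current system time
print(recent_visit, write_time)
```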

Data file options

  • You can load data from several data files by specifying them in the asloader command:
java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c /path/to/config.json data1.dsv data2.dsv data3.dsv
  • You can specify a directory of data files:
java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c /path/to/config.json data/
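
When loading many files at once, it can help to check beforehand that every line has the number of columns declared in n_columns_datafile. The following Python sketch is a hypothetical pre-flight check and is not part of asloader; the function name find_bad_lines and the sample file contents are made up for illustration.

```python
import tempfile
from pathlib import Path


def find_bad_lines(paths, delimiter, n_columns):
    """Return (file name, line number, column count) for each mismatching line."""
    problems = []
    for path in paths:
        lines = Path(path).read_text().splitlines()
        for line_no, line in enumerate(lines, start=1):
            if not line.strip():
                continue  # ignore blank lines
            count = len(line.split(delimiter))
            if count != n_columns:
                problems.append((Path(path).name, line_no, count))
    return problems


# Demonstration with two small files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data1.dsv").write_text(
        "IND, userid1, 04/1/2014, 20, X20, 583230, 8.1\n")
    Path(tmp, "data2.dsv").write_text("USA, userid2, 03/18/2014, 27\n")
    problems = find_bad_lines(sorted(Path(tmp).glob("*.dsv")), ",", 7)

print(problems)
```

An empty result means every non-blank line matched the expected column count.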

For more details on the various command line options, refer to Options.