
Aerospike Loader Examples

This page contains usage examples for the asloader tool.

Usage

asloader requires two files: a data file and a configuration file. In the following examples, the data file is called data.dsv and the configuration file is called config.json.

Example 1: data with a header line

You can load delimited data files either with or without a header line of column names. A line of data can also specify, in one of its columns, the set to which its record should be added.

The following sample data file includes a header line with column names. The 4th column contains a string which specifies the set to which the record should be added.

user_location, user_id, last_visited, set_name, age, user_name, user_name_blob, user_rating
IND, userid1, 04/1/2014, facebook, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, twitter, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, twitter, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, facebook, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, twitter, 37, X10, 583130, 9.3

The following sample configuration file specifies a comma delimiter for the data file, as well as 8 columns of data and a header line containing field names.

{
  "version": "2.0",
  "dsv_config": { "delimiter": ",", "n_columns_datafile": 8, "header_exist": true },
  "mappings": [
    {
      "key": { "column_name": "user_id", "type": "string" },
      "set": { "column_name": "set_name", "type": "string" },
      "bin_list": [
        { "name": "age",
          "value": { "column_name": "age", "type": "integer" } },
        { "name": "location",
          "value": { "column_name": "user_location", "type": "string" } },
        { "name": "name",
          "value": { "column_name": "user_name", "type": "string" } },
        { "name": "name_blob",
          "value": { "column_name": "user_name_blob", "type": "blob", "dst_type": "blob", "encoding": "hex" } },
        { "name": "recent_visit",
          "value": { "column_name": "last_visited", "type": "timestamp", "encoding": "MM/dd/yy", "dst_type": "integer" } },
        { "name": "rating",
          "value": { "column_name": "user_rating", "type": "float" } }
      ]
    }
  ]
}

Config file details

  • delimiter specifies the separator character for the data file.
  • n_columns_datafile specifies the number of columns in the data file.
  • header_exist specifies whether a header line is present in the data file.
  • key specifies the field in each line of data which should be used as the record's key.
  • set specifies the set to which new records should be added.
  • bin_list contains an array of bin mappings. Each bin mapping has two entries: name, the Aerospike bin name, and value, the bin content mapping. If a column has no mapping in the config file, that column is skipped during loading.
    • Either column_name or column_position can be used to specify the column.
    • The native data types integer and string are stored as-is.
    • Data types other than the native types require the additional fields dst_type and encoding.
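For the hex encoding used by blob bins, each pair of hex digits in the data file becomes one byte of the stored value. The following standalone Java sketch (illustrative only, not asloader code) decodes the sample value 583230 from the user_name_blob column, which happens to be the ASCII bytes of X20:

```java
public class HexDecode {
    // Decode a hex string into raw bytes, the way a blob bin with
    // "encoding": "hex" is interpreted.
    static byte[] fromHex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = fromHex("583230");       // value from the user_name_blob column
        System.out.println(new String(bytes));  // prints "X20"
    }
}
```

Note that the sample blob values are simply hex-encoded versions of the user_name strings (583230 is "X20"), which makes it easy to check what was stored.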

Run the command

To try out the above setup, copy the data into a file called data.dsv, copy the configuration into a file called config.json, and run the following command:

java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad data.dsv -h [aerospike-host-ip] -c config.json -n test

Replace [aerospike-host-ip] with the IP address of your Aerospike server. See the Options page for additional command-line options.

Example 2: data without a header line

The following example uses a data file with no header information in the first line. When your data file has no header line, always use column_position in the bin mappings.

Data file content

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3

Config file content

{
  "version": "2.0",
  "dsv_config": { "delimiter": ",", "n_columns_datafile": 7, "header_exist": false },
  "mappings": [
    {
      "key": { "column_position": 2, "type": "string" },
      "set": "my_set",
      "bin_list": [
        { "name": "age",
          "value": { "column_position": 4, "type": "integer" } },
        { "name": "location",
          "value": { "column_position": 1, "type": "string" } },
        { "name": "name",
          "value": { "column_position": 5, "type": "string" } },
        { "name": "name_blob",
          "value": { "column_position": 6, "type": "blob", "dst_type": "blob", "encoding": "hex" } },
        { "name": "recent_visit",
          "value": { "column_position": 3, "type": "timestamp", "encoding": "MM/dd/yy", "dst_type": "integer" } },
        { "name": "rating",
          "value": { "column_position": 7, "type": "float" } }
      ]
    }
  ]
}

Config file details

  • There is no header information in the data file, so header_exist is false.
  • Each bin mapping is specified with column_position.
  • The set parameter is static. All new records are written to the set my_set.
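As a sanity check on those positions, the following standalone Java sketch (illustrative only, not asloader code) splits a sample data line and reads 1-based column positions the way the mapping above does:

```java
public class ColumnPositions {
    // Return the value at a 1-based column position, trimming the
    // whitespace that follows each comma in the sample data.
    static String column(String line, int position) {
        String[] cols = line.split(",");
        return cols[position - 1].trim();
    }

    public static void main(String[] args) {
        String line = "IND, userid1, 04/1/2014, 20, X20, 583230, 8.1";
        System.out.println(column(line, 2)); // prints "userid1" (the record key)
        System.out.println(column(line, 4)); // prints "20" (the age bin)
    }
}
```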

Example 3: add a static value

To add a static value to every new record created from a data file, add a new bin to the bin mapping in the config file. In the following example config file, the value "my_value" is added to a bin called static_value in every new record created.

{
  "version": "2.0",
  "dsv_config": { "delimiter": ",", "n_columns_datafile": 7, "header_exist": false },
  "mappings": [
    {
      "key": { "column_position": 2, "type": "string" },
      "set": "my_set",
      "bin_list": [
        { "name": "age",
          "value": { "column_position": 4, "type": "integer" } },
        { "name": "location",
          "value": { "column_position": 1, "type": "string" } },
        { "name": "name",
          "value": { "column_position": 5, "type": "string" } },
        { "name": "name_blob",
          "value": { "column_position": 6, "type": "blob", "dst_type": "blob", "encoding": "hex" } },
        { "name": "recent_visit",
          "value": { "column_position": 3, "type": "timestamp", "encoding": "MM/dd/yy", "dst_type": "integer" } },
        { "name": "rating",
          "value": { "column_position": 7, "type": "float" } },
        { "name": "static_value",
          "value": "my_value" }
      ]
    }
  ]
}

For demonstration purposes, you can use the following data file with the above config file:

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3

Example 4: add a timestamp

To add a timestamp with the current system time to every new record created from a data file, add a new bin to the bin mapping in the config file. In the following example config file, a new bin called write_time is added to every new record, containing the current system time as an integer (Unix epoch time).

{
  "version": "2.0",
  "dsv_config": { "delimiter": ",", "n_columns_datafile": 7, "header_exist": false },
  "mappings": [
    {
      "key": { "column_position": 2, "type": "string" },
      "set": "my_set",
      "bin_list": [
        { "name": "age",
          "value": { "column_position": 4, "type": "integer" } },
        { "name": "location",
          "value": { "column_position": 1, "type": "string" } },
        { "name": "name",
          "value": { "column_position": 5, "type": "string" } },
        { "name": "name_blob",
          "value": { "column_position": 6, "type": "blob", "dst_type": "blob", "encoding": "hex" } },
        { "name": "recent_visit",
          "value": { "column_position": 3, "type": "timestamp", "encoding": "MM/dd/yy", "dst_type": "integer" } },
        { "name": "rating",
          "value": { "column_position": 7, "type": "float" } },
        { "name": "write_time",
          "value": { "column_name": "system_time", "type": "timestamp", "encoding": "MM/dd/yy HH:mm:ss", "dst_type": "integer" } }
      ]
    }
  ]
}

For demonstration purposes, you can use the following data file with the above config file:

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3

Data file options

  • You can load data from several data files by specifying them in the asloader command:
java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c /path/to/config.json data1.dsv data2.dsv data3.dsv
  • You can specify a directory of data files:
java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c /path/to/config.json data/

For more details on the various command line options, refer to Options.