Geospatial Index and Query

Use Aerospike geospatial storage and indexing to run fast queries for points within a region, regions that contain a point, and points within a radius.

Underlying technologies

The Aerospike geospatial feature relies on these technologies:

info

The S2 Geometry Library is a spherical geometry library, very useful for manipulating regions on a sphere (commonly on Earth) and indexing geographic data. The S2 blog provides more details about S2, Hilbert Curve, and geospatial mapping.

Use cases

  • Vehicle tracking systems that require high-throughput updates of vehicle location and frequently query vehicles within a region.
  • Mapping applications that find amenities within a certain distance of a given location.
  • Location-targeted bidding that discovers people or devices within the region of an active ad campaign.

See these Aerospike examples.

Geospatial data

Aerospike supports the GeoJSON geospatial data type. All geospatial functionality (indexing and querying) executes only on GeoJSON data types.

GeoJSON data incurs this additional processing on data writes:

  • GeoJSON text is parsed for validity and support (see GeoJSON Parsing).
  • GeoJSON text is converted into S2 CellID coverings.
  • Aerospike saves both the covering CellIDs and the original GeoJSON in the database.
  • Only GeoJSON data is accessible to the application through the client APIs and the UDF subsystem.

Geospatial index

In addition to integers and strings, Aerospike supports Geo2DSphere data types for indexes.

Use asadm to create and manage secondary indexes in an Aerospike cluster. For instructions, see Secondary Index (SI) Query.

The following command creates a geo2dsphere secondary index called geo-index on the bin geo-bin of the set geo-set in the namespace user_profile.

Admin+> manage sindex create geo2dsphere geo-index ns user_profile set geo-set bin geo-bin

Like other index types, Geo2DSphere indexes:

  • Scan existing records to inspect the indexed bin (geo-bin in the example above) to build an in-memory geospatial index.
  • Create an independent index for data on each node.
  • Update the index on all subsequent data inserts and updates.
  • Rebuild the index when a node restarts.

Geospatial query

Aerospike supports two types of geospatial queries:

  • Points within a region (including a circle)
  • Regions containing a point

Points-within-region query (circle)

This example Python script runs a points-within-region query against a polygon.

import aerospike
from aerospike import predicates

LOCBIN = 'loc'  # assumed bin name; use the bin covered by your geospatial index

def query_pwr(args, client):
    """Return the records whose LOCBIN point lies inside the region."""
    # Construct a GeoJSON region.
    region = aerospike.GeoJSON({
        'type': 'Polygon',
        'coordinates': [[[-122.500000, 37.000000],
                         [-121.000000, 37.000000],
                         [-121.000000, 38.080000],
                         [-122.500000, 38.080000],
                         [-122.500000, 37.000000]]]})

    # Construct the query predicate.
    query = client.query(args.nspace, args.set)
    predicate = predicates.geo_within_geojson_region(LOCBIN, region.dumps())
    query.where(predicate)

    # Define a callback to process each query result.
    records = []

    def callback(result):
        # Each result is a (key, metadata, record) tuple.
        key, metadata, record = result
        records.append(record)

    # Run the query.
    query.foreach(callback)
    return records

Example using AQL

  • Insert data representing a point into Aerospike.
aql> INSERT INTO test.testset (PK, geo_query_bin) VALUES (2, GEOJSON('{"type": "Point", "coordinates": [1,1]}'))
  • Query a region to see if it contains that point.
aql> SELECT * FROM test.testset WHERE geo_query_bin CONTAINS GeoJSON('{"type":"Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}')
+----------------------------------------------------+
| geo_query_bin |
+----------------------------------------------------+
| GeoJSON('{"type": "Point", "coordinates": [1,1]}') |
+----------------------------------------------------+
1 row in set (0.004 secs)
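
To sanity-check what a points-within-region query should return, a rough client-side test can help. The following is a minimal sketch, not the server's method: it uses planar ray casting and ignores Earth's curvature, so it only suits small regions that do not cross the antimeridian. The function name point_in_ring is hypothetical.

```python
def point_in_ring(point, ring):
    """Return True if point (lng, lat) lies inside the closed ring.

    Planar ray-casting test (an approximation; the server works on a sphere).
    The ring must repeat its first vertex as its last vertex.
    """
    x, y = point
    inside = False
    # Walk each edge of the ring as a pair of consecutive vertices.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# The region and point from the AQL example.
region = [[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]
print(point_in_ring((1, 1), region))   # True
print(point_in_ring((11, 1), region))  # False
```

A check like this is useful in tests, where expected results can be computed without a running cluster.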

Region-contains-points query

This example C++ code runs a region-contains-points query.

// Callback function to process each record response
bool
query_cb(const as_val * valp, void * udata)
{
    if (!valp)
        return true; // query complete

    as_record * recp = as_record_fromval(valp);
    if (!recp)
        fatal("query callback returned non-as_record object");

    // as_record_get_str() returns NULL if the bin is missing or not a string.
    char const * valstr = as_record_get_str(recp, g_valbin);

    __sync_fetch_and_add(&g_numrecs, 1);

    if (valstr)
        cout << valstr << endl;

    return true;
}

// Main query function
void
query_prcp(aerospike * asp, double lat, double lng)
{
    char point[1024];

    // Construct a GeoJSON point (longitude first, then latitude).
    snprintf(point, sizeof(point),
             "{ \"type\": \"Point\", \"coordinates\": [%0.8f, %0.8f] }",
             lng, lat);

    // Construct the query object.
    as_query query;
    as_query_init(&query, g_namespace.c_str(), g_set.c_str());

    as_query_where_inita(&query, 1);
    as_query_where(&query, g_rgnbin, as_geo_contains(point));

    // Make the actual query.
    as_error err;
    if (aerospike_query_foreach(asp, &err, NULL,
                                &query, query_cb, NULL) != AEROSPIKE_OK)
        throwstream(runtime_error,
                    "aerospike_query_foreach() returned "
                    << err.code << '-' << err.message);

    as_query_destroy(&query);
}

Example using AQL

  • Insert data representing a region into Aerospike.
aql> INSERT INTO test.testset (PK, geo_query_bin) VALUES (1, GEOJSON('{"type": "Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}'))
  • Query for regions containing a certain point.
aql> SELECT * FROM test.testset WHERE geo_query_bin CONTAINS GeoJSON('{"type":"Point", "coordinates": [1, 1]}')
+---------------------------------------------------------------------------------------------+
| geo_query_bin |
+---------------------------------------------------------------------------------------------+
| GeoJSON('{"type": "Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}') |
+---------------------------------------------------------------------------------------------+
1 row in set (0.017 secs)

Query filters

To extend the capabilities of both queries, use User-Defined Functions (UDFs) to filter the result set.

This example Python script demonstrates using a filter UDF.

def query_circle(args, client):
    """Query for records inside a circle."""
    query = client.query(args.nspace, args.set)
    predicate = aerospike.predicates.geo_within_radius(LOCBIN,
                                                       args.longitude,
                                                       args.latitude,
                                                       args.radius)
    query.where(predicate)

    # Search with UDF amenity filter
    query.apply('filter_by_amenity', 'apply_filter', [args.amenity,])
    query.foreach(print_value)

Where the apply_filter Lua function is the following:

local function select_value(rec)
    return rec.val
end

function apply_filter(stream, amen)
    local function match_amenity(rec)
        return rec.map.amenity and rec.map.amenity == amen
    end
    return stream : filter(match_amenity) : map(select_value)
end

Index on list/map

You can also index and query list or map elements that hold GeoJSON data:

# Create a geospatial secondary index on test.demo records whose 'points' bin is a list of GeoJSON points.
client.index_list_create('test', 'demo', 'points', aerospike.INDEX_GEO2DSPHERE, 'demo_point_nidx')

# Build the predicate against the list index.
predicate = aerospike.predicates.geo_within_radius('points',
                                                   args.longitude,
                                                   args.latitude,
                                                   args.radius,
                                                   aerospike.INDEX_TYPE_LIST)

query = client.query('test', 'demo')
query.where(predicate)

The above creates an index on GeoJSON list elements and constructs the query predicate using the index.

Aerospike GeoJSON extension

Use the Aerospike AeroCircle geometry object to store circles along with regular polygons.

This example specifies a circle with a radius of 300 meters centered at longitude/latitude -122.250629, 37.871022.

{"type": "AeroCircle", "coordinates": [[-122.250629, 37.871022], 300]}
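
An AeroCircle object can be assembled in application code before it is written. The following is a minimal sketch using only the standard library; the helper name aero_circle and its range checks are assumptions for illustration, not part of any Aerospike client.

```python
import json


def aero_circle(lng, lat, radius_m):
    """Build an AeroCircle dict: [[longitude, latitude], radius in meters]."""
    if not -180 <= lng <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return {"type": "AeroCircle", "coordinates": [[lng, lat], radius_m]}


circle = aero_circle(-122.250629, 37.871022, 300)
print(json.dumps(circle))
```

The resulting dict has the same shape as the example above and can be serialized to GeoJSON text with json.dumps.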

GeoJSON parsing

On data insert/update, Aerospike recognizes only Point, Polygon, MultiPolygon, and AeroCircle GeoJSON geometry objects, which are the indexable objects. Unsupported GeoJSON objects, such as LineString or MultiLineString, fail on insert with the result code AEROSPIKE_ERR_GEO_INVALID_GEOJSON (160). Polygon objects can contain holes, per the GeoJSON Format Specification.

note

Polygon loop definitions must wind counter-clockwise.
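
Winding order can be checked before a polygon is written. The following is a minimal sketch, assuming a planar approximation (the shoelace formula) that is adequate for small loops away from the poles and the antimeridian; the function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace area of a closed ring of [lng, lat] pairs.

    Positive means counter-clockwise, negative means clockwise.
    The ring must repeat its first vertex as its last vertex.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def ensure_counter_clockwise(ring):
    """Return the ring unchanged if it is counter-clockwise, else reversed."""
    if signed_area(ring) > 0:
        return list(ring)
    return list(reversed(ring))


# The polygon from the Python points-within-region example.
ring = [[-122.5, 37.0], [-121.0, 37.0], [-121.0, 38.08],
        [-122.5, 38.08], [-122.5, 37.0]]
clockwise = list(reversed(ring))

print(signed_area(ring) > 0)                       # True
print(ensure_counter_clockwise(clockwise) == ring)  # True
```

Reversing a closed ring keeps it closed, so the corrected loop is still a valid GeoJSON linear ring.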

Aerospike supports the Feature object, which groups a geometry object with user-specified properties; however, FeatureCollection is not supported.

Invalid GeoJSON objects are caught on insert/update. For example, an object defined as point instead of Point fails.

Per the GeoJSON IETF recommendation, the Coordinate System is WGS84. Explicit specification of a coordinate reference system (CRS) is ignored.
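
Applications can catch unsupported objects before sending them to the server. The following is a minimal sketch that mirrors the rules described in this section; the server remains the authority on validity. The helper name check_geometry is hypothetical, and the handling of Feature assumes the standard GeoJSON "geometry" member.

```python
import json

# Geometry types this section lists as recognized (names are case-sensitive).
SUPPORTED_TYPES = {"Point", "Polygon", "MultiPolygon", "AeroCircle"}


def check_geometry(text):
    """Return None if the GeoJSON text looks acceptable, else a reason string."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        return "not valid JSON: " + exc.msg
    if not isinstance(obj, dict):
        return "not a JSON object"
    # A Feature wraps its geometry in the 'geometry' member.
    if obj.get("type") == "Feature":
        obj = obj.get("geometry")
        if not isinstance(obj, dict):
            return "Feature has no geometry object"
    gtype = obj.get("type")
    if gtype not in SUPPORTED_TYPES:
        return "unsupported geometry type: %r" % (gtype,)
    return None


print(check_geometry('{"type": "Point", "coordinates": [1, 1]}'))
print(check_geometry('{"type": "point", "coordinates": [1, 1]}'))
print(check_geometry('{"type": "LineString", "coordinates": [[0, 0], [1, 1]]}'))
```

The first call prints None; the other two print a reason, because point is miscapitalized and LineString is not an indexable type.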

Configuration parameters

| Parameter | Datatype | Default | Description |
|---|---|---|---|
| max-cells | Integer | 8 | Defines the maximum number of cells used in the approximation. Increasing this value improves accuracy but may affect query performance. |
| max-level | Integer | 1 | Defines the minimum size of the cell to be used in the approximation. Tuning this can make query results more accurate. |
| min-level | Integer | 1 | Defines the size of the maximum cell to be used in the approximation. Should generally be set to 1; increasing too much may cause queries to fail. |
| earth-radius-meters | Integer | 6371000 | Specifies Earth's radius in meters. Used for geographical calculations. |
| level-mod | Integer | 1 | Specifies the multiple for levels to be used, effectively increasing the branching factor of the S2 Cell Id hierarchy. |
| strict | Boolean | true | When true, performs additional validation on results to ensure they fall within the query region. When false, returns results as-is, which may include points outside the query region. |
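
To illustrate how an Earth radius enters distance calculations, and what exact validation of a radius query could look like on the client when strict is false, here is a minimal sketch. It assumes the haversine great-circle formula; the source does not state which formula the server uses, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_METERS = 6371000  # the earth-radius-meters default listed above


def haversine_m(lng1, lat1, lng2, lat2, radius=EARTH_RADIUS_METERS):
    """Great-circle distance in meters between two (lng, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def within_radius(points, center, radius_m):
    """Keep only the (lng, lat) points no farther than radius_m from center."""
    clng, clat = center
    return [p for p in points
            if haversine_m(p[0], p[1], clng, clat) <= radius_m]


center = (-122.250629, 37.871022)
candidates = [(-122.250629, 37.871022), (-122.25, 37.88)]
print(within_radius(candidates, center, 300))
```

Only the first candidate survives the filter, since the second lies roughly a kilometer north of the center.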

max-cells visualization

Here’s an example that shows how RegionCoverer covers a specified region with max-cells set to different values. With a higher value of max-cells, the approximation becomes more accurate.

With max-cells = 10:

max-cells 10

With max-cells = 30:

max-cells 30

With max-cells = 100:

max-cells 100

max-level visualization

Here’s an example that shows how RegionCoverer covers a specified region and how tuning max-level can make query results more accurate. For this example, min-level is set to 1, and max-cells is set to 10.

With max-level = 12:

max-level 12

With max-level = 30:

max-level 30

Create a geospatial application

To develop a geospatial application:

  1. Install and configure the Aerospike server.
  2. Create a Geo2DSphere index on a namespace-set-bin combination.
  3. Construct and insert GeoJSON Point data.
  4. Construct a Points-within-Region predicate (where clause), make a query request, and process the records returned.
  5. (alternate) Construct and insert GeoJSON Polygon/MultiPolygon data.
  6. (alternate) Construct a Region-contains-Point predicate, make a query request, and process the records returned.

Known limitations

  • Using UDFs to insert or update GeoJSON data types is not supported.
  • Duplicate records can be returned.
  • For namespaces with data-in-memory true, GeoJSON particles allocate up to 2KB more than the reported particle size, which can lead to high memory consumption in some cases. This problem has been corrected in Aerospike Server versions 4.9.0+, 4.8.0.8, 4.7.0.12, 4.6.0.14, 4.5.3.16, 4.5.2.16, 4.5.1.21, 4.5.0.24.
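
Because duplicate records can be returned, applications may want to de-duplicate query results. The following is a minimal sketch, assuming results are collected as (key, metadata, bins) tuples and that the key is the (namespace, set, primary key, digest) tuple used by the Aerospike Python client. The helper name dedupe_by_digest and the sample data are hypothetical.

```python
def dedupe_by_digest(results):
    """Drop repeated records, keeping the first occurrence of each digest."""
    seen = set()
    unique = []
    for key, meta, bins in results:
        # key[3] is the record digest; convert it to bytes so it is hashable.
        digest = bytes(key[3])
        if digest in seen:
            continue
        seen.add(digest)
        unique.append((key, meta, bins))
    return unique


# Hypothetical results in which the first record is returned twice.
d1 = bytearray(b"\x01" * 20)
d2 = bytearray(b"\x02" * 20)
results = [
    (("test", "demo", None, d1), {"gen": 1, "ttl": 0}, {"name": "a"}),
    (("test", "demo", None, d2), {"gen": 1, "ttl": 0}, {"name": "b"}),
    (("test", "demo", None, d1), {"gen": 1, "ttl": 0}, {"name": "a"}),
]
print(len(dedupe_by_digest(results)))  # 2
```

Keying on the digest rather than the primary key works even when the query does not return the user key.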