Creating Spark Applications to Access Aerospike Database
Creating a Spark application that can access an Aerospike database requires downloading the appropriate Spark connector's jar and then adding that to your application's environment.
Prerequisites for using the Spark connector
Ensure that you meet these prerequisites before installing Aerospike Connect for Spark:
- Your Spark cluster must be at version 2.4.x (see note below), 3.0.x, 3.1.x, 3.2.x, 3.3.x, or 3.4.x.
- From release 4.0.0 onward, the connector supports multiple Scala versions (where supported by the corresponding Apache Spark version).
| Aerospike Connect for Spark version | Supported Apache Spark versions | JFrog Artifactory version |
|---|---|---|
| 4.0.0 | 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x | 4.0.0-spark3.4-scala2.13-allshaded |
| 3.5.5 | 3.0.x, 3.1.x, 3.2.x | 3.5.5_spark_3.0_allshaded |
| 3.5.4 | 3.0.x, 3.1.x, 3.2.x | |
| 3.5.3 | 3.0.x, 3.1.x, 3.2.x | |
| 3.5.2 | 3.0.x, 3.1.x, 3.2.x | |
| 3.5.1 | 3.0.x, 3.1.x, 3.2.x | |
| 3.5.0 | 3.0.x, 3.1.x, 3.2.x | |
Note: 2.9.0 is expected to be the last release compatible with the Apache Spark 2.4.7 binary. Aerospike has stopped developing new features for Spark 2.x, but will make bug fixes available until October 12, 2023. Plan to move to Apache Spark 3.x and the corresponding latest connector version.
Jar naming convention:
To support multiple Spark versions, the jar naming convention has changed.
- Prior to the 4.0.0 release, binaries are named aerospike-spark_x_spark_y_z.jar, where x is the connector version, y is the Spark version, and z is either allshaded or clientunshaded. For example, the binary name aerospike-spark-3.3.0_spark3.1_allshaded.jar indicates release version 3.3.0, supported Spark version 3.1.x, and that all internal libraries are shaded. Similarly, aerospike-spark-3.3.0_spark3.1_clientunshaded.jar indicates that all libraries except the Aerospike Java client are shaded.
- To accommodate support for multiple Scala versions, from the 4.0.0 release onward all binaries follow the general format [connector-version]-[spark-version]-[supported-scala-version]-[allshaded/clientunshaded]. Refer to the table above for all supported versions.
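As an illustration, the 4.0.0+ format can be sketched as a small Python helper. The function name and version strings below are illustrative only (they are not part of the connector), and the layout assumes the dot/dash style shown for 4.0.0 in the table above:

```python
def artifactory_version(connector: str, spark: str, scala: str, shading: str) -> str:
    """Compose a 4.0.0+ style connector version string:
    [connector-version]-[spark-version]-[supported-scala-version]-[allshaded/clientunshaded].

    Illustrative helper only; the components must correspond to a
    published artifact (see the version table above).
    """
    if shading not in ("allshaded", "clientunshaded"):
        raise ValueError("shading must be 'allshaded' or 'clientunshaded'")
    return f"{connector}-spark{spark}-scala{scala}-{shading}"

# Reproduces the 4.0.0 Artifactory version from the table above.
print(artifactory_version("4.0.0", "3.4", "2.13", "allshaded"))
```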
To find out when these different versions of the Spark connector were released, see the "Aerospike Connect for Spark Release Notes".
Although Aerospike Connect for Spark was tested only against the Apache Spark versions listed above, it should also work with the Spark versions available in Dataproc on Google Cloud Platform (GCP) and EMR on Amazon Web Services (AWS).
- The Java 8 SDK must be installed on the system on which you plan to run Aerospike Connect for Spark. (Tip: If you want to test with different versions of the Java 8 SDK, consider using sdkman to help you manage those versions.)
- Your Aerospike Database Enterprise Edition cluster must be at version 5.0 or later if you plan to use Aerospike Connect for Spark version 2.0 or later.
- The connector does not bundle Spark or Hadoop binaries within its jar, so Spark and Hadoop must be installed on your production system.
Spark connector installation
Install using JFrog Artifactory
- In your build.sbt file, add the JFrog repository resolver:
resolvers += "Artifactory Realm" at "https://aerospike.jfrog.io/artifactory/spark-connector"
- Specify the dependency as "com.aerospike" %% "aerospike-spark" % <<version>>, where version is the JFrog Artifactory version listed in the table above.
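Putting both steps together, a minimal build.sbt might look like the following. This is a sketch; the Scala, Spark, and connector versions shown are placeholders, so pick the Artifactory version that matches your Spark/Scala combination from the table above:

```scala
// Minimal sbt build for a Spark application using the Aerospike connector.
// All version numbers below are illustrative placeholders.
scalaVersion := "2.13.10"

resolvers += "Artifactory Realm" at "https://aerospike.jfrog.io/artifactory/spark-connector"

libraryDependencies ++= Seq(
  // Spark is "provided" because the cluster supplies it at runtime
  // (the connector does not bundle Spark or Hadoop binaries).
  "org.apache.spark" %% "spark-sql" % "3.4.0" % "provided",
  // Substitute the Artifactory version from the table above.
  "com.aerospike" %% "aerospike-spark" % "4.0.0-spark3.4-scala2.13-allshaded"
)
```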
- Download the appropriate version of the connector for the Apache Spark version (2.x or 3.x) you are using. Apache Spark version 2.4.8 is not supported.
- You can download the .jar package from the Aerospike Downloads site.
Add the .jar package to your application's environment
You can do this in either of these ways:
- If you plan to create a batch or streaming job, write a Scala, Java, or Python application by following the interactive code in the Jupyter notebooks, and specify the downloaded JAR as a dependency. Once your Spark application is ready, submit it to the Spark cluster using either spark-submit or spark-shell. See Submitting Applications in the Spark documentation for detailed information.
spark-submit --jars path-to-aerospike-spark-connector-jar --class application-entrypoint application.jar
If you plan to create a Jupyter notebook that uses the Spark connector, add the JAR path to the environment variables.
Example using Python
os.environ["PYSPARK_SUBMIT_ARGS"] = '--jars aerospike-spark-assembly-2.7.0.jar pyspark-shell'
Example using Scala
launcher.jars = ["aerospike-spark-assembly-2.7.0.jar"]
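For instance, a notebook cell that wires the connector jar into PySpark before the session is created might look like this. The jar filename is the one from the example above; substitute the jar that matches your Spark and Scala versions, and note that the SparkSession lines are commented out here because they assume PySpark is installed:

```python
import os

# Path to the downloaded connector jar (substitute the jar matching
# your Spark/Scala versions; see the naming convention above).
connector_jar = "aerospike-spark-assembly-2.7.0.jar"

# PySpark reads this variable when it launches the JVM, so it must be
# set before the first SparkSession is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = f"--jars {connector_jar} pyspark-shell"

# Afterwards, creating the session picks up the connector automatically:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("aerospike-example").getOrCreate()
```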
See our notebooks for other examples.