
Scaling Kafka Sink Connectors

When Kafka Connect is running in distributed mode, you can increase the throughput of changes streamed from Kafka to your Aerospike cluster. You can do so by adding more workers to your Kafka Connect cluster, by adding more tasks for the connector, or by doing both.

tip

Before deciding to use any of these options, ensure that you understand the implications of each on the available capacity in your Kafka Connect cluster.

Check existing distribution

Run this curl command to view how tasks for the connector are currently distributed in your Kafka Connect cluster:

curl -X GET --header "Content-Type:application/json" ${kafkaEndpoint}/connectors/aerospike-sink/status

where kafkaEndpoint is the REST endpoint for the Kafka Connect service. You can make requests to any cluster member; the REST API automatically forwards requests, if required.

This sample output shows that the aerospike-sink connector is running on 192.168.0.1:8083, and that its work is divided between two tasks, each running on a separate worker.

HTTP/1.1 200 OK

{
  "name": "aerospike-sink",
  "connector": {
    "state": "RUNNING",
    "worker_id": "192.168.0.1:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "192.168.0.1:8083"
    },
    {
      "id": 1,
      "state": "RUNNING",
      "worker_id": "192.168.0.2:8083"
    }
  ]
}
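
If you have jq installed, you can reduce this output to a quick per-task summary. This is an optional convenience, not part of the Kafka Connect API itself:

curl -s -X GET --header "Content-Type:application/json" ${kafkaEndpoint}/connectors/aerospike-sink/status | jq '.tasks[] | {id, worker_id, state}'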

Add tasks

If you want to add one or more tasks, follow these steps:

Set tasks.max

Set this variable:

aerosink='{
  "tasks.max": "<value>"
}'

tasks.max: The new maximum number of tasks that can be created for the connector.
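
For example, to allow Kafka Connect to create up to four tasks for the connector (the value 4 is purely illustrative; choose a value that suits your workload and the number of available workers):

aerosink='{
  "tasks.max": "4"
}'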

Set kafkaEndpoint

On the same system, set this variable:

kafkaEndpoint="<URI>"

kafkaEndpoint: This is the REST endpoint for the Kafka Connect service. You can make requests to any cluster member; the REST API automatically forwards requests, if required.
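
For example, to point at the worker shown in the sample output above, which listens on the default Kafka Connect REST port:

kafkaEndpoint="http://192.168.0.1:8083"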

Update the tasks

Issue a PUT request to Kafka Connect's REST interface. The request replaces the connector's configuration and updates all of the connector tasks together, so include your other existing connector settings in the aerosink payload along with the updated tasks.max value.

curl -X PUT --header "Content-Type:application/json" --data "${aerosink}" ${kafkaEndpoint}/connectors/aerospike-sink/config
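
If you want to confirm that Kafka Connect accepted the new configuration before checking task status, you can ask curl to print only the HTTP response code. This is an optional variant of the same request:

curl -s -o /dev/null -w "%{http_code}\n" -X PUT --header "Content-Type:application/json" --data "${aerosink}" ${kafkaEndpoint}/connectors/aerospike-sink/config

A response code in the 2xx range indicates that the configuration was accepted.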

Verify changes

Use this GET request to verify the connector is using the changed configuration:

curl -X GET --header "Content-Type:application/json" ${kafkaEndpoint}/connectors/aerospike-sink/status

Repeat this process if you want to add more tasks.

Add workers

If you want to add more workers to your Kafka Connect cluster, follow these steps.

Launch each worker

Run this command to launch each additional worker in distributed mode.

bin/connect-distributed <path-to-your-Kafka-Connect-config-file>

<path-to-your-Kafka-Connect-config-file>: The path to the file (including the filename and extension) that you are using to configure the workers in Kafka Connect.
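
For example, assuming a worker properties file named connect-distributed.properties in the config directory (your file name and location may differ):

bin/connect-distributed config/connect-distributed.properties

Each additional worker must use the same group.id and internal topic settings as the existing workers so that it joins the same Kafka Connect cluster.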

Verify changes

After a few minutes, use this GET request to view how tasks for the connector are now distributed in your Kafka Connect cluster:

curl -X GET --header "Content-Type:application/json" ${kafkaEndpoint}/connectors/aerospike-sink/status

Repeat this process if you want to add more workers.
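
If you prefer not to re-run the verification request by hand, a small loop like the following can poll the connector status periodically. This is only a convenience sketch; it assumes jq is installed and that the kafkaEndpoint variable from the earlier steps is still set:

while true; do
  curl -s -X GET --header "Content-Type:application/json" ${kafkaEndpoint}/connectors/aerospike-sink/status | jq -r '.tasks[] | "\(.id)\t\(.worker_id)\t\(.state)"'
  sleep 30
done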