Version: Operator 1.x.x

Troubleshooting

Pods stuck in pending state

After an Aerospike cluster is created or updated, its pods may remain stuck in the "Pending" status, like so:

NAME              READY   STATUS    RESTARTS   AGE
aerocluster-0-0   0/1     Pending   0          48s
aerocluster-0-1   0/1     Pending   0          48s

Describe the pod to find the reason for the scheduling failure:

kubectl -n aerospike describe pod aerocluster-0-0

Under the Events section, you will find the reason the pod was not scheduled. For example:

QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  9m27s (x3 over 9m31s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  20s (x9 over 9m23s)    default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.
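
On a pod with a long event history, it can help to pull out only the scheduling failures. The following is a minimal sketch that filters `kubectl describe` output read on standard input. The function name `failed_scheduling` is hypothetical, and the sketch assumes the scheduler is reported as `default-scheduler`, as in the output above.

```shell
# Print only the FailedScheduling event messages from `kubectl describe pod`
# output read on standard input. `failed_scheduling` is a hypothetical helper.
failed_scheduling() {
  # Event rows look like: Type  Reason  Age  From  Message
  # Strip everything up to the scheduler name so only the message remains.
  # If the scheduler has a different name, the full row is printed instead.
  awk '$1 == "Warning" && $2 == "FailedScheduling" {
    sub(/^.*default-scheduler[ \t]+/, "")
    print
  }'
}

# Usage, assuming the namespace and pod name from the example above:
#   kubectl -n aerospike describe pod aerocluster-0-0 | failed_scheduling
```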

Possible reasons are:

Storage class incorrect or not created. See persistent storage configuration for details.
1 node(s) didn't match Pod's node affinity - Invalid zone, region, rack label, etc. for the rack configured for this pod.
Insufficient resources - Not enough CPU or memory available to schedule more pods.
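
Each of these causes can be checked with standard `kubectl` commands. The sketch below maps the scheduler messages shown earlier to a suggested next command. `suggest_check` is a hypothetical helper, and matching on the word "Insufficient" is an assumption about how the scheduler words CPU and memory shortages.

```shell
# Read event or describe output on standard input and print a suggested next
# check for each scheduling failure it recognizes. Other lines are ignored.
suggest_check() {
  while IFS= read -r line; do
    case "$line" in
      *"unbound immediate PersistentVolumeClaims"*)
        # Storage class missing or misnamed: inspect classes and claims.
        echo "storage: kubectl get storageclass; kubectl -n aerospike get pvc" ;;
      *"didn't match Pod's node affinity"*)
        # Compare node labels against the zone and region set on the rack.
        echo "affinity: kubectl get nodes --show-labels" ;;
      *Insufficient*)
        # Assumed wording for CPU or memory shortages.
        echo "resources: kubectl describe nodes" ;;
    esac
  done
}

# Usage, assuming the namespace and pod name used in this section:
#   kubectl -n aerospike describe pod aerocluster-0-0 | suggest_check
```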

Pods keep crashing

After an Aerospike cluster is created or updated, its pods may be stuck in the "Error" or "CrashLoopBackOff" status, like so:

NAME              READY   STATUS             RESTARTS   AGE
aerocluster-0-0   0/1     Error              0          48s
aerocluster-0-1   0/1     CrashLoopBackOff   2          48s

Check the following logs to see if pod initialization failed or the Aerospike Server stopped.

Init logs

kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init

Server logs

kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server
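
A container in `CrashLoopBackOff` may already have been restarted, in which case the logs of the crashed instance are available through the `--previous` flag of `kubectl logs`. The sketch below picks the crashing pods out of a `kubectl get pods` listing. `crashing_pods` is a hypothetical helper name.

```shell
# Print the names of pods whose STATUS column is Error or CrashLoopBackOff,
# reading `kubectl get pods` output on standard input.
crashing_pods() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && ($3 == "Error" || $3 == "CrashLoopBackOff") { print $1 }'
}

# Usage, assuming the namespace and container name used in this section:
#   for pod in $(kubectl -n aerospike get pods | crashing_pods); do
#     kubectl -n aerospike logs "$pod" -c aerospike-server --previous
#   done
```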

Possible reasons are:

Missing or incorrect feature key file - Fix by deleting the Aerospike secret and recreating it with the correct feature key file. See Aerospike secrets for details.
Bad Aerospike configuration - The operator tries to validate the configuration before applying it to the cluster. However, it is still possible to misconfigure the Aerospike server. The offending parameter is reported in the server logs; fix it and apply the corrected configuration to the cluster again. See Aerospike configuration change for details.

Error connecting to the cluster from outside Kubernetes

If the cluster runs fine, as verified by the pod status and asadm (see connecting with asadm), ensure that the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports. See Port access for details.
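
One way to tell a firewall problem from an Aerospike problem is to test, from the client machine, whether the port accepts TCP connections at all. This is a minimal sketch that assumes `bash` and the coreutils `timeout` command are available. `port_open` is a hypothetical helper, and the host and port are placeholders for whatever address your clients use.

```shell
# Return success if a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
port_open() {
  # The host and port are passed to the inner bash as $0 and $1.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage with placeholder values (replace with your node address and port):
#   port_open <node-address> <port> && echo reachable || echo blocked
```

A failed connection here points to the network path (firewall rules, security groups, or port exposure) and not to the Aerospike server itself.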