Skip to main content
Loading

Lifecycle of XDR record shipment, with metrics

A record progresses through a lifecycle of various states of XDR shipment. A record's states in its lifecycle are logged with various XDR metrics that indicate progress of the shipment.

You can start writing records as soon as you enable a namespace. The system starts logging in the in-memory transaction queue even if the datacenter (DC) is not connected at that point. After the connection is established, the system starts shipping.

The following diagram shows the lifecycle of a record from a single partition in a single namespace in a single DC. In actuality, these processes occur for all partitions and namespaces.

  • Dashed boxes are processes discussed immediately below the diagram.
  • Metric/state names are bulleted.

Fig. XDR record shipment lifecycle, with metrics



  1. After a successful write in the source partition, the record is submitted to the per-partition in-memory transaction queue of each DC.
  2. The DC thread picks the record from the transaction queue and forwards it to the service thread.
  3. The service thread reads the record locally, prepares the record for shipment, and ships it to the remote destination.
note

In strong consistency mode, if the record is in an UNREPLICATED state, XDR triggers a re-replication by inserting a transaction into the internal transaction queue. A service thread picks up the transaction from the internal transaction queue and checks if the transaction timed out.

  • If it timed out, it will not proceed further with re-replication and XDR will not ship the record to the destination. A future client read/write will trigger re-replication and may succeed in shipping to destination.
  • Otherwise, the service thread re-replicates the record. The re-replication makes an entry into the XDR in-memory transaction queue. This re-replication also may time out and leave the reacord in UNREPLICATED state. However, the entry in the XDR in-memory transaction queue will trigger another round of re-replication immediately.

The sc-replication-wait-ms configuration parameter provides a default delay in strong consistency mode to prevent XDR from attempting to ship records too fast, before their initial replication is complete.

  1. The remote destination attempts to write the record and returns the completion state of the transaction to the source datacenter. The completion state can be success, temporary failure (like key busy/device overload), or permanent error (like record too big).
  2. Based on the response, the service thread might put the record in the retry queue. If a retry is necessary, the shipment process of the record repeats from the retry queue.

Metrics of XDR processes

For precise definitions of these metrics, click the linked metric names.

Record stateMetric nameDescription
Waiting for processing and processing startedin_queue

in_progress
The record is in the XDR in-memory queue and is awaiting shipment.-XDR is actively shipping the record.
Retryingretry_conn_reset

retry_dest

retry_no_node
This is a transient state. The shipment is being retried due to some error. Retries continue until one of the completed states is achieved.
Recoveringrecoveries

recoveries_pending
This is a transient state that indicates internal-to-XDR "housekeeping". If XDR cannot find record keys in its in-memory transaction queue, it has to read the primary index to recover those keys for processing.
Completedsuccess

abandoned

not_found

filtered_out
Shipment is complete. The state of shipment is either success or one that indicates a non-success result.