Monitoring CDL through Grafana Dashboards

You can monitor CDL activity through the Grafana dashboard named CDL Dashboard, which is bundled by default. The CDL Dashboard displays the transactions per second (TPS) toward CDL for pods such as cdl-endpoint, Slot, and Index, along with the response time of each operation.

A sample Grafana CDL Summary dashboard displaying the total records by Type, SliceName, SystemID and so on is shown below:

Grafana CDL Dashboard - Summary

A sample Grafana CDL dashboard displaying the CDL TPS and response time graphs is shown below:

Grafana CDL Dashboard - CDL TPS and Response Time

The Grafana CDL dashboard also shows the Geo Replication (GR) status. A sample Grafana dashboard displaying the GR status and other details is shown below:

Grafana CDL Dashboard - GR Connection Status

The GR Connection Status, Index Replication, and Slot Replication panels and their descriptions are listed in the tables below:

GR Connection Status

Panel | Description
--- | ---
Remote Site connection status | Percentage of remote site connections that are active per endpoint pod. An alert is triggered if the value stays at 0 for more than 5 minutes.
Index to Kafka connection status | Average number of Kafka connections currently active from each index pod.
Kafka Pod status | Readiness status of the Kafka pod and the mirrorMaker pods.
Replication Requests Sent/Local Requests Received % | Ratio of replication requests sent to the remote site to local requests received by the NF, expressed as a percentage. An alert is triggered if the ratio stays below 90% for more than 5 minutes.
Kafka Remote Replication delay per pod | Total delay in replicating records to the partner site through Kafka. An alert is triggered if the delay stays above 10 seconds for more than 5 minutes.
Total Remote requests dropped | Total number of remote requests dropped because the queue was full.
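Three of the panels above share the same alert pattern: a condition on the panel value must hold continuously for more than 5 minutes. The following Python sketch illustrates that pattern only; it is not the CDL alerting implementation. The rule names, the `AlertRule` type, and the `fires` helper are hypothetical, and the samples are assumed to be (timestamp in seconds, value) pairs in time order.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule: a condition that must hold for longer than hold_seconds."""
    name: str
    breached: Callable[[float], bool]
    hold_seconds: float = 300.0  # "more than 5 minutes" from the table above


def fires(rule: AlertRule, samples: Iterable[Tuple[float, float]]) -> bool:
    """Return True if the rule's condition held continuously for more than hold_seconds.

    samples: (timestamp_seconds, value) pairs in ascending time order.
    """
    breach_start = None
    for ts, value in samples:
        if rule.breached(value):
            if breach_start is None:
                breach_start = ts  # first sample of the current breach
            if ts - breach_start > rule.hold_seconds:
                return True
        else:
            breach_start = None  # condition cleared, so restart the timer
    return False


# Thresholds taken from the panel descriptions; the names are illustrative only.
RULES = [
    AlertRule("remote_site_connection", lambda pct: pct == 0),
    AlertRule("replication_ratio", lambda pct: pct < 90),
    AlertRule("kafka_replication_delay", lambda seconds: seconds > 10),
]
```

For example, a replication delay that exceeds 10 seconds, recovers, and then exceeds 10 seconds again does not fire unless one uninterrupted stretch lasts more than 5 minutes.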

Index Replication

Panel | Description
--- | ---
Kafka Publish TPS per pod | Total rate of Kafka write (publish) requests per index pod.
Kafka Remote Replication TPS per pod | Total rate of incoming Kafka (consume) requests from the partner site per index pod.
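To make the "TPS per pod" wording concrete, the sketch below derives a per-pod request rate from cumulative request counts. It assumes the underlying metric is a monotonically increasing counter sampled per pod, which this document does not state. The `per_pod_rate` function and the pod names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def per_pod_rate(samples: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Compute requests per second for each pod.

    samples: (pod_name, timestamp_seconds, cumulative_request_count) tuples,
    in ascending time order for each pod. The rate is taken between the first
    and last sample seen for a pod. Pods with fewer than two samples, or whose
    counter went backwards (for example after a restart), are skipped.
    """
    first: Dict[str, Tuple[float, float]] = {}
    last: Dict[str, Tuple[float, float]] = {}
    for pod, ts, count in samples:
        first.setdefault(pod, (ts, count))  # keep the earliest sample per pod
        last[pod] = (ts, count)             # overwrite with the latest sample

    rates: Dict[str, float] = {}
    for pod, (t0, c0) in first.items():
        t1, c1 = last[pod]
        if t1 > t0 and c1 >= c0:
            rates[pod] = (c1 - c0) / (t1 - t0)
    return rates


# Hypothetical pod names and counts, one minute apart.
example = [
    ("index-0", 0, 100), ("index-0", 60, 700),
    ("index-1", 0, 50), ("index-1", 60, 350),
]
rates = per_pod_rate(example)
```

With these sample values, `index-0` handled 600 requests in 60 seconds and `index-1` handled 300, so the rates differ by a factor of two.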

Slot Replication

Panel | Description
--- | ---
Slot Geo Replication Requests Sent | Total rate of replication requests sent from cdl-ep to the remote site, per operation.
Slot Geo Replication Requests Received | Total rate of replication requests received by the slot pods, per operation type.
Slot Checksum Mismatch | Total rate of slot checksum mismatches.
Slot Reconciliation | Total rate of slot reconciliations caused by slot data checksum mismatches.