Feature #52903

open

mgr/prometheus: Add RADOSGW / rgw sync state and related metrics to prometheus output

Added by Christian Rohmann over 2 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

The mgr/prometheus module does not provide any state information on the radosgw (multisite) sync.

Metrics with the prefixes ceph_data_sync_ and ceph_rgw_ are already available, but they only cover rgw performance and sync counters such as ceph_data_sync_from_zone_poll_errors.

The actual state of the sync between sites can only be seen via radosgw-admin:

radosgw-admin sync status
          realm ce992742-88e7-4613-990a-95dad46fd14a (myrealm)
      zonegroup fd9a77a5-f6c6-4144-82a1-307111569c04 (myzonegroup)
           zone 3084933d-a1bb-43ef-870e-8e8f14dab718 (myzone)
  metadata sync no sync (zone is master)
      data sync source: 6c39f3b7-8234-4d7b-89e5-15ba573ed033 (theotherzone)
                        syncing
                        full sync: 81/128 shards
                        full sync: 6 buckets to sync
                        incremental sync: 47/128 shards
                        data is behind on 81 shards
                        behind shards: [0,1,3,4,5,6,8,10,11,12,14,17,20,22,23,24,25,26,27,29,30,31,32,33,34,35,36,37,42,43,44,47,48,49,56,57,58,60,61,62,63,64,66,67,69,70,71,73,74,76,78,80,81,82,85,86,88,90,91,96,97,99,103,105,106,108,109,111,112,113,114,115,116,117,118,119,120,122,123,124,125]

radosgw-admin sync status
          realm ce992742-88e7-4613-990a-95dad46fd14a (myrealm)
      zonegroup fd9a77a5-f6c6-4144-82a1-307111569c04 (myzonegroup)
           zone 3084933d-a1bb-43ef-870e-8e8f14dab718 (myzone)
  metadata sync no sync (zone is master)
      data sync source: 6c39f3b7-8234-4d7b-89e5-15ba573ed033 (theotherzone)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

I'd love to have metrics on the general sync state ("caught up with source", "behind", ...) for both metadata and data, as well as on the state of shards and the number of buckets being synced.
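
To illustrate the kind of metrics meant here, below is a rough Python sketch that scrapes the data sync lines from the sync status text shown above and renders them as Prometheus-style samples. It is not how mgr/prometheus works today. The metric prefix ceph_rgw_sync_status_, the function names and the label names are all made up for this example, and the parser assumes a single data sync source. Scraping CLI text is brittle, so a real implementation would presumably read structured data instead.

```python
import re

# Sample taken from the first output above (the long "behind shards" list is left out).
SAMPLE = """\
      data sync source: 6c39f3b7-8234-4d7b-89e5-15ba573ed033 (theotherzone)
                        syncing
                        full sync: 81/128 shards
                        full sync: 6 buckets to sync
                        incremental sync: 47/128 shards
                        data is behind on 81 shards
"""

# Each pattern maps its captured numbers to (hypothetical) metric names.
PATTERNS = [
    (re.compile(r"full sync: (\d+)/(\d+) shards"),
     ("full_sync_shards", "total_shards")),
    (re.compile(r"full sync: (\d+) buckets to sync"),
     ("buckets_to_sync",)),
    (re.compile(r"incremental sync: (\d+)/(\d+) shards"),
     ("incremental_sync_shards", "total_shards")),
    (re.compile(r"data is behind on (\d+) shards"),
     ("behind_shards",)),
]


def parse_sync_status(text):
    """Extract data sync numbers from `radosgw-admin sync status` text.

    Assumes one data sync source; with several sources the later
    values would overwrite the earlier ones.
    """
    metrics = {
        "full_sync_shards": 0,
        "incremental_sync_shards": 0,
        "total_shards": 0,
        "buckets_to_sync": 0,
        "behind_shards": 0,
        "caught_up": 0,
    }
    for raw in text.splitlines():
        line = raw.strip()
        if line == "data is caught up with source":
            metrics["caught_up"] = 1
            continue
        for regex, names in PATTERNS:
            match = regex.match(line)
            if match:
                for name, value in zip(names, match.groups()):
                    metrics[name] = int(value)
                break
    return metrics


def to_prometheus(metrics, zone, source_zone):
    """Render the parsed values as Prometheus text exposition samples."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(
            'ceph_rgw_sync_status_%s{zone="%s",source_zone="%s"} %d'
            % (name, zone, source_zone, value)
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_prometheus(parse_sync_status(SAMPLE), "myzone", "theotherzone"))
```

With gauges like these, an alert could fire when caught_up stays at 0 or behind_shards stays above 0 for too long, which is the kind of check that currently requires running radosgw-admin by hand.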

This request is loosely related to https://tracker.ceph.com/issues/52638, but I believe the rgw sync state data is not even available via the mgr (yet).

Actions #1

Updated by Christian Rohmann over 2 years ago

This seems to also be related to https://tracker.ceph.com/issues/39369 which also suffers from unavailable data on the detailed sync status of RGW multisite.

Actions #2

Updated by Christian Rohmann over 2 years ago

Christian Rohmann wrote:

This seems to also be related to https://tracker.ceph.com/issues/39369 which also suffers from unavailable data on the detailed sync status of RGW multisite.

The discussion there about data already available via the admin API seems very valuable for identifying which metrics would be required to properly judge the sync state between two zones, such as per-shard state, replication lag, ...

Actions #3

Updated by Sebastian Wagner over 2 years ago

  • Project changed from Ceph to mgr