Feature #39369

Feature #39478: mgr/dashboard: new RGW workflows & RGW enhancements

Feature #39494: mgr/dashboard: Add overview landing page for RGW

mgr/dashboard: show RGW multi-site sync status info

Added by Alfonso Martínez 12 months ago. Updated 4 months ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
dashboard/rgw
Target version:
% Done:

0%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

When RGW multi-site is configured,
the dashboard should provide the appropriate info related to:

radosgw-admin sync status
radosgw-admin metadata sync status

as well as any related info useful to the admin/operator.

History

#1 Updated by Alfonso Martínez 12 months ago

  • Description updated (diff)

#2 Updated by Stephan Müller 12 months ago

  • Status changed from New to Pending Backport

#3 Updated by Stephan Müller 12 months ago

  • Status changed from Pending Backport to New

#4 Updated by Lenz Grimmer 12 months ago

  • Tags set to monitoring

According to http://docs.ceph.com/docs/master/man/8/radosgw-admin/ , the commands are radosgw-admin metadata sync status and radosgw-admin data sync status.
We need to check if this information can be obtained via the RadosGW Admin Ops API.

#5 Updated by Casey Bodley 12 months ago

the intent is for rgw to provide a new admin api that returns a json representation of the information currently available in 'radosgw-admin sync status'. i'll follow up with more detail

#6 Updated by Alfonso Martínez 12 months ago

  • Parent task set to #39478

#7 Updated by Alfonso Martínez 5 months ago

  • Parent task changed from #39478 to #39494

#8 Updated by Alfonso Martínez 5 months ago

  • Assignee changed from Alfonso Martínez to Albin Antony

#9 Updated by Albin Antony 4 months ago

  • Status changed from New to In Progress
  • Pull request ID set to 32206

#10 Updated by Casey Bodley 4 months ago

Radosgw's multisite replication is active-active, meaning that every zone can be syncing from every other zone in its group. To accomplish this, each zone is logging all of the changes that happen locally in its data/metadata logs. Other zones then replay those logs and attempt to apply the same changes, and they store their progress in each of those logs with a 'sync status marker'. These data logs and metadata logs are 'sharded' across several objects, so the sync status tracks a separate marker for each shard.

In an example configuration with three zones (na-1 na-2 na-3) in a single zonegroup (na), the 'radosgw-admin sync status' command run on zone na-3 displays its status against both na-1 and na-2:

[na-3] $ radosgw-admin sync status
          realm 384990e4-6d9e-48f6-8c40-91666f6a795b (dev)
      zonegroup 7da10060-f7d2-404d-a2b1-987b32ef4e58 (na)
           zone 07aabc9c-3326-458d-b88c-31565669365a (na-3)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 08c6d942-05ba-4fc2-9a59-f5fe87c4d521 (na-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 68a9c64e-cc5f-41b6-a452-6c02b5f694f5 (na-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

So the dashboard's status page should similarly be from the perspective of a single target zone, and show its status with respect to each of its peer source zones.
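
To make the "one target zone, many source zones" shape concrete, here is a minimal Python sketch that parses the text report shown above into that structure. It is only an illustration: the function name and dict keys are made up for this example, the text format is not a stable interface, and the intended long-term route is the JSON admin API mentioned in #5.

```python
import re
from typing import Any, Dict

# Copy of the report shown above, used as sample input.
SAMPLE = """\
          realm 384990e4-6d9e-48f6-8c40-91666f6a795b (dev)
      zonegroup 7da10060-f7d2-404d-a2b1-987b32ef4e58 (na)
           zone 07aabc9c-3326-458d-b88c-31565669365a (na-3)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 08c6d942-05ba-4fc2-9a59-f5fe87c4d521 (na-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 68a9c64e-cc5f-41b6-a452-6c02b5f694f5 (na-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
"""


def parse_sync_status(text: str) -> Dict[str, Any]:
    """Parse the human-readable report into a dict for one target zone."""
    status: Dict[str, Any] = {"metadata": {}, "sources": []}
    section = status["metadata"]  # block the following lines belong to
    for raw in text.splitlines():
        line = raw.strip()
        ident = re.match(r"(realm|zonegroup|zone)\s+(\S+)\s+\((.+)\)$", line)
        source = re.search(r"source:\s+(\S+)\s+\((.+)\)$", line)
        shards = re.match(r"(full|incremental) sync:\s+(\d+)/(\d+) shards$", line)
        if ident:
            status[ident.group(1)] = {"id": ident.group(2), "name": ident.group(3)}
        elif line.startswith("metadata sync"):
            section = status["metadata"]
            section["state"] = line[len("metadata sync"):].strip()
        elif source:
            # Each "source:" line starts a new per-source-zone block.
            section = {"id": source.group(1), "name": source.group(2)}
            status["sources"].append(section)
        elif shards:
            key = f"{shards.group(1)}_shards"
            section[key] = (int(shards.group(2)), int(shards.group(3)))
        elif "caught up" in line:
            section["caught_up"] = True
        elif line:
            section["state"] = line  # e.g. "syncing"
    return status


parsed = parse_sync_status(SAMPLE)
```

Here `parsed["zone"]` describes the target zone (na-3) and `parsed["sources"]` holds one entry per peer source zone, which is the layout a status page would iterate over.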

To determine whether sync on a given log shard is behind its source zone, we compare the timestamp from the log entry associated with our marker with the timestamp of the most recent log entry.

So we first have to query the target zone's sync status markers with a request to RGWOp_MDLog_Status/RGWOp_DATALog_Status. Then from the source zone, we make two requests for each shard - one to read the most recent log entry info with RGWOp_MDLog_ShardInfo/RGWOp_DATALog_ShardInfo, and another to list the entry at our associated sync status marker for that shard with RGWOp_MDLog_List/RGWOp_DATALog_List. That gives us the two timestamps for each shard, and we can calculate the difference to see how far it is behind.
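
The per-shard calculation itself is small. A minimal sketch, assuming the two timestamps per shard have already been fetched with the requests described above (the function names and the sample values are invented for illustration; no RGW API is called here):

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def shard_lag(marker_time: datetime, latest_time: datetime) -> timedelta:
    """How far the target's sync marker trails the newest source log entry."""
    # Clamp at zero: a marker at the newest entry means the shard is caught up.
    return max(latest_time - marker_time, timedelta(0))


def lag_per_shard(marker_times: Dict[int, datetime],
                  latest_times: Dict[int, datetime]) -> Dict[int, timedelta]:
    """Compute the lag for every shard present in both inputs."""
    return {
        shard: shard_lag(marker_times[shard], latest)
        for shard, latest in latest_times.items()
        if shard in marker_times
    }


# Made-up timestamps for two shards: shard 0 is caught up, shard 1 is behind.
t0 = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
marker_times = {0: t0, 1: t0}
latest_times = {0: t0, 1: t0 + timedelta(seconds=90)}
lags = lag_per_shard(marker_times, latest_times)
```

In this example shard 1 is 90 seconds behind its source, while shard 0 reports a lag of zero.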

This does require a lot of REST requests, because there are 128 data log shards and 64 metadata log shards by default. So I'd recommend two new CompareStatus REST APIs (for MDLog and DATALog) that take as input the list of shard markers from RGWOp_*Log_Status, do all of the per-shard work done by the _ShardInfo and _List APIs, and return a list of per-shard results (i.e. the timestamps and latest log marker).
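
With the defaults, two requests per shard works out to 2 × 128 = 256 requests for the data log of one source zone and 2 × 64 = 128 for the metadata log. The batched call could look roughly like the Python sketch below. Everything here is hypothetical: `compare_status`, `ShardComparison` and the two callables are stand-ins for the proposed API and for the existing _ShardInfo and _List lookups, not real RGW interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ShardComparison:
    shard_id: int
    marker_timestamp: float  # timestamp of the entry at the target's marker
    latest_marker: str       # newest marker in the source's log shard
    latest_timestamp: float  # timestamp of that newest entry


def compare_status(
    markers: Dict[int, str],
    shard_info: Callable[[int], Tuple[str, float]],
    entry_timestamp: Callable[[int, str], float],
) -> List[ShardComparison]:
    """Do the per-shard lookups in one place and return one result per shard."""
    results = []
    for shard_id, marker in sorted(markers.items()):
        latest_marker, latest_ts = shard_info(shard_id)  # stands in for _ShardInfo
        marker_ts = entry_timestamp(shard_id, marker)    # stands in for _List
        results.append(ShardComparison(shard_id, marker_ts, latest_marker, latest_ts))
    return results


# Fake in-memory log: shard id -> ordered list of (marker, timestamp) entries.
LOG = {0: [("m1", 100.0), ("m2", 160.0)], 1: [("m1", 50.0)]}
results = compare_status(
    markers={0: "m1", 1: "m1"},
    shard_info=lambda shard: LOG[shard][-1],
    entry_timestamp=lambda shard, marker: dict(LOG[shard])[marker],
)
```

The caller sends one request carrying all of its markers and gets back everything needed to compute the per-shard lag, instead of looping over shards itself.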

Because we want the dashboard to graph how far behind each shard is, we'll want something to poll for these comparisons regularly. This polling probably belongs in a dedicated manager module that writes all of this data into Prometheus.
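
As a rough idea of what such a module would publish after each poll, this Python sketch renders per-shard lag values in the Prometheus text exposition format. The metric name and label names are assumptions made for this example, and how a manager module would actually expose the text is not shown.

```python
from typing import Dict, Tuple

# Hypothetical metric name, chosen only for this example.
METRIC = "rgw_sync_shard_lag_seconds"


def render_lag_metrics(lags: Dict[Tuple[str, int], float]) -> str:
    """Render {(source_zone, shard): lag_seconds} as Prometheus gauge samples."""
    lines = [
        f"# HELP {METRIC} Seconds the target zone trails the source log shard.",
        f"# TYPE {METRIC} gauge",
    ]
    for (source_zone, shard), seconds in sorted(lags.items()):
        labels = f'source_zone="{source_zone}",shard="{shard}"'
        lines.append(f"{METRIC}{{{labels}}} {seconds}")
    return "\n".join(lines) + "\n"


text = render_lag_metrics({("na-1", 0): 12.5, ("na-2", 3): 0.0})
```

One gauge sample per (source zone, shard) pair keeps the raw data available, so both the summary and the detailed graphs can be built from the same series.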

Then the dashboard page for sync status could start with a simple graph, where each line corresponds to one source zone, whose values are the maximum values of all its log shards. You could then have a detailed page for each source zone showing lines for all of its shards.
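
The summary line for each source zone is just the maximum over its shards. A minimal Python sketch, with made-up lag values in seconds and a hypothetical function name:

```python
from typing import Dict


def zone_summary(lags: Dict[str, Dict[int, float]]) -> Dict[str, float]:
    """Reduce {source_zone: {shard: lag_seconds}} to the worst lag per zone."""
    # default=0.0 covers a zone for which no shard data has been collected yet.
    return {zone: max(shards.values(), default=0.0) for zone, shards in lags.items()}


summary = zone_summary({
    "na-1": {0: 0.0, 1: 42.0, 2: 3.5},
    "na-2": {0: 0.0, 1: 0.0},
})
```

Here the overview graph would plot 42.0 for na-1 and 0.0 for na-2 at this point in time, and the per-zone detail page would plot the individual shard values.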

#11 Updated by Casey Bodley 4 months ago

  • Pull request ID deleted (32206)

#12 Updated by Ernesto Puerta 4 months ago

Thanks a lot for the detailed explanation, Casey!

The feedback we had from consultants and SAs when we checked with them for the dashboard-RGW multisite work is that customers wanted to have some operator-meaningful metric for the sync status. They mentioned explicitly:
  • number of out-of-sync objects
  • estimated time of completion
  • sync rate in obj/s
I guess that it is impossible to get any of those from the raw output provided by "sync status", right? In that case, and being practical, what about the following, from the simplest to the most detailed "cooked" metric?
  1. OK/NOK sync status: this is straightforward.
  2. Sync skew: in % of shards (I guess this is not a very accurate description of the work ahead, right?).
  3. Estimated time of completion based on the above % evolving over time.
  4. Raw output: in case users want it, we could display the detailed sync status output.

#13 Updated by Casey Bodley 4 months ago

Ernesto Puerta wrote:

The feedback we had from consultants and SAs when we checked with them for the dashboard-RGW multisite work is that customers wanted to have some operator-meaningful metric for the sync status. They mentioned explicitly:
  • number of out-of-sync objects
  • estimated time of completion
  • sync rate in obj/s

The sync rate is the only one of these we can account for; it comes from the perf counters added in https://github.com/ceph/ceph/pull/26722

I guess that it is impossible to get any of those from the raw output provided by "sync status", right? In that case, and being practical, what about the following, from the simplest to the most detailed "cooked" metric?
  1. OK/NOK sync status: this is straightforward.
  2. Sync skew: in % of shards (I guess this is not a very accurate description of the work ahead, right?).
  3. Estimated time of completion based on the above % evolving over time.
  4. Raw output: in case users want it, we could display the detailed sync status output.

Those sound like reasonable ways to summarize the data provided by radosgw apis. I would be cautious about saying anything about 'completion' of replication, though, because that would imply that the source zone stopped writing anything new. The timestamp comparisons are more meaningful here, because they tell you approximately how long it would take for a new write on the source zone to be replicated on the target zone.
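
A minimal Python sketch of how the per-shard lags could be reduced to those summary values. The definitions are assumptions for illustration: "skew" is taken here to mean the percentage of shards whose lag exceeds a tolerance, and the worst lag is reported in place of a completion estimate, for the reason given above.

```python
from typing import Dict


def summarize(lags_seconds: Dict[int, float], tolerance: float = 0.0) -> Dict[str, object]:
    """Reduce per-shard lag (seconds) to OK/NOK, skew in % of shards, worst lag."""
    total = len(lags_seconds)
    behind = sum(1 for lag in lags_seconds.values() if lag > tolerance)
    return {
        "ok": behind == 0,
        "skew_percent": 100.0 * behind / total if total else 0.0,
        "max_lag_seconds": max(lags_seconds.values(), default=0.0),
    }


# Made-up values: one of four shards is 30 seconds behind its source.
report = summarize({0: 0.0, 1: 0.0, 2: 30.0, 3: 0.0})
```

For this input the status is NOK, the skew is 25% of shards, and the worst lag is 30 seconds, which approximates how long a new write on the source zone would take to reach the target zone.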

#14 Updated by Ernesto Puerta 4 months ago

Thanks a lot, Casey, for this feedback.

Could we have the timestamp delta available in the sync status output?
