Feature #39369

Feature #39478: mgr/dashboard: new RGW workflows & RGW enhancements

Feature #39494: mgr/dashboard: Add overview landing page for RGW

mgr/dashboard: show RGW multi-site sync status info

Added by Alfonso Martínez almost 5 years ago. Updated over 2 years ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Component - RGW
Target version:
% Done: 100%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

When RGW multi-site is configured,
dashboard should provide the appropriate info related to:

radosgw-admin sync status
radosgw-admin metadata sync status

as well as any related info useful to admin/operator.


Subtasks

Feature #45310: mgr/dashboard: add grafana dashboards for rgw multisite sync info (Resolved, Alfonso Martínez)

Bug #45311: rgw: provide right format for rgw sync perf. counters. (Closed)

History

#1 Updated by Alfonso Martínez almost 5 years ago

  • Description updated (diff)

#2 Updated by Stephan Müller almost 5 years ago

  • Status changed from New to Pending Backport

#3 Updated by Stephan Müller almost 5 years ago

  • Status changed from Pending Backport to New

#4 Updated by Lenz Grimmer almost 5 years ago

  • Tags set to monitoring

According to http://docs.ceph.com/docs/master/man/8/radosgw-admin/ , the commands are radosgw-admin metadata sync status and radosgw-admin data sync status.
We need to check if this information can be obtained via the RadosGW Admin Ops API.

#5 Updated by Casey Bodley almost 5 years ago

The intent is for rgw to provide a new admin API that returns a JSON representation of the information currently available in 'radosgw-admin sync status'. I'll follow up with more detail.

#6 Updated by Alfonso Martínez almost 5 years ago

  • Parent task set to #39478

#7 Updated by Alfonso Martínez over 4 years ago

  • Parent task changed from #39478 to #39494

#8 Updated by Alfonso Martínez over 4 years ago

  • Assignee changed from Alfonso Martínez to Albin Antony

#9 Updated by Albin Antony over 4 years ago

  • Status changed from New to In Progress
  • Pull request ID set to 32206

#10 Updated by Casey Bodley over 4 years ago

Radosgw's multisite replication is active-active, meaning that every zone can be syncing from every other zone in its group. To accomplish this, each zone is logging all of the changes that happen locally in its data/metadata logs. Other zones then replay those logs and attempt to apply the same changes, and they store their progress in each of those logs with a 'sync status marker'. These data logs and metadata logs are 'sharded' across several objects, so the sync status tracks a separate marker for each shard.

In an example configuration with three zones (na-1, na-2, na-3) in a single zonegroup (na), the 'radosgw-admin sync status' command run on zone na-3 displays its status against both na-1 and na-2:

[na-3] $ radosgw-admin sync status
          realm 384990e4-6d9e-48f6-8c40-91666f6a795b (dev)
      zonegroup 7da10060-f7d2-404d-a2b1-987b32ef4e58 (na)
           zone 07aabc9c-3326-458d-b88c-31565669365a (na-3)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 08c6d942-05ba-4fc2-9a59-f5fe87c4d521 (na-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 68a9c64e-cc5f-41b6-a452-6c02b5f694f5 (na-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

So the dashboard's status page should similarly be from the perspective of a single target zone, and show its status with respect to each of its peer source zones.

To determine whether sync on a given log shard is behind its source zone, we compare the timestamp from the log entry associated with our marker with the timestamp of the most recent log entry.

So we first have to query the target zone's sync status markers with a request to RGWOp_MDLog_Status/RGWOp_DATALog_Status. Then from the source zone, we make two requests for each shard - one to read the most recent log entry info with RGWOp_MDLog_ShardInfo/RGWOp_DATALog_ShardInfo, and another to list the entry at our associated sync status marker for that shard with RGWOp_MDLog_List/RGWOp_DATALog_List. That gives us the two timestamps for each shard, and we can calculate the difference to see how far it is behind.

This does require a lot of REST requests, because there are 128 data log shards and 64 metadata log shards by default. So I'd recommend two new CompareStatus REST APIs (for MDLog and DATALog) that take as input the list of shard markers from RGWOp_*Log_Status, do all of the per-shard work done by the _ShardInfo and _List APIs, and return a list of per-shard results (i.e. the timestamps and latest log marker).
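To put the request count in perspective, here is a minimal sketch that enumerates the per-shard ShardInfo requests for the default shard counts quoted above. The path format follows the `GET /admin/log/?type=...&id=...&info` form of the ShardInfo requests shown later in this thread; the helper name is hypothetical, and the matching _List requests are only counted, not constructed.

```python
# Default shard counts mentioned in the comment above.
DEFAULT_SHARDS = {"metadata": 64, "data": 128}


def shard_info_paths(log_type, num_shards):
    """Build one ShardInfo request path per log shard (hypothetical helper)."""
    return [f"/admin/log/?type={log_type}&id={shard}&info"
            for shard in range(num_shards)]


paths = [p for t, n in DEFAULT_SHARDS.items() for p in shard_info_paths(t, n)]

# Each shard needs a ShardInfo request plus a List request against the source zone.
requests_per_source_zone = 2 * len(paths)
print(len(paths), requests_per_source_zone)
```

The total grows with every additional source zone, which is the cost a single CompareStatus-style call per log type would avoid.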

Because we want the dashboard to graph how far behind each shard is, we'll want something to poll for these comparisons regularly. This polling probably belongs in a dedicated manager module that writes all of this data into Prometheus.
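As a rough sketch of what the polled data could look like once exported, the snippet below renders per-shard lag values in the Prometheus text exposition format by hand. The metric name and label names are hypothetical, and a real manager module would go through its own metric-export path rather than building strings like this.

```python
def render_lag_metrics(lags):
    """Render {(source_zone, log_type, shard): seconds} as Prometheus text lines."""
    name = "rgw_sync_shard_lag_seconds"  # hypothetical metric name
    lines = [
        f"# HELP {name} Seconds a log shard is behind its source zone.",
        f"# TYPE {name} gauge",
    ]
    for (zone, log_type, shard), seconds in sorted(lags.items()):
        lines.append(
            f'{name}{{source_zone="{zone}",log="{log_type}",shard="{shard}"}} {seconds}'
        )
    return "\n".join(lines) + "\n"


# Made-up sample values for two source zones.
sample = {("na-1", "data", 0): 0.0, ("na-2", "data", 0): 12.5}
text = render_lag_metrics(sample)
print(text)
```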

Then the dashboard page for sync status could start with a simple graph, where each line corresponds to one source zone, whose values are the maximum values of all its log shards. You could then have a detailed page for each source zone showing lines for all of its shards.
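The per-zone summary line described above could be derived along these lines. This is a minimal sketch assuming the per-shard lags are already available in seconds; the function name and sample values are made up for illustration.

```python
from collections import defaultdict


def max_lag_per_zone(shard_lags):
    """Reduce {(source_zone, shard): lag_seconds} to {source_zone: worst lag}."""
    worst = defaultdict(float)
    for (zone, _shard), lag in shard_lags.items():
        worst[zone] = max(worst[zone], lag)
    return dict(worst)


# Made-up sample: one shard of na-1 is behind, na-2 is caught up.
shard_lags = {("na-1", 0): 0.0, ("na-1", 1): 4.2, ("na-2", 0): 0.0, ("na-2", 1): 0.0}
summary = max_lag_per_zone(shard_lags)
print(summary)
```

Taking the maximum rather than the mean keeps a single badly lagging shard visible in the overview graph; the per-zone detail page would then plot the unreduced values.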

#11 Updated by Casey Bodley over 4 years ago

  • Pull request ID deleted (32206)

#12 Updated by Ernesto Puerta over 4 years ago

Thanks a lot for the detailed explanation, Casey!

The feedback we had from consultants and SAs when we checked with them for the dashboard-RGW multisite work is that customers wanted to have some operator-meaningful metric for the sync status. They mentioned explicitly:
  • number of out-of-sync objects
  • estimated time of completion
  • sync rate in obj/s
I guess it is impossible to get any of those from the raw output provided by "sync status", right? In that case, and being practical, from simplest to most detailed "cooked" metric, what about the following?
  1. OK/NOK sync status: this is straightforward.
  2. Sync skew: in % of shards (I guess this is not a very accurate description of the work ahead, right?)
  3. Estimated time of completion based on the above % evolving over time.
  4. Raw output: in case users want it, we could display the detailed sync status output.
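
A minimal sketch of the first two items, under stated assumptions: a shard counts as "behind" when its lag in seconds exceeds a tolerance, the status is OK only when no shard is behind, and skew is the percentage of shards that are behind. These definitions and the function name are illustrative, not something the thread has agreed on.

```python
def summarize(shard_lags, tolerance=0.0):
    """Return (status, skew_percent) for a list of per-shard lags in seconds."""
    if not shard_lags:
        raise ValueError("no shards to summarize")
    behind = sum(1 for lag in shard_lags if lag > tolerance)
    skew = 100.0 * behind / len(shard_lags)
    return ("OK" if behind == 0 else "NOK"), skew


# Made-up sample: one of four shards is 3.5 seconds behind.
status, skew = summarize([0.0, 0.0, 3.5, 0.0])
print(status, skew)
```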

#13 Updated by Casey Bodley over 4 years ago

Ernesto Puerta wrote:

The feedback we had from consultants and SAs when we checked with them for the dashboard-RGW multisite work is that customers wanted to have some operator-meaningful metric for the sync status. They mentioned explicitly:
  • number of out-of-sync objects
  • estimated time of completion
  • sync rate in obj/s

The sync rate is the only part we can account for, which comes from the perf counters added in https://github.com/ceph/ceph/pull/26722

I guess it is impossible to get any of those from the raw output provided by "sync status", right? In that case, and being practical, from simplest to most detailed "cooked" metric, what about the following?
  1. OK/NOK sync status: this is straightforward.
  2. Sync skew: in % of shards (I guess this is not a very accurate description of the work ahead, right?)
  3. Estimated time of completion based on the above % evolving over time.
  4. Raw output: in case users want it, we could display the detailed sync status output.

Those sound like reasonable ways to summarize the data provided by radosgw apis. I would be cautious about saying anything about 'completion' of replication, though, because that would imply that the source zone stopped writing anything new. The timestamp comparisons are more meaningful here, because they tell you approximately how long it would take for a new write on the source zone to be replicated on the target zone.

#14 Updated by Ernesto Puerta over 4 years ago

Thanks a lot, Casey, for this feedback.

Could we have the timestamp delta available in the sync status output?

#15 Updated by Casey Bodley almost 4 years ago

Casey Bodley wrote:

To determine whether sync on a given log shard is behind its source zone, we compare the timestamp from the log entry associated with our marker with the timestamp of the most recent log entry.

So we first have to query the target zone's sync status markers with a request to RGWOp_MDLog_Status/RGWOp_DATALog_Status. Then from the source zone, we make two requests for each shard - one to read the most recent log entry info with RGWOp_MDLog_ShardInfo/RGWOp_DATALog_ShardInfo, and another to list the entry at our associated sync status marker for that shard with RGWOp_MDLog_List/RGWOp_DATALog_List. That gives us the two timestamps for each shard, and we can calculate the difference to see how far it is behind.

RGWOp_MDLog_Status

GET /admin/log/?type=metadata&status

{
  "info": {
    "status": "sync",
    "num_shards": 64,
    "period": "007cab9b-bed1-49be-8bf9-62fe3c15942f",
    "realm_epoch": 2
  },
  "markers": [
    {
      "key": 0,
      "val": {
        "state": 1,
        "marker": "1_1588266637.903010_76.1",
        "next_step_marker": "",
        "total_entries": 6,
        "pos": 0,
        "timestamp": "2020-04-30T17:10:37.903010Z",
        "realm_epoch": 2
      }
    },
...
    {
      "key": 63,
      "val": {
        "state": 1,
        "marker": "1_1588266662.521750_379.1",
        "next_step_marker": "",
        "total_entries": 0,
        "pos": 0,
        "timestamp": "2020-04-30T17:11:02.521750Z",
        "realm_epoch": 2
      }
    }
  ]
}

RGWOp_MDLog_ShardInfo

GET /admin/log/?type=metadata&id=0&info

{
  "marker":"1_1588266637.903010_76.1",
  "last_update":"2020-04-30T17:10:37.903010Z" 
}

RGWOp_DATALog_Status

GET /admin/log/?type=data&status

{
  "info": {
    "status": "sync",
    "num_shards": 128,
    "instance_id": 3292530701570187300
  },
  "markers": [
    {
      "key": 0,
      "val": {
        "status": "incremental-sync",
        "marker": "1_1588266691.181559_19.1",
        "next_step_marker": "",
        "total_entries": 104,
        "pos": 0,
        "timestamp": "2020-04-30T17:11:31.181559Z" 
      }
    },
...
    {
      "key": 127,
      "val": {
        "status": "incremental-sync",
        "marker": "1_1588266716.830859_117.1",
        "next_step_marker": "",
        "total_entries": 105,
        "pos": 0,
        "timestamp": "2020-04-30T17:11:56.830859Z" 
      }
    }
  ]
}

RGWOp_DATALog_ShardInfo

GET /admin/log/?type=data&id=0&info

{
  "marker":"1_1588266691.181559_19.1",
  "last_update":"2020-04-30T17:11:31.181559Z" 
}

As of Nautilus 14.2.7 (https://tracker.ceph.com/issues/43373), the DATALog_Status API should be returning valid timestamps, so the extra requests to MDLog_List/DATALog_List shouldn't be necessary.
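
With valid timestamps in the Status response, the per-shard lag can be sketched from the two response shapes shown above. This is a minimal example, assuming timestamps always carry a fractional-seconds part as in the samples; the function names are hypothetical and the sample `last_update` value is made up to produce a non-zero lag.

```python
import json
from datetime import datetime, timezone


def parse_ts(value):
    """Parse timestamps of the form 2020-04-30T17:11:31.181559Z as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def shard_lags(status, shard_infos):
    """Map shard id -> seconds the target's marker trails the source's last_update.

    status:      decoded *_Status response from the target zone
    shard_infos: {shard id: decoded *_ShardInfo response from the source zone}
    """
    lags = {}
    for entry in status["markers"]:
        shard = entry["key"]
        ours = parse_ts(entry["val"]["timestamp"])
        latest = parse_ts(shard_infos[shard]["last_update"])
        # Clamp at zero: a marker never counts as "ahead" of its source.
        lags[shard] = max(0.0, (latest - ours).total_seconds())
    return lags


# Trimmed-down Status response with a single shard.
status = json.loads(
    '{"markers": [{"key": 0, "val": {"timestamp": "2020-04-30T17:11:31.181559Z"}}]}'
)
# Made-up ShardInfo response whose last_update is a few seconds newer.
infos = {0: json.loads('{"last_update": "2020-04-30T17:11:36.181559Z"}')}
lags = shard_lags(status, infos)
print(lags)
```

The same function applies to both log types, since the metadata and data responses share the `markers[].val.timestamp` and `last_update` fields.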

#16 Updated by Albin Antony almost 4 years ago

So, is the timestamp returned by the DATALog_Status API the current timestamp?

#17 Updated by Casey Bodley almost 4 years ago

Albin Antony wrote:

So, is the timestamp returned by the DATALog_Status API the current timestamp?

Yes, and the "last_update" returned by the ShardInfo APIs would be the latest timestamp.

#18 Updated by Christian Rohmann over 2 years ago

I recently raised the issue https://tracker.ceph.com/issues/52903 about providing this kind of sync status data via (the) Prometheus metrics. Certainly the Ceph dashboard integration of the sync status is important, but please do not leave out external monitoring which people use to be alerted or monitor performance of their systems and their replication.
