Backport #58402

multisite replication issue on Quincy

Added by Adam Emerson about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Release:
quincy
Crash signature (v1):
Crash signature (v2):

Description

We have encountered replication issues in our multisite setup with Quincy v17.2.3.

Our Ceph clusters are brand new. We tore down our clusters and redeployed fresh Quincy ones before running our test.
In our environment, we have 3 RGW nodes per site; each node runs 2 instances for client traffic and 1 instance dedicated to replication.

Our test was done using cosbench with the following settings:
- 10 rgw users
- 3000 buckets per user
- write only
- 6 different object sizes with the following distribution:
1k: 17%
2k: 48%
3k: 14%
4k: 5%
1M: 13%
8M: 3%
- trying to write 10 million objects per object-size bucket per user, to avoid writing to the same objects
- no multipart uploads involved
The test ran for about 2 hours, roughly from 22:50 on 9/14 to 01:00 on 9/15. After that, the replication tail continued for roughly another 4 hours, until about 04:50 on 9/15, with gradually decreasing replication traffic. Then the replication stopped, and nothing has been going on in the clusters since.

While we were verifying the replication status, we found many issues.
1. The sync status shows the clusters are not fully synced. However, all replication traffic has stopped and nothing is going on in the clusters.
Secondary zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 2 shards
                        behind shards: [40,74]

Why did the replication stop even though the clusters are still not in sync?
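
For reference, the following is a minimal sketch of the read-only commands that should reproduce the views quoted in this report (assuming stock radosgw-admin; the exact commands used are not stated here, and the zone names are taken from the output above):

# overall per-zone sync status, run against the zone being inspected
radosgw-admin sync status --rgw-zone=prod-zone-tt

# per-shard data sync detail against the source zone, e.g. for the behind shards above
radosgw-admin data sync status --source-zone=prod-zone-pw --shard-id=40
radosgw-admin data sync status --source-zone=prod-zone-pw --shard-id=74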

2. We can see that some buckets are not fully synced, and we were able to identify some missing objects in our secondary zone (see the listing-comparison sketch after the status output below).
Here is an example bucket and its sync status in the secondary zone.

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
         bucket :mixed-5wrks-dev-4k-thisisbcstestload004178[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89152.78])

    source zone b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
  source bucket :mixed-5wrks-dev-4k-thisisbcstestload004178[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89152.78])
                full sync: 0/101 shards
                incremental sync: 100/101 shards
                bucket is behind on 1 shards
                behind shards: [78]

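A hedged sketch of how the missing objects mentioned in item 2 can be confirmed, by diffing the object listings of the affected bucket between the two zones (bucket name taken from the status output above; the output file names are just illustrative):

# on a primary-zone node
radosgw-admin bucket list --bucket=mixed-5wrks-dev-4k-thisisbcstestload004178 --max-entries=10000000 | jq -r '.[].name' | sort > objects_primary.txt

# run the same command on a secondary-zone node into objects_secondary.txt, then:
comm -23 objects_primary.txt objects_secondary.txt   # objects in the primary but missing from the secondary
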
3. We can see from the above sync status that the behind shard for the example bucket is not in the list of behind shards in the system sync status. Why is that?

4. Data sync status for these behind shards doesn't list any "pending_buckets" or "recovering_buckets".
An example:

{
    "shard_id": 74,
    "marker": {
        "status": "incremental-sync",
        "marker": "00000000000000000003:00000000000003381964",
        "next_step_marker": "",
        "total_entries": 0,
        "pos": 0,
        "timestamp": "2022-09-15T00:00:08.718840Z" 
    },
    "pending_buckets": [],
    "recovering_buckets": []
}

Shouldn't the not-yet-in-sync buckets be listed here?
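
A hedged way to check whether anything is actually recorded as stuck for these shards, assuming the standard radosgw-admin diagnostics (not guaranteed to surface the root cause):

# replication errors recorded by the sync machinery (run on the secondary zone)
radosgw-admin sync error list

# datalog markers on the primary, to compare against the "marker" field shown above
radosgw-admin datalog status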

5. The sync status of the primary zone differs from that of the secondary zone, with different sets of behind shards; the same is true for the sync status of the same bucket in each zone. Is this legitimate? Please see item 1 for the sync status of the secondary zone and item 6 for the primary zone.

6. Why does the primary zone have behind shards at all, since the replication is from the primary to the secondary?
Primary Zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
  metadata sync no sync (zone is master)
      data sync source: 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 30 shards
                        behind shards: [6,7,26,28,29,37,47,52,55,56,61,67,68,69,74,79,82,91,95,99,101,104,106,111,112,121,122,123,126,127]

7. We have in-sync buckets that show the correct sync status in the secondary zone but still show behind shards in the primary zone. Why is that?
Secondary Zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
         bucket :mixed-5wrks-dev-4k-thisisbcstestload008167[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89754.279])

    source zone b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
  source bucket :mixed-5wrks-dev-4k-thisisbcstestload008167[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89754.279])
                full sync: 0/101 shards
                incremental sync: 99/101 shards
                bucket is caught up with source

Primary zone:

          realm 8a98f19f-db58-4c09-bde6-ac89560d79b0 (prod-realm)
      zonegroup e041ea69-1e0b-4ad7-92f2-74b20aa3edf3 (prod-zonegroup)
           zone b68a526a-ffaa-4058-9903-6e7c6eac35bb (prod-zone-pw)
         bucket :mixed-5wrks-dev-4k-thisisbcstestload008167[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89754.279])

    source zone 1dadcf12-f44c-4940-8acc-9623a48b829e (prod-zone-tt)
  source bucket :mixed-5wrks-dev-4k-thisisbcstestload008167[b68a526a-ffaa-4058-9903-6e7c6eac35bb.89754.279])
                full sync: 0/101 shards
                incremental sync: 97/101 shards
                bucket is behind on 11 shards
                behind shards: [9,11,14,16,22,31,44,45,67,85,90]

Our primary goals here are:
1. to find out why the replication stopped while the clusters are not in sync;
2. to understand what we need to do to resume the replication (a few diagnostic commands are sketched after this list), and to make sure it runs to completion without too much lag;
3. to understand whether all the sync status info is correct. It seems to us there are many conflicts, and some of it doesn't reflect the real status of the clusters at all.
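
Regarding goal 2, the following is only a hedged sketch of commands that are sometimes used to re-drive sync manually for diagnosis; they run sync in the foreground, can be expensive, and nothing here confirms they resolve this particular issue:

# re-run data sync from the source zone in the foreground (on the secondary)
radosgw-admin data sync run --source-zone=prod-zone-pw

# re-run sync for a single lagging bucket in the foreground
radosgw-admin bucket sync run --bucket=mixed-5wrks-dev-4k-thisisbcstestload004178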

We have attached the following info about our system:
- ceph.conf of the RGWs
- ceph config dump
- ceph versions output
- sync status of the cluster, an in-sync bucket, a not-in-sync bucket, and some behind shards
- bucket list and bucket stats of a not-in-sync bucket, and the stat of a not-in-sync object

replication_issue_pri.zip - info for primary site (33 KB) Jane Zhu, 09/15/2022 02:10 PM

replication_issue_sec.zip - info for secondary site (40.3 KB) Jane Zhu, 09/15/2022 02:10 PM

cosbench_workload.tar.gz (1.51 KB) Jane Zhu, 09/28/2022 04:46 PM

workload_dev_pwdc_write_many_buckets_4h.xml.tar.gz (1.43 KB) Jane Zhu, 10/04/2022 10:41 PM

workload_dev_pwdc_write_many_buckets_10min.xml.tar.gz (217 Bytes) Jane Zhu, 10/04/2022 10:41 PM

Issue57562.tar.gz - log files for comment #10 (377 KB) Oguzhan Ozmen, 10/07/2022 08:26 PM

Comment38_TID72267_Logs.csv (34.3 KB) Oguzhan Ozmen, 11/03/2022 07:11 PM


Related issues

Copied from rgw - Bug #57562: multisite replication issue on Quincy (Resolved)

History

#1 Updated by Adam Emerson about 1 year ago

  • Copied from Bug #57562: multisite replication issue on Quincy added

#2 Updated by Adam Emerson about 1 year ago

  • Status changed from New to In Progress

#3 Updated by Adam Emerson about 1 year ago

  • Status changed from In Progress to Resolved
