Bug #49075

open

Bucket synchronization works only after disabling/re-enabling bucket sync on the bucket; once finished, it maxes out the SSD/NVMe drives and sync is degraded.

Added by Ist Gab about 3 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Multisite, sync, osd
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

We have a freshly installed multisite setup across 3 geographic locations, running Octopus upgraded from 15.2.5 to 15.2.7.
Each DC has 6 OSD nodes and 3 mon/mgr/RGW nodes, all SSD, with every 3 SSDs sharing 1 NVMe drive for journaling. Each zone is backed by 3 RGWs, one on each mon/mgr node.
The goal is to replicate 2 (currently) big buckets within the zonegroup, but it only works if I disable and re-enable bucket sync on the bucket.
By big buckets I mean: one bucket is presharded to 9000 shards (for 9 billion objects), and the second bucket, the one I'm detailing in this ticket, to 24000 shards (for 24 billion objects).
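For clarity, the disable/re-enable cycle I mean is the per-bucket sync toggle, roughly:

radosgw-admin bucket sync disable --bucket=pix-bucket
radosgw-admin bucket sync enable --bucket=pix-bucket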

Once it has picked up the objects (not all of them, only the ones that were on the source site at the moment sync was enabled), it slows down a lot: from roughly 100,000 objects and 10 GB per 15 minutes down to about 50 objects per 4 hours.
After that post-disable/enable sync finishes, something maxes out the NVMe/SSD drives on the OSD nodes with an operation I can't identify. Let me show you the symptoms below.

Let me summarize as much as I can.

We have 1 realm, and in this realm 1 zonegroup (please help me check whether the sync policies are OK); in this zonegroup we have 1 cluster in the US, 1 in Hong Kong (master) and 1 in Singapore.

Here is the realm, zonegroup and zones definition: https://pastebin.com/raw/pu66tqcf
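For reference, those definitions were dumped with the standard admin commands, run on each cluster for its local zone:

radosgw-admin realm get
radosgw-admin zonegroup get
radosgw-admin zone get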

Let me show you one disable/enable operation, where I disabled bucket sync for pix-bucket on the HKG master site and then re-enabled it.

In this screenshot: https://i.ibb.co/WNC0gNQ/6nodes6day.png
the highlighted area is when data sync is running after the disable/enable; you can see there is almost no activity there. You can also see that while sync is not running, the green and yellow lines (the NVMe RocksDB+WAL devices) are busy. The screenshot shows the SSD/NVMe disk utilization of the 6 Singapore nodes. On the first node there is no green or yellow during the last hours, because I reinstalled all of that node's OSDs without NVMe.
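In case it matters how I map the colors to devices: I check which block devices carry an OSD's data vs. its RocksDB/WAL roughly like this (the OSD id below is just an example):

ceph osd metadata 12 | grep -E 'devices|bluefs'   # shows the data SSD and the shared NVMe db/wal device
ceph-volume lvm list                              # on the OSD host: lists block and db devices per OSD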

In the first of the following screenshots you can see the HKG object usage, where the user is uploading the objects. The second screenshot is the SGP side, where the highlighted area is the disable/enable operation.
HKG where user upload: https://i.ibb.co/vj2VFYP/pixhkg6d.png
SGP where sync happened: https://i.ibb.co/w41rmQT/pixsgp6d.png

Here are some troubleshooting outputs: bucket sync status, cluster sync status, reshard list (entries there may be left over from earlier testing), and the sync error list:

https://pastebin.com/raw/PfURKmX6
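For reference, the outputs in that paste come from roughly these commands, run on the SGP side:

radosgw-admin sync status
radosgw-admin bucket sync status --bucket=pix-bucket
radosgw-admin reshard list
radosgw-admin sync error list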

The issue might be very similar to this issue:
https://tracker.ceph.com/issues/21591

How should I move forward, or what additional logs can I provide so you can help me?

Actions #1

Updated by Ist Gab about 3 years ago

Tried to init the data sync from SGP.

SGP init command and output:
radosgw-admin data sync init --source-zone=hkg --bucket=pix-bucket
2021-02-01T17:33:25.586+0700 7fb21d385200 0 RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data log shards
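If more detail on the "failed to read remote data log shards" error would help, I can re-run the same command with higher debug levels, e.g.:

radosgw-admin data sync init --source-zone=hkg --bucket=pix-bucket --debug-rgw=20 --debug-ms=1 2>&1 | tee datasync-init-debug.log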

Actions #2

Updated by Ist Gab about 3 years ago

Ist Gab wrote:

(Re-posted the full description above, unchanged except for the troubleshooting-logs link, which now points to https://pastebin.com/raw/TdwiZFC1 instead of https://pastebin.com/raw/PfURKmX6.)

Actions #3

Updated by Ist Gab about 3 years ago

Modified the pastebin link from https://pastebin.com/raw/PfURKmX6 to https://pastebin.com/raw/TdwiZFC1.

Actions #4

Updated by Christian Rohmann over 2 years ago

We are just syncing between two sites but ran into exactly the same issue:

Some buckets were not fully synced (lots of missing objects), but when issuing a radosgw-admin bucket sync --disable and then an --enable on that very bucket, the (full) sync kicked in immediately, maxing out the available bandwidth.

Leading up to this phenomenon we started fresh on the secondary zone. About 20% into the sync (of the whole data set of the master zone) I restarted the RADOSGW instance to apply a minor reconfiguration.
After that there was no real sync activity, so I issued a radosgw-admin data sync init, which led to a few hours of "preparing for full sync" and then much slower sync activity.
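For completeness, the commands involved on the secondary zone were roughly (the master zone name is a placeholder here):

radosgw-admin data sync init --source-zone=<master-zone>
radosgw-admin sync status    # this is where it sat in "preparing for full sync" for a few hours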

Maybe the issue is caused by the interruption of an active full bucket sync?
