Bug #24551

closed

RGW Dynamic bucket index resharding keeps resharding all buckets

Added by Sander van Schie almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Urgent
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We're running into some problems with dynamic bucket index resharding. After an upgrade from Ceph 12.2.2 to 12.2.5, which fixed an issue with resharding when using tenants (which we do), the cluster was busy resharding for 2 days straight, resharding the same buckets over and over again.

After disabling dynamic resharding and re-enabling it a while later, it resharded all buckets again and then kept quiet for a bit. Later on it started resharding buckets over and over again, even buckets which didn't have any data added in the meantime. In the reshard list it always says 'old_num_shards: 1' for every bucket, even though I can confirm with 'bucket stats' that the desired number of shards is already present. It looks like the background process which scans buckets doesn't properly recognize the number of shards a bucket currently has. When I manually add a reshard job, it does properly recognize the current number of shards.
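The mismatch described above can be checked mechanically. The following is only a minimal sketch, assuming the reshard list has already been parsed into a list of dicts and that the real shard count per bucket has been collected separately into a mapping. The `old_num_shards` field is the one quoted above; the `bucket_name` key, the function name, and the sample data are assumptions made for illustration and are not taken from the affected cluster.

```python
from typing import Dict, List


def find_stale_reshard_entries(
    reshard_entries: List[dict],
    actual_shards: Dict[str, int],
) -> List[dict]:
    """Return queue entries whose recorded old shard count differs from the real one."""
    stale = []
    for entry in reshard_entries:
        name = entry["bucket_name"]         # assumed key name for the bucket
        recorded = entry["old_num_shards"]  # field quoted in the report
        actual = actual_shards.get(name)
        # Skip buckets for which no real shard count was supplied.
        if actual is not None and recorded != actual:
            stale.append({"bucket": name, "recorded": recorded, "actual": actual})
    return stale


# Hypothetical sample data for illustration only.
queue = [
    {"bucket_name": "logs", "old_num_shards": 1},
    {"bucket_name": "images", "old_num_shards": 8},
]
shards = {"logs": 16, "images": 8}

print(find_stale_reshard_entries(queue, shards))
```

Keying on the bucket name alone is a simplification. With tenants in use, as in our setup, the key would probably need to include the tenant as well.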

While Ceph was resharding buckets over and over again, the maximum available storage as reported by 'ceph df' also decreased by about 20%, while usage stayed the same. We have yet to find out where the missing storage went. The decrease stopped once we disabled resharding.

On a side note, we had two buckets in the reshard list which were removed a long while ago. We were unable to cancel the reshard job for those buckets. After recreating the users and buckets we were able to remove them from the list though, so they are no longer present. Probably not relevant, but you never know.


Files

Ceph High IO.png (190 KB), High IO, Aleksandr Rudenko, 07/18/2018 08:31 AM

Related issues 1 (0 open, 1 closed)

Related to rgw - Bug #27219: lock in resharding may expires before the dynamic resharding completes, Resolved, J. Eric Ivancich, 08/24/2018
