Bug #20934

rgw: bucket index sporadically reshards to 65521 shards

Added by Aleksei Gutikov 3 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Urgent
Target version:
Start date: 08/07/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous, kraken, jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rgw
Release: master
Needs Doc: No

Description

luminous v12.1.1

With rgw_dynamic_resharding=true, the indexes of about 10% of buckets were each resharded to 65521 shards.
The buckets contained at most 200-400 objects.

Even after dynamic resharding was disabled, one bucket index was still resharded during a 2-day test run, also to 65521 shards.

stats.txt.gz - bucket stats output with a very long value in the "ver" field (430 KB) Aleksei Gutikov, 08/07/2017 09:04 AM

5758983.56.omapvals.txt.gz - omapvals of index objects (237 KB) Aleksei Gutikov, 08/07/2017 09:04 AM

rgw.log.1.gz (91 KB) Aleksei Gutikov, 08/21/2017 08:51 AM

ceph-client.rgw.default.P20B-VR3-R1-CEPH-DS1.log - rgw log for oritwas/wip-rgw-resharding-debug (171 KB) Aleksei Gutikov, 08/21/2017 01:56 PM


Related issues

Copied to rgw - Backport #21134: kraken: rgw: bucket index sporadically reshards to 65521 shards Rejected
Copied to rgw - Backport #21135: luminous: rgw: bucket index sporadically reshards to 65521 shards Resolved
Copied to rgw - Backport #21136: jewel: rgw: bucket index sporadically reshards to 65521 shards Resolved

History

#1 Updated by Orit Wasserman 3 months ago

  • Assignee set to Orit Wasserman

#2 Updated by Orit Wasserman 2 months ago

  • Status changed from New to Need Review
  • Backport set to luminous

#3 Updated by Sage Weil 2 months ago

  • Priority changed from Normal to Immediate

#4 Updated by Orit Wasserman 2 months ago

  • Status changed from Need Review to In Progress

#5 Updated by Orit Wasserman 2 months ago

  • Priority changed from Immediate to High

#6 Updated by Orit Wasserman 2 months ago

Can you provide rgw logs?

#7 Updated by Aleksei Gutikov 2 months ago

Orit Wasserman wrote:

Can you provide rgw logs?

Unfortunately, no logs are left from the last test run.
We fixed the indexes with 'radosgw-admin bucket reshard'.
I think we will take 12.1.2, enable dynamic resharding and debug logging, and repeat the test run.
Regarding debug: everywhere I see debug logs for resharding, they are at level 20, for example in RGWRados::check_bucket_shards.
We can't set any debug level to 20 because logging on our test cluster cannot handle it.
Maybe you can provide a patch with log level 0 for the logs you need, or point to a branch with such changes?

#8 Updated by Orit Wasserman 2 months ago

I will provide you a version next week (going camping).

It may be related to http://tracker.ceph.com/issues/20661 (it was fixed in 12.1.2).
Can you try reproducing on 12.1.2, even without logs, so we can rule it out (or not)?

#9 Updated by Aleksei Gutikov 2 months ago

Bug reproduced with v12.1.4.

Part of the radosgw logs is attached.
I have raised the log level of the resharding-related messages, and here is what I see:

Aug 18 14:54:27 P20B-SR4-R1-CEPH-DS06 radosgw[321270]: 2017-08-18 14:54:27.408767 7f98208ab700  0 check_bucket_shards: resharding needed: stats.num_objects=18446744073709551609 shard max_objects=100000
Aug 18 14:54:27 P20B-SR4-R1-CEPH-DS06 radosgw[321270]: 2017-08-18 14:54:27.408793 7f98208ab700  0 check_bucket_shards bucket fast-stream-57 need resharding  old num shards 0 new num shards 2890341191

#10 Updated by Orit Wasserman 2 months ago

I created https://github.com/oritwas/ceph/tree/wip-rgw-resharding-debug
with extra resharding debug output that you can use.

#11 Updated by Aleksei Gutikov 2 months ago

Log for your branch:

2017-08-21 16:55:40.236138 7f4d3bd44700  0 check_bucket_shards bucket debian.thread-14.s3-test-bucket need resharding 0 old num shards 0 new num shards 45
2017-08-21 16:55:40.283410 7f4d3bd44700  0 check_bucket_shards:  stats.num_objects=4 num_objs 1 num_shards 1 shard max_objects=100000
2017-08-21 16:55:40.283604 7f4d3bd44700  0 check_bucket_shards bucket debian.thread-14.s3-test-bucket need resharding 0 old num shards 0 new num shards 45
2017-08-21 16:55:43.278628 7f4d1b503700  0 check_bucket_shards:  stats.num_objects=18446744073709551613 num_objs 1 num_shards 1 shard max_objects=100000
2017-08-21 16:55:43.278719 7f4d1b503700  0 check_bucket_shards: resharding needed: stats.num_objects=18446744073709551613 shard max_objects=100000
2017-08-21 16:55:43.278736 7f4d1b503700  0 check_bucket_shards bucket debian.thread-55.s3-test-bucket need resharding 1 old num shards 0 new num shards 2890341191
2017-08-21 16:55:43.278753 7f4d1b503700  0 add_bucket_to_reshard bucket =debian.thread-55.s3-test-bucket, orig_num=1, new_num_shards=65521
2017-08-21 16:55:46.182566 7f4d41756700  0 could not get bucket info for bucket=debian.thread-13.s3-test-bucket[71cdbda3-1ff8-470f-a65c-0712ea420854.4157.1]) r=-2
2017-08-21 16:55:46.182574 7f4d41756700  0 WARNING: sync_bucket() returned r=-2
2017-08-21 16:55:46.182849 7f4d41756700  0 could not get bucket info for bucket=debian.thread-14.s3-test-bucket[71cdbda3-1ff8-470f-a65c-0712ea420854.4156.4]) r=-2
2017-08-21 16:55:46.182852 7f4d41756700  0 WARNING: sync_bucket() returned r=-2
2017-08-21 16:55:46.183460 7f4d41756700  0 could not get bucket info for bucket=debian.thread-15.s3-test-bucket[71cdbda3-1ff8-470f-a65c-0712ea420854.4155.5]) r=-2
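
The values in the log above are consistent with an unsigned 64-bit object counter that was decremented below zero: 18446744073709551613 is 2^64 - 3. The sketch below reproduces the logged numbers under an assumption that this ticket does not confirm, namely that the suggested shard count is computed as num_objects * 2 / max_objects in wrapping 64-bit arithmetic, truncated to 32 bits, and then capped at 65521. The function names are hypothetical and this is not the RGW code itself.

```python
# Illustrative arithmetic only; names and formula are assumptions, not Ceph source.
U64 = 2 ** 64
U32 = 2 ** 32
MAX_SHARDS = 65521          # cap seen in the add_bucket_to_reshard log line
MAX_OBJECTS_PER_SHARD = 100000  # "shard max_objects" from the log


def wrap_u64(value):
    """Interpret a (possibly negative) integer as an unsigned 64-bit value."""
    return value % U64


def suggested_shards(num_objects, max_objects=MAX_OBJECTS_PER_SHARD):
    """Assumed formula: (num_objects * 2 / max_objects), wrapping at 64 bits,
    then truncated to an unsigned 32-bit shard count."""
    doubled = (num_objects * 2) % U64
    return (doubled // max_objects) % U32


def capped_shards(num_objects):
    """Apply the assumed upper bound on the shard count."""
    return min(suggested_shards(num_objects), MAX_SHARDS)


if __name__ == "__main__":
    underflowed = wrap_u64(-3)  # counter decremented three times past zero
    print(underflowed)                    # 18446744073709551613
    print(suggested_shards(underflowed))  # 2890341191
    print(capped_shards(underflowed))     # 65521
```

Under the same assumption, the value 18446744073709551609 from the earlier v12.1.4 log (2^64 - 7) gives the same suggested count of 2890341191, which matches what was logged there.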

#13 Updated by Orit Wasserman 2 months ago

  • Backport changed from luminous to luminous, kraken, jewel

#14 Updated by Orit Wasserman 2 months ago

  • Priority changed from High to Urgent

#15 Updated by Yuri Weinstein 2 months ago

Aleksei Gutikov wrote:

https://github.com/ceph/ceph/pull/17116

merged

#16 Updated by Yehuda Sadeh 2 months ago

  • Status changed from In Progress to Pending Backport

#17 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #21134: kraken: rgw: bucket index sporadically reshards to 65521 shards added

#18 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #21135: luminous: rgw: bucket index sporadically reshards to 65521 shards added

#19 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #21136: jewel: rgw: bucket index sporadically reshards to 65521 shards added

#20 Updated by Nathan Cutler about 1 month ago

  • Status changed from Pending Backport to Resolved
