Bug #20934

closed

rgw: bucket index sporadically reshards to 65521 shards

Added by Aleksei Gutikov over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Urgent
Target version:
% Done: 0%
Source:
Tags:
Backport: luminous, kraken, jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

luminous v12.1.1

With rgw_dynamic_resharding=true, the indexes of about 10% of buckets were each resharded to 65521 shards.
The buckets contained 200-400 objects or fewer.

After we disabled dynamic resharding, one bucket index was still resharded after 2 days of test running, also to 65521 shards.


Files

stats.txt.gz (430 KB): bucket stats output with very-very long value of "ver" field. Aleksei Gutikov, 08/07/2017 09:04 AM
5758983.56.omapvals.txt.gz (237 KB): omapvals of index objects. Aleksei Gutikov, 08/07/2017 09:04 AM
rgw.log.1.gz (91 KB). Aleksei Gutikov, 08/21/2017 08:51 AM
ceph-client.rgw.default.P20B-VR3-R1-CEPH-DS1.log (171 KB): rgw log for oritwas/wip-rgw-resharding-debug. Aleksei Gutikov, 08/21/2017 01:56 PM

Related issues 3 (0 open, 3 closed)

Copied to rgw - Backport #21134: kraken: rgw: bucket index sporadically reshards to 65521 shards (Rejected)
Copied to rgw - Backport #21135: luminous: rgw: bucket index sporadically reshards to 65521 shards (Resolved, Nathan Cutler)
Copied to rgw - Backport #21136: jewel: rgw: bucket index sporadically reshards to 65521 shards (Resolved, Pavan Rallabhandi)
Actions #1

Updated by Orit Wasserman over 6 years ago

  • Assignee set to Orit Wasserman
Actions #2

Updated by Orit Wasserman over 6 years ago

  • Status changed from New to Fix Under Review
  • Backport set to luminous
Actions #3

Updated by Sage Weil over 6 years ago

  • Priority changed from Normal to Immediate
Actions #4

Updated by Orit Wasserman over 6 years ago

  • Status changed from Fix Under Review to In Progress
Actions #5

Updated by Orit Wasserman over 6 years ago

  • Priority changed from Immediate to High
Actions #6

Updated by Orit Wasserman over 6 years ago

Can you provide rgw logs?

Actions #7

Updated by Aleksei Gutikov over 6 years ago

Orit Wasserman wrote:

Can you provide rgw logs?

Unfortunately, no logs are left from the last test run.
We fixed the indexes with 'radosgw-admin bucket reshard'.
I think we will take 12.1.2, enable dynamic resharding, enable debug logging, and repeat the test run.
Regarding debug: everywhere I see debug logs for resharding, they are at level 20, for example in RGWRados::check_bucket_shards.
We can't set any debug level to 20 because the logging on our test cluster will not handle it.
Maybe you can provide a patch that logs the messages you need at level 0, or point us to a branch with such changes?

Actions #8

Updated by Orit Wasserman over 6 years ago

I will provide you a version next week (going camping).

It may be related to http://tracker.ceph.com/issues/20661 (it was fixed in 12.1.2).
Can you try reproducing on 12.1.2 even without logs, so we can rule it out (or not)?

Actions #9

Updated by Aleksei Gutikov over 6 years ago

Bug reproduced with v12.1.4

Part of the radosgw logs is attached.
I have raised the log level of the messages related to resharding, and here is what I see:

Aug 18 14:54:27 P20B-SR4-R1-CEPH-DS06 radosgw[321270]: 2017-08-18 14:54:27.408767 7f98208ab700  0 check_bucket_shards: resharding needed: stats.num_objects=18446744073709551609 shard max_objects=100000
Aug 18 14:54:27 P20B-SR4-R1-CEPH-DS06 radosgw[321270]: 2017-08-18 14:54:27.408793 7f98208ab700  0 check_bucket_shards bucket fast-stream-57 need resharding  old num shards 0 new num shards 2890341191
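
A note on the number in these log lines: 18446744073709551609 equals 2^64 - 7, which is consistent with a 64-bit unsigned object counter having wrapped below zero. The short Python sketch below reinterprets the logged value as a signed 64-bit integer. The helper name as_signed64 is made up for illustration and is not part of Ceph.

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed integer."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


logged_num_objects = 18446744073709551609  # stats.num_objects from the log above
print(as_signed64(logged_num_objects))  # -7
```

Read as a signed value, the counter stands at -7, which no bucket can legitimately hold.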

Actions #10

Updated by Orit Wasserman over 6 years ago

I created https://github.com/oritwas/ceph/tree/wip-rgw-resharding-debug
with resharding debug logging that you can use.

Actions #11

Updated by Aleksei Gutikov over 6 years ago

Log for your branch:

2017-08-21 16:55:40.236138 7f4d3bd44700  0 check_bucket_shards bucket debian.thread-14.s3-test-bucket need resharding 0 old num shards 0 new num shards 45
2017-08-21 16:55:40.283410 7f4d3bd44700  0 check_bucket_shards:  stats.num_objects=4 num_objs 1 num_shards 1 shard max_objects=100000
2017-08-21 16:55:40.283604 7f4d3bd44700  0 check_bucket_shards bucket debian.thread-14.s3-test-bucket need resharding 0 old num shards 0 new num shards 45
2017-08-21 16:55:43.278628 7f4d1b503700  0 check_bucket_shards:  stats.num_objects=18446744073709551613 num_objs 1 num_shards 1 shard max_objects=100000
2017-08-21 16:55:43.278719 7f4d1b503700  0 check_bucket_shards: resharding needed: stats.num_objects=18446744073709551613 shard max_objects=100000
2017-08-21 16:55:43.278736 7f4d1b503700  0 check_bucket_shards bucket debian.thread-55.s3-test-bucket need resharding 1 old num shards 0 new num shards 2890341191
2017-08-21 16:55:43.278753 7f4d1b503700  0 add_bucket_to_reshard bucket =debian.thread-55.s3-test-bucket, orig_num=1, new_num_shards=65521
2017-08-21 16:55:46.182566 7f4d41756700  0 could not get bucket info for bucket=debian.thread-13.s3-test-bucket[71cdbda3-1ff8-470f-a65c-0712ea420854.4157.1]) r=-2
2017-08-21 16:55:46.182574 7f4d41756700  0 WARNING: sync_bucket() returned r=-2
2017-08-21 16:55:46.182849 7f4d41756700  0 could not get bucket info for bucket=debian.thread-14.s3-test-bucket[71cdbda3-1ff8-470f-a65c-0712ea420854.4156.4]) r=-2
2017-08-21 16:55:46.182852 7f4d41756700  0 WARNING: sync_bucket() returned r=-2
2017-08-21 16:55:46.183460 7f4d41756700  0 could not get bucket info for bucket=debian.thread-15.s3-test-bucket[71cdbda3-1ff8-470f-a65c-0712ea420854.4155.5]) r=-2
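
The shard counts in this log can be reproduced with plain integer arithmetic. The sketch below assumes, based only on the numbers logged here, that the suggested shard count is num_objects * 2 / max_objects computed in 64-bit unsigned arithmetic and then truncated to 32 bits, and that add_bucket_to_reshard caps the result at 65521. The function names are hypothetical and are not Ceph's.

```python
U64 = 2**64
U32 = 2**32
MAX_SHARDS = 65521  # cap seen in the add_bucket_to_reshard log line


def suggested_num_shards(num_objects: int, max_objects: int) -> int:
    """Assumed formula: num_objects * 2 / max_objects, wrapped to 64 bits,
    then truncated to an unsigned 32-bit value."""
    doubled = (num_objects * 2) % U64          # 64-bit unsigned wraparound
    return (doubled // max_objects) % U32      # truncation to 32 bits


def capped_num_shards(suggested: int) -> int:
    """Clamp the suggestion to the maximum shard count seen in the log."""
    return min(suggested, MAX_SHARDS)


wrapped = 18446744073709551613  # stats.num_objects from the log (2**64 - 3)
suggested = suggested_num_shards(wrapped, 100000)
print(suggested)                     # 2890341191
print(capped_num_shards(suggested))  # 65521
```

Under these assumptions the wrapped counter alone accounts for both "new num shards 2890341191" and "new_num_shards=65521".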

Actions #13

Updated by Orit Wasserman over 6 years ago

  • Backport changed from luminous to luminous, kraken, jewel
Actions #14

Updated by Orit Wasserman over 6 years ago

  • Priority changed from High to Urgent
Actions #15

Updated by Yuri Weinstein over 6 years ago

Aleksei Gutikov wrote:

https://github.com/ceph/ceph/pull/17116

merged

Actions #16

Updated by Yehuda Sadeh over 6 years ago

  • Status changed from In Progress to Pending Backport
Actions #17

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21134: kraken: rgw: bucket index sporadically reshards to 65521 shards added
Actions #18

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21135: luminous: rgw: bucket index sporadically reshards to 65521 shards added
Actions #19

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21136: jewel: rgw: bucket index sporadically reshards to 65521 shards added
Actions #20

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved