Bug #52917

rgw-multisite: bucket sync checkpoint for a bucket lists an incorrect, very high value for local gen.

Added by Vidushi Mishra over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: Casey Bodley
Target version:
% Done: 0%
Source: Q/A
Tags: multisite-reshard
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

radosgw-admin bucket sync checkpoint reports a very high value for the local gen.

snippet of the issue:
-----------------------
[ceph: root@magna017 /]# radosgw-admin bucket sync checkpoint --bucket tx/ss-bkt-v3
2021-10-13T12:15:34.408+0000 7fe2c6bdc340 1 Realm: data (d62bd711-d486-47be-9c3e-193e49334862)
2021-10-13T12:15:34.408+0000 7fe2c6bdc340 1 ZoneGroup: us (8f3b29b1-ffc6-4c90-9d0c-9bd258028cd8)
2021-10-13T12:15:34.408+0000 7fe2c6bdc340 1 Zone: west (3a571642-9f5e-46d8-8186-9eca8cc79ac6)
2021-10-13T12:15:34.408+0000 7fe2c6bdc340 1 using period configuration: dd132fae-4457-4f49-88b9-55ca2f8adff9:2

2021-10-13T12:15:34.664+0000 7fe2c6bdc340 1 bucket sync caught up with source:
local gen: 93994490322120
remote gen: 1
2021-10-13T12:15:34.664+0000 7fe2c6bdc340 0 bucket checkpoint complete
[ceph: root@magna017 /]#
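
For a cross-check on the same zone, the overall and per-bucket sync state can be inspected with the standard status subcommands; a sane local generation should be a small integer rather than a value like the one above. The bucket name below is the one from this report; everything else is only a sketch:

cross-check sketch:
-----------------------
radosgw-admin sync status
radosgw-admin bucket sync status --bucket=tx/ss-bkt-v3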

The issue is observed for buckets that were created on a single site, filled with some objects, and then resharded dynamically.
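
To confirm that dynamic resharding actually took place on such a bucket, the shard count and the reshard queue can be checked with standard radosgw-admin subcommands (the bucket name is the one from this report):

reshard check sketch:
-----------------------
radosgw-admin bucket stats --bucket tx/ss-bkt-v3 | grep num_shards
radosgw-admin reshard list
radosgw-admin reshard status --bucket tx/ss-bkt-v3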

Steps to reproduce (a command-level sketch follows the list):

1. Create a single-site cluster with realm 'data', zonegroup 'us', and zone 'east'.
2. Create two buckets, ss-bkt-v1 and ss-bkt-v3, under the tenant 'tx'.
3. Upload around 2K objects to each bucket, with the rgw_max_objs_per_shard threshold set to 100 objects.
4. Let the buckets reshard dynamically on the single site.
5. Establish multisite by doing a realm pull and a period pull, and create a secondary zone 'west'.
6. Wait for the sync to complete.
7. Make sure the rgw_max_objs_per_shard threshold is also set to 100 objects on the secondary site.
8. Let the buckets reshard dynamically on the secondary site as well.
9. On running 'radosgw-admin bucket sync checkpoint --bucket tx/ss-bkt-v3', we observed a very high value for the local gen.
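
The sketch below maps the steps above to commands. The realm/zonegroup/zone/bucket names and the 100-objects-per-shard threshold are taken from this report; the endpoints, access keys, and the choice of S3 client for uploads are placeholders, not values from this cluster:

command sketch:
-----------------------
# --- primary site (zone 'east') ---
radosgw-admin realm create --rgw-realm=data --default
radosgw-admin zonegroup create --rgw-zonegroup=us --master --default \
    --endpoints=http://east-rgw:8080
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=east --master --default \
    --endpoints=http://east-rgw:8080 \
    --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY
radosgw-admin period update --commit

# lower the dynamic reshard threshold to 100 objects per shard
ceph config set client.rgw rgw_max_objs_per_shard 100

# create tx/ss-bkt-v1 and tx/ss-bkt-v3 and upload ~2K objects to each with any
# S3 client, then wait for dynamic resharding to bump the shard counts

# --- secondary site (zone 'west') ---
radosgw-admin realm pull --url=http://east-rgw:8080 \
    --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY
radosgw-admin period pull --url=http://east-rgw:8080 \
    --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=west \
    --endpoints=http://west-rgw:8080 \
    --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY
radosgw-admin period update --commit
ceph config set client.rgw rgw_max_objs_per_shard 100

# restart the rgw daemons for zone 'west', wait for 'radosgw-admin sync status'
# to catch up, let the buckets reshard here as well, then run the checkpoint
radosgw-admin bucket sync checkpoint --bucket tx/ss-bkt-v3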

ceph version 17.0.0-8051-g15b54dc9 (15b54dc9eaa835e95809e32e8ddf109d416320c9) quincy (dev)

History

#1 Updated by Vidushi Mishra over 2 years ago

On the master site:
[ceph: root@clara001 /]# radosgw-admin bucket sync checkpoint --bucket tx/ss-bkt-v3
2021-10-13T15:47:35.490+0000 7f40e5622340 1 Realm: data (d62bd711-d486-47be-9c3e-193e49334862)
2021-10-13T15:47:35.490+0000 7f40e5622340 1 ZoneGroup: us (8f3b29b1-ffc6-4c90-9d0c-9bd258028cd8)
2021-10-13T15:47:35.490+0000 7f40e5622340 1 Zone: east (5d32949e-6245-422c-b315-9048855d3a9a)
2021-10-13T15:47:35.490+0000 7f40e5622340 1 using period configuration: dd132fae-4457-4f49-88b9-55ca2f8adff9:2
2021-10-13T15:47:36.536+0000 7f40e5622340 1 bucket sync caught up with empty source
2021-10-13T15:47:36.536+0000 7f40e5622340 0 bucket checkpoint complete
[ceph: root@clara001 /]#

#2 Updated by Casey Bodley over 2 years ago

  • Assignee set to Casey Bodley

#3 Updated by Casey Bodley over 2 years ago

Vidushi Mishra wrote:

radosgw-admin bucket sync checkpoint reports a very high value for the local gen.

snippet of the issue:
-----------------------
[ceph: root@magna017 /]# radosgw-admin bucket sync checkpoint --bucket tx/ss-bkt-v3
2021-10-13T12:15:34.408+0000 7fe2c6bdc340 1 Realm: data (d62bd711-d486-47be-9c3e-193e49334862)
2021-10-13T12:15:34.408+0000 7fe2c6bdc340 1 ZoneGroup: us (8f3b29b1-ffc6-4c90-9d0c-9bd258028cd8)
2021-10-13T12:15:34.408+0000 7fe2c6bdc340 1 Zone: west (3a571642-9f5e-46d8-8186-9eca8cc79ac6)
2021-10-13T12:15:34.408+0000 7fe2c6bdc340 1 using period configuration: dd132fae-4457-4f49-88b9-55ca2f8adff9:2

2021-10-13T12:15:34.664+0000 7fe2c6bdc340 1 bucket sync caught up with source:
local gen: 93994490322120
remote gen: 1

I was able to find one place where the generation number was uninitialized (fixed in https://github.com/ceph/ceph/pull/39002/commits/9d2d0810539e181cb364faf13e57e9225f8c5936), but that only happens when an upgraded secondary zone is talking to an un-upgraded rgw in the primary zone.

I wasn't able to reproduce this issue when both zones were running the wip-rgw-multisite-reshard branch.
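
A quick way to rule out that mixed-version case is to confirm that every rgw daemon in both zones is on the same build and then re-run the checkpoint on each side; the 'bucket layout' subcommand is assumed to be available on this branch (it came in with the multisite reshard work), so take that line as an optional extra check:

version / generation check sketch:
-----------------------
ceph versions                                            # all rgw daemons should report the same build
radosgw-admin bucket sync checkpoint --bucket tx/ss-bkt-v3
radosgw-admin bucket layout --bucket tx/ss-bkt-v3        # inspect the bucket's current index generation, if available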

#4 Updated by Casey Bodley over 2 years ago

  • Status changed from New to Need More Info

#5 Updated by Casey Bodley over 2 years ago

  • Status changed from Need More Info to Closed
