Bug #37792
multisite: overwrites in versioning-suspended buckets fail to sync
Description
Steps to reproduce in a two-zone multisite configuration:
- create a bucket
- upload an object "obj"
- enable versioning on the bucket
- reupload the same object "obj"
- suspend versioning on the bucket
- reupload the same object "obj"
The third upload will repeatedly fail to sync, with errors like "cls_rgw_bucket_link_olh() returned r=-125" in the rgw log and errors like "NOTICE: op.olh_tag (zxopy27aag3jjr38ddtow7517gdpgz4c) != olh.tag (bne5h7ou7gingobf89ae5crr2p3p284y)" in the osd log. This happens because, in this specific case, fetch_remote_obj() takes the source zone's olh attributes and writes them directly to the head object, instead of first fetching them from the current head object in rados.
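The failure mode above can be illustrated with a small, hypothetical Python model. This is not Ceph's actual cls_rgw API; link_olh and the xattr key are illustrative names, and the tag values are the ones quoted from the osd log. On Linux, errno ECANCELED is 125, which is where the "r=-125" comes from.

```python
import errno

def link_olh(head_xattrs, op_olh_tag):
    """Hypothetical sketch of the OSD-side tag check: reject the link op
    when the caller's OLH tag does not match the tag stored on the
    current head object, returning -ECANCELED (-125)."""
    current_tag = head_xattrs.get("olh.tag")
    if current_tag is not None and current_tag != op_olh_tag:
        return -errno.ECANCELED  # -125, as seen in the rgw log
    return 0

# Buggy path: sync writes the *source zone's* OLH attributes straight
# onto the destination head object, so the link op carries a stale tag
# that no longer matches what the destination OSD expects.
dest_head = {"olh.tag": "bne5h7ou7gingobf89ae5crr2p3p284y"}
stale_source_tag = "zxopy27aag3jjr38ddtow7517gdpgz4c"
assert link_olh(dest_head, stale_source_tag) == -errno.ECANCELED

# Fixed path: first read the tag from the current head object in rados,
# then issue the link op with that tag.
current_tag = dest_head["olh.tag"]
assert link_olh(dest_head, current_tag) == 0
```

Because the stale tag never changes, the sync retries the same link op and hits the same -125 every time, which is why the error repeats indefinitely until the head object's attributes are repaired.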
Related issues
History
#1 Updated by Casey Bodley about 5 years ago
- Status changed from In Progress to Fix Under Review
please backport both pull requests:
https://github.com/ceph/ceph/pull/25794 (fixes original bug)
https://github.com/ceph/ceph/pull/26157 (repairs damage caused by bug)
#2 Updated by Casey Bodley about 5 years ago
- Status changed from Fix Under Review to Pending Backport
#3 Updated by Casey Bodley about 5 years ago
- Copied to Backport #38080: mimic: multisite: overwrites in versioning-suspended buckets fail to sync added
#4 Updated by Casey Bodley about 5 years ago
- Copied to Backport #38081: luminous: multisite: overwrites in versioning-suspended buckets fail to sync added
#5 Updated by Casey Bodley almost 5 years ago
- Related to Bug #39118: rgw: remove_olh_pending_entries() does not limit the number of xattrs to remove added
#6 Updated by Nathan Cutler over 4 years ago
- Pull request ID set to 25974
#7 Updated by duc pham over 4 years ago
I have the same issue. My cluster version is 13.2.6. After I suspend versioning on one site, reuploads of the same object from the other site fail to sync.
#8 Updated by duc pham over 4 years ago
When I re-enable versioning from the site that could not reupload the object, I get this error:
20 RGWWQ: empty
20 cr:s=0x5629729e0000:op=0x562972a00c00:21RGWRadosSetOmapKeysCR: operate()
20 cr:s=0x5629729e0000:op=0x562972a00c00:21RGWRadosSetOmapKeysCR: operate()
20 cr:s=0x5629729e0000:op=0x562972a00c00:21RGWRadosSetOmapKeysCR: operate()
20 cr:s=0x5629729e0000:op=0x562972a00c00:21RGWRadosSetOmapKeysCR: operate()
20 cr:s=0x5629729e0000:op=0x562971ffc600:13RGWOmapAppend: operate()
15 stack 0x5629729e0000 end
20 run: stack=0x5629729e0000 is done
20 cr:s=0x5629717c5440:op=0x562971da9600:18RGWDataSyncShardCR: operate()
20 collect(): s=0x5629717c5440 stack=0x5629729e0a20 is still running
20 collect(): s=0x5629717c5440 stack=0x5629729e0000 is complete
20 run: stack=0x5629717c5440 is_blocked_by_stack()=0 is_sleeping=0 waiting_for_child()=1
20 cr:s=0x5629729e0a20:op=0x562972537200:22RGWSimpleRadosUnlockCR: operate()
20 cr:s=0x5629729e0a20:op=0x562972537200:22RGWSimpleRadosUnlockCR: operate()
20 cr:s=0x5629729e0a20:op=0x562972537200:22RGWSimpleRadosUnlockCR: operate()
20 cr:s=0x5629729e0a20:op=0x562972537200:22RGWSimpleRadosUnlockCR: operate()
20 cr:s=0x5629729e0a20:op=0x562971f12000:20RGWContinuousLeaseCR: operate()
15 stack 0x5629729e0a20 end
20 run: stack=0x5629729e0a20 is done
20 cr:s=0x5629717c5440:op=0x562971da9600:18RGWDataSyncShardCR: operate()
20 collect(): s=0x5629717c5440 stack=0x5629729e0a20 is complete
20 cr:s=0x5629717c5440:op=0x562971da9600:18RGWDataSyncShardCR: operate()
10 RGW-SYNC:data:sync:shard[80]: incremental sync failed (r=-2)
20 cr:s=0x5629717c5440:op=0x562971da9600:18RGWDataSyncShardCR: operate() returned r=-2
20 cr:s=0x5629717c5440:op=0x562971cae000:25RGWDataSyncShardControlCR: operate()
5 data sync: Sync:11b2b871:data:DataShard:datalog.sync-status.shard.11b2b871-89ec-4d8d-b72f-8057b2dbf1ec.80:finish
0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
20 run: stack=0x5629717c5440 is io blocked
20 cr:s=0x5629727565a0:op=0x56297204fe00:20RGWSimpleRadosLockCR: operate()
20 cr:s=0x5629727565a0:op=0x56297204fe00:20RGWSimpleRadosLockCR: operate()
20 cr:s=0x5629727565a0:op=0x56297204fe00:20RGWSimpleRadosLockCR: operate()
20 cr:s=0x5629727565a0:op=0x56297204fe00:20RGWSimpleRadosLockCR: operate()
20 cr:s=0x5629727565a0:op=0x5629717bc700:20RGWContinuousLeaseCR: operate()
20 run: stack=0x5629727565a0 is io blocked
20 cr:s=0x562971cd6360:op=0x562971e30d00:18RGWDataSyncShardCR: operate()
10 RGW-SYNC:data:sync:shard[105]: took lease
5 data sync: Sync:11b2b871:data:DataShard:datalog.sync-status.shard.11b2b871-89ec-4d8d-b72f-8057b2dbf1ec.105:inc sync
20 cr:s=0x5629729e0a20:op=0x56297204fe00:13RGWOmapAppend: operate()
20 run: stack=0x5629729e0a20 is_blocked_by_stack()=0 is_sleeping=1 waiting_for_child()=0
20 cr:s=0x562971cd6360:op=0x562972537200:21RGWRadosGetOmapKeysCR: operate()
20 cr:s=0x562971cd6360:op=0x562972537200:21RGWRadosGetOmapKeysCR: operate()
20 run: stack=0x562971cd6360 is io blocked
#9 Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
#10 Updated by J. Eric Ivancich over 4 years ago
- Pull request ID changed from 25974 to 25794
Updated the PR id, which had two transposed digits.
#11 Updated by Casey Bodley over 2 years ago
- Duplicated by Bug #21210: rgw:multisite: put obj in a version-suspended bucket when sync to slave zone, the list_index cannot added corretlly added