Bug #37792
closed
multisite: overwrites in versioning-suspended buckets fail to sync
Added by Casey Bodley over 5 years ago.
Updated over 4 years ago.
Tags:
multisite versioning
Description
Steps to reproduce in a two-zone multisite configuration:
- create a bucket
- upload an object "obj"
- enable versioning on the bucket
- reupload the same object "obj"
- suspend versioning on the bucket
- reupload the same object "obj"
The third upload repeatedly fails to sync, with errors like "cls_rgw_bucket_link_olh() returned r=-125" (-ECANCELED) in the rgw log and errors like "NOTICE: op.olh_tag (zxopy27aag3jjr38ddtow7517gdpgz4c) != olh.tag (bne5h7ou7gingobf89ae5crr2p3p284y)" in the osd log. This happens because, in this specific case, fetch_remote_obj() takes the source zone's OLH attributes and writes them directly to the head object, instead of first fetching them from the current head object in RADOS.
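For context on why the third upload is the special case: under S3 semantics, an overwrite in a versioning-suspended bucket replaces the "null" version in place, while versions created during the versioning-enabled phase are kept, so the OLH must be transitioned rather than blindly rewritten. A minimal, self-contained sketch of those S3-level semantics (a toy model, not RGW code; the class and names are illustrative):

```python
import uuid


class Bucket:
    """Toy model of S3 versioning states: Unversioned -> Enabled -> Suspended."""

    def __init__(self):
        self.versioning = "Unversioned"
        self.versions = {}  # key -> list of (version_id, body), newest last

    def put(self, key, body):
        stack = self.versions.setdefault(key, [])
        if self.versioning == "Enabled":
            vid = uuid.uuid4().hex  # every overwrite creates a new, unique version
        else:
            # Unversioned/Suspended: the write replaces the single "null"
            # version in place; versions written while Enabled survive
            stack[:] = [v for v in stack if v[0] != "null"]
            vid = "null"
        stack.append((vid, body))
        return vid


b = Bucket()
b.put("obj", b"v1")       # step 2: unversioned write -> the "null" version
b.versioning = "Enabled"
b.put("obj", b"v2")       # step 4: versioned write -> fresh version id
b.versioning = "Suspended"
b.put("obj", b"v3")       # step 6: replaces "null" in place, keeps the v2 version
print([vid for vid, _ in b.versions["obj"]])
```

The third put leaves two versions behind (the one from the enabled phase plus the replaced "null"), which is why the sync path cannot simply copy the source zone's OLH attributes over the head object.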
- Status changed from In Progress to Fix Under Review
- Status changed from Fix Under Review to Pending Backport
- Copied to Backport #38080: mimic: multisite: overwrites in versioning-suspended buckets fail to sync added
- Copied to Backport #38081: luminous: multisite: overwrites in versioning-suspended buckets fail to sync added
- Related to Bug #39118: rgw: remove_olh_pending_entries() does not limit the number of xattrs to remove added
- Pull request ID set to 25974
I have the same issue; my cluster is version 13.2.6. After I suspend versioning on one site, the same object fails to reupload (sync) from the other site. When I re-enable versioning from the site that could not reupload the object, I get the following errors:
20 RGWWQ: empty
20 cr:s=0x5629729e0000:op=0x562972a00c00:21RGWRadosSetOmapKeysCR: operate()
20 cr:s=0x5629729e0000:op=0x562972a00c00:21RGWRadosSetOmapKeysCR: operate()
20 cr:s=0x5629729e0000:op=0x562972a00c00:21RGWRadosSetOmapKeysCR: operate()
20 cr:s=0x5629729e0000:op=0x562972a00c00:21RGWRadosSetOmapKeysCR: operate()
20 cr:s=0x5629729e0000:op=0x562971ffc600:13RGWOmapAppend: operate()
15 stack 0x5629729e0000 end
20 run: stack=0x5629729e0000 is done
20 cr:s=0x5629717c5440:op=0x562971da9600:18RGWDataSyncShardCR: operate()
20 collect(): s=0x5629717c5440 stack=0x5629729e0a20 is still running
20 collect(): s=0x5629717c5440 stack=0x5629729e0000 is complete
20 run: stack=0x5629717c5440 is_blocked_by_stack()=0 is_sleeping=0 waiting_for_child()=1
20 cr:s=0x5629729e0a20:op=0x562972537200:22RGWSimpleRadosUnlockCR: operate()
20 cr:s=0x5629729e0a20:op=0x562972537200:22RGWSimpleRadosUnlockCR: operate()
20 cr:s=0x5629729e0a20:op=0x562972537200:22RGWSimpleRadosUnlockCR: operate()
20 cr:s=0x5629729e0a20:op=0x562972537200:22RGWSimpleRadosUnlockCR: operate()
20 cr:s=0x5629729e0a20:op=0x562971f12000:20RGWContinuousLeaseCR: operate()
15 stack 0x5629729e0a20 end
20 run: stack=0x5629729e0a20 is done
20 cr:s=0x5629717c5440:op=0x562971da9600:18RGWDataSyncShardCR: operate()
20 collect(): s=0x5629717c5440 stack=0x5629729e0a20 is complete
20 cr:s=0x5629717c5440:op=0x562971da9600:18RGWDataSyncShardCR: operate()
10 RGW-SYNC:data:sync:shard[80]: incremental sync failed (r=-2)
20 cr:s=0x5629717c5440:op=0x562971da9600:18RGWDataSyncShardCR: operate() returned r=-2
20 cr:s=0x5629717c5440:op=0x562971cae000:25RGWDataSyncShardControlCR: operate()
5 data sync: Sync:11b2b871:data:DataShard:datalog.sync-status.shard.11b2b871-89ec-4d8d-b72f-8057b2dbf1ec.80:finish
0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
20 run: stack=0x5629717c5440 is io blocked
20 cr:s=0x5629727565a0:op=0x56297204fe00:20RGWSimpleRadosLockCR: operate()
20 cr:s=0x5629727565a0:op=0x56297204fe00:20RGWSimpleRadosLockCR: operate()
20 cr:s=0x5629727565a0:op=0x56297204fe00:20RGWSimpleRadosLockCR: operate()
20 cr:s=0x5629727565a0:op=0x56297204fe00:20RGWSimpleRadosLockCR: operate()
20 cr:s=0x5629727565a0:op=0x5629717bc700:20RGWContinuousLeaseCR: operate()
20 run: stack=0x5629727565a0 is io blocked
20 cr:s=0x562971cd6360:op=0x562971e30d00:18RGWDataSyncShardCR: operate()
10 RGW-SYNC:data:sync:shard[105]: took lease
5 data sync: Sync:11b2b871:data:DataShard:datalog.sync-status.shard.11b2b871-89ec-4d8d-b72f-8057b2dbf1ec.105:inc sync
20 cr:s=0x5629729e0a20:op=0x56297204fe00:13RGWOmapAppend: operate()
20 run: stack=0x5629729e0a20 is_blocked_by_stack()=0 is_sleeping=1 waiting_for_child()=0
20 cr:s=0x562971cd6360:op=0x562972537200:21RGWRadosGetOmapKeysCR: operate()
20 cr:s=0x562971cd6360:op=0x562972537200:21RGWRadosGetOmapKeysCR: operate()
20 run: stack=0x562971cd6360 is io blocked
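For reference, the negative return codes in these logs (r=-125 above, r=-2 here) are negated Linux errno values; a quick way to decode them:

```python
import errno
import os

# RGW/RADOS log negative errno values: negate and look up the symbolic name.
# On Linux, 125 is ECANCELED and 2 is ENOENT.
for r in (-125, -2):
    code = -r
    print(r, errno.errorcode[code], os.strerror(code))
```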
- Status changed from Pending Backport to Resolved
- Pull request ID changed from 25974 to 25794
Updated pr id, which had transposed two digits.
- Has duplicate Bug #21210: rgw:multisite: put obj in a version-suspended bucket when sync to slave zone, the list_index cannot be added correctly added