Bug #56997 (closed)
bucket lifecycle policy updates breaking metadata sync
Description
2022-08-01T19:11:10.659+0300 7fdfd6ffd640 20 rgw async rados processor: remove lc config for hnjmrt-1
2022-08-01T19:11:10.660+0300 7fdfd6ffd640 10 lifecycle: RGWRados::convert_old_bucket_info(): bucket=:hnjmrt-1[2bd5e553-5d25-4207-8b94-0b96c5d80301.4146.1])
2022-08-01T19:11:10.660+0300 7fdfd6ffd640 10 lifecycle: cache get: name=a2.rgw.meta+root+hnjmrt-1 : miss
2022-08-01T19:11:10.660+0300 7fdfd6ffd640 20 lifecycle: rados->read ofs=0 len=0
2022-08-01T19:11:10.660+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf4406dc20:op=0x7fdf44173b40:20RGWMetaRemoveEntryCR: operate()
2022-08-01T19:11:10.660+0300 7fdfd6ffd640 1 -- 10.46.10.90:0/2917887852 --> [v2:10.46.10.90:6808/3488456,v1:10.46.10.90:6809/3488456] -- osd_op(unknown.0.0:3109 4.0 4:5ad72b23:root::hnjmrt-1:head [call version.read in=11b,read 0~0,getxattrs] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e16) v8 -- 0x7fdfbc018f60 con 0x55c905bfacc0
2022-08-01T19:11:10.660+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf4406dc20:op=0x7fdf44173b40:20RGWMetaRemoveEntryCR: operate()
2022-08-01T19:11:10.660+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf4406dc20:op=0x7fdf44173b40:20RGWMetaRemoveEntryCR: operate()
2022-08-01T19:11:10.660+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf4406dc20:op=0x7fdf44173b40:20RGWMetaRemoveEntryCR: operate()
2022-08-01T19:11:10.660+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf4406dc20:op=0x7fdf44147a80:24RGWMetaSyncSingleEntryCR: operate()
2022-08-01T19:11:10.660+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf4406dc20:op=0x7fdf44147a80:24RGWMetaSyncSingleEntryCR: operate()
2022-08-01T19:11:10.660+0300 7fdf577fe640 10 RGW-SYNC:meta:shard[10]:entry[bucket:hnjmrt-1]: success
2022-08-01T19:11:10.660+0300 7fdf577fe640 15 stack 0x7fdf4406dc20 end
2022-08-01T19:11:10.660+0300 7fdf577fe640 20 run: stack=0x7fdf4406dc20 is done
2022-08-01T19:11:10.660+0300 7fe049ffb640 1 -- 10.46.10.90:0/2917887852 <== osd.0 v2:10.46.10.90:6808/3488456 4310 ==== osd_op_reply(3109 hnjmrt-1 [call,read 0~0,getxattrs] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 236+0+0 (crc 0 0 0) 0x7fe03406b020 con 0x55c905bfacc0
2022-08-01T19:11:10.661+0300 7fdfd6ffd640 20 lifecycle: rados_obj.operate() r=-2 bl.length=0
2022-08-01T19:11:10.661+0300 7fdfd6ffd640 10 lifecycle: cache put: name=a2.rgw.meta+root+hnjmrt-1 info.flags=0x0
2022-08-01T19:11:10.661+0300 7fdfd6ffd640 10 lifecycle: adding a2.rgw.meta+root+hnjmrt-1 to cache LRU end
2022-08-01T19:11:10.661+0300 7fdfd6ffd640 0 lifecycle: ERROR: get_bucket_entrypoint_info() returned -2 bucket=:hnjmrt-1[2bd5e553-5d25-4207-8b94-0b96c5d80301.4146.1])
2022-08-01T19:11:10.661+0300 7fdfd6ffd640 0 lifecycle: ERROR: failed converting old bucket info: -2
2022-08-01T19:11:10.661+0300 7fdfd6ffd640 0 lifecycle: RGWLC::RGWDeleteLC() failed to set attrs on bucket=hnjmrt-1 returned err=-2
2022-08-01T19:11:10.661+0300 7fdfd6ffd640 0 rgw async rados processor: put_post failed to remove lc config for hnjmrt-1
2022-08-01T19:11:10.661+0300 7fdfd6ffd640 0 rgw async rados processor: ERROR: can't store key: bucket.instance:hnjmrt-1:2bd5e553-5d25-4207-8b94-0b96c5d80301.4146.1 ret=-2
2022-08-01T19:11:10.661+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf44171170:op=0x7fdf440d8fe0:19RGWMetaStoreEntryCR: operate()
2022-08-01T19:11:10.661+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf44171170:op=0x7fdf440d8fe0:19RGWMetaStoreEntryCR: operate() returned r=-2
2022-08-01T19:11:10.661+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf44171170:op=0x7fdf44065540:24RGWMetaSyncSingleEntryCR: operate()
2022-08-01T19:11:10.661+0300 7fdf577fe640 20 rgw rados thread: cr:s=0x7fdf44171170:op=0x7fdf44065540:24RGWMetaSyncSingleEntryCR: failed to store metadata entry: bucket.instance:hnjmrt-1:2bd5e553-5d25-4207-8b94-0b96c5d80301.4146.1, got retcode=-2, will retry
this ENOENT error originates from RGWMetadataHandlerPut_BucketInstance::put_post(), where https://github.com/ceph/ceph/pull/46928 recently added new logic to update the lc list
the error is returned by get_bucket_entrypoint_info() at the bottom of this call stack:
- RGWLC::remove_bucket_config()
- RadosBucket::merge_and_store_attrs()
- RGWBucketCtl::set_bucket_instance_attrs()
- RGWBucketCtl::convert_old_bucket_info()
metadata sync can't make any guarantees about the ordering of these sync events. so when it needs to sync a piece of bucket instance metadata, that sync must not depend on the existence of its entrypoint metadata. in this case, metadata sync had just removed this entrypoint metadata because it was deleted on the master zone
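The ordering problem can be illustrated with a toy model. This is only a sketch, not Ceph code: ToyMetaStore, its methods, and replay_remove_then_put() are hypothetical names invented for illustration. It replays the order seen in the log above (entrypoint removal applied first, bucket instance write second) against a write path that requires the entrypoint and one that does not.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a secondary zone's metadata; not the real RGW types.
constexpr int kEnoent = -2;  // same value as the ret=-2 in the log

struct ToyMetaStore {
  std::set<std::string> entrypoints;             // bucket names with an entrypoint
  std::map<std::string, std::string> instances;  // bucket name -> instance id

  void remove_entrypoint(const std::string& bucket) {
    entrypoints.erase(bucket);
  }

  // Order-dependent write: fails when the entrypoint has already been removed.
  int put_instance_needs_entrypoint(const std::string& bucket,
                                    const std::string& id) {
    if (entrypoints.count(bucket) == 0) {
      return kEnoent;
    }
    instances[bucket] = id;
    return 0;
  }

  // Order-independent write: never consults the entrypoint.
  int put_instance(const std::string& bucket, const std::string& id) {
    instances[bucket] = id;
    return 0;
  }
};

// Applies "remove entrypoint" before "store bucket instance" and returns
// the result of the instance write.
inline int replay_remove_then_put(bool needs_entrypoint) {
  ToyMetaStore store;
  store.entrypoints.insert("hnjmrt-1");
  store.remove_entrypoint("hnjmrt-1");
  return needs_entrypoint
             ? store.put_instance_needs_entrypoint("hnjmrt-1", "instance-1")
             : store.put_instance("hnjmrt-1", "instance-1");
}
```

In this model only the order-dependent variant fails, and because the entrypoint never comes back, retrying it cannot succeed either.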
ultimately, i'm not sure why convert_old_bucket_info() is being called here. but RGWLC::remove_bucket_config() shouldn't be calling merge_and_store_attrs() to remove RGW_ATTR_LC, because RGWMetadataHandlerPut_BucketInstance::put_post() already saw that the attribute isn't there. this code path should only need to call guard_lc_modify() -> sal_lc->rm_entry()
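A minimal sketch of the suggested shape follows. It assumes a caller-supplied flag that lets the metadata sync path skip the attribute write. ToyBucket, ToyLC, and the merge_attrs parameter are hypothetical and do not reproduce the actual RGWLC signatures or the eventual fix.

```cpp
#include <set>
#include <string>

// Hypothetical toy types; the real RGWLC and bucket classes look different.
constexpr int kNoEntrypoint = -2;  // ENOENT, as in the log

struct ToyBucket {
  std::string name;
  bool has_lc_attr = false;       // stands in for RGW_ATTR_LC
  bool entrypoint_exists = true;  // false once sync has removed the entrypoint
};

struct ToyLC {
  std::set<std::string> lc_list;  // stands in for the lc list entries

  // Mimics the failing path: the attribute write needs the entrypoint.
  int merge_and_store_attrs(ToyBucket& b) {
    if (!b.entrypoint_exists) {
      return kNoEntrypoint;
    }
    b.has_lc_attr = false;
    return 0;
  }

  // merge_attrs is an assumed flag: a caller that already knows the
  // attribute is gone passes false and only the lc list entry is removed.
  int remove_bucket_config(ToyBucket& b, bool merge_attrs) {
    if (merge_attrs) {
      int r = merge_and_store_attrs(b);
      if (r < 0) {
        return r;
      }
    }
    lc_list.erase(b.name);
    return 0;
  }
};
```

With the flag set to false, removal succeeds even when the entrypoint is missing, which is the situation metadata sync hits in the log.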
Updated by Matt Benjamin over 1 year ago
- Status changed from New to In Progress
- Assignee set to Matt Benjamin
Updated by Matt Benjamin over 1 year ago
From what I can make out, having metadata sync call merge_and_store_attrs(...) is essentially a layering violation. More importantly, it certainly looks like this call path, if taken, should recover cleanly from the no-entrypoint error.
That said, it's easy to avoid this call path from remove_bucket_config() in more or less the same way set_bucket_config() does, so for now, let's do that.
Matt
Updated by Matt Benjamin over 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 47411
Updated by Casey Bodley over 1 year ago
- Has duplicate Bug #57129: rgw: multisite tests are failing on "meta checkpoint" checks added
Updated by Casey Bodley over 1 year ago
- Status changed from Fix Under Review to Resolved