Bug #21506
multisite: sync of bucket entrypoints fail with ENOENT
Status: Closed
Description
A regression introduced in https://github.com/ceph/ceph/pull/12704 causes RGWMetaStoreEntryCR to fail with ENOENT when storing metadata for a new bucket entrypoint.
2017-09-22 11:20:11.578852 7f37427fc700 10 RGW-SYNC:meta:shard[0]:entry[bucket:BUCKET0]: storing local metadata entry
2017-09-22 11:20:11.578879 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f372802af20:19RGWMetaStoreEntryCR: operate()
2017-09-22 11:20:11.578886 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f372802af20:19RGWMetaStoreEntryCR: operate()
2017-09-22 11:20:11.578907 7f37427fc700 20 enqueued request req=0x7f372808dad0
2017-09-22 11:20:11.578910 7f37427fc700 20 RGWWQ:
2017-09-22 11:20:11.578911 7f37427fc700 20 req: 0x7f372808dad0
2017-09-22 11:20:11.578918 7f37427fc700 20 run: stack=0x7f3728071b90 is io blocked
2017-09-22 11:20:11.578944 7f37e27fc700 20 dequeued request req=0x7f372808dad0
2017-09-22 11:20:11.578951 7f37e27fc700 20 RGWWQ: empty
2017-09-22 11:20:11.579925 7f37e27fc700 20 get_system_obj_state: rctx=0x7f37e27f7a00 obj=na-2.rgw.meta:root:BUCKET0 state=0x7f37c8009280 s->prefetch_data=0
2017-09-22 11:20:11.579941 7f37e27fc700 10 cache get: name=na-2.rgw.meta+root+BUCKET0 : miss
2017-09-22 11:20:11.580068 7f37e27fc700 1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:349 3.1 3:8da203c3:root::BUCKET0:head [call version.read,getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e17) v8 -- 0x7f37c800d5d0 con 0
2017-09-22 11:20:11.580656 7f3819d79700 1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 402 ==== osd_op_reply(349 BUCKET0 [call,getxattrs,stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 235+0+0 (3490804246 0 0) 0x7f38080398f0 con 0x55f9367f58f0
This shows RGWMetaStoreEntryCR calling RGWBucketMetadataHandler::put() to write the bucket entrypoint metadata for the new BUCKET0. The osd_op comes from store->get_bucket_entrypoint_info() stat-ing the bucket entrypoint object. Because the OSD returns ENOENT, the call records exists=false for that object in the shared 'RGWObjectCtx obj_ctx'.
2017-09-22 11:20:11.580817 7f37e27fc700 10 cache put: name=na-2.rgw.meta+root+BUCKET0 info.flags=0x0
2017-09-22 11:20:11.580837 7f37e27fc700 10 adding na-2.rgw.meta+root+BUCKET0 to cache LRU end
2017-09-22 11:20:11.580945 7f37e27fc700 1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:350 3.1 3:8da203c3:root::BUCKET0:head [delete,create,call version.set,writefull 0~285] snapc 0=[] ondisk+write+known_if_redirected e17) v8 -- 0x7f37c8008ce0 con 0
2017-09-22 11:20:11.727255 7f3819d79700 1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 403 ==== osd_op_reply(347 gc.10 [call] v17'12 uv12 ondisk = 0) v8 ==== 149+0+0 (842746504 0 0) 0x7f3808039b00 con 0x55f9367f58f0
2017-09-22 11:20:11.727831 7f37e37fe700 1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:351 4.3 4:c8333aab:gc::gc.11:head [call lock.lock] snapc 0=[] ondisk+write+known_if_redirected e17) v8 -- 0x7f37d000c240 con 0
2017-09-22 11:20:11.875368 7f3819d79700 1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 404 ==== osd_op_reply(348 obj_delete_at_hint.0000000022 [call] v17'18 uv18 ondisk = 0) v8 ==== 173+0+0 (480712473 0 0) 0x7f380803a150 con 0x55f9367f58f0
2017-09-22 11:20:11.875625 7f3819d79700 1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 405 ==== osd_op_reply(350 BUCKET0 [delete,create,call,writefull 0~285] v17'3 uv3 ondisk = 0) v8 ==== 277+0+0 (2438057665 0 0) 0x7f380803a150 con 0x55f9367f58f0
This shows a successful call to store->put_bucket_entrypoint_info(). That function does not take an RGWObjectCtx argument, so the cached state in 'obj_ctx' still says 'exists=false' even after the write succeeds.
2017-09-22 11:20:11.875754 7f37e2ffd700 20 proceeding shard = obj_delete_at_hint.0000000023
2017-09-22 11:20:11.875791 7f37e27fc700 10 cache put: name=na-2.rgw.meta+root+BUCKET0 info.flags=0x17
2017-09-22 11:20:11.875801 7f37e27fc700 10 moving na-2.rgw.meta+root+BUCKET0 to cache LRU end
2017-09-22 11:20:11.875832 7f37e27fc700 10 distributing notification oid=notify.1 bl.length()=457
2017-09-22 11:20:11.875893 7f37e2ffd700 1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:352 4.3 4:d201668e:::obj_delete_at_hint.0000000023:head [call lock.lock] snapc 0=[] ondisk+write+known_if_redirected e17) v8 -- 0x7f37d4003380 con 0
2017-09-22 11:20:11.876123 7f37e27fc700 1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:353 2.0 2:05bf5b68:::notify.1:head [notify cookie 139877555417600] snapc 0=[] ondisk+read+known_if_redirected e17) v8 -- 0x7f37c80123c0 con 0
2017-09-22 11:20:11.876877 7f3819d79700 1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 406 ==== watch-notify(notify (1) cookie 94528849631936 notify 73014444067 ret 0) v3 ==== 499+0+0 (220046761 0 0) 0x7f380803a3b0 con 0x55f9367f58f0
2017-09-22 11:20:11.876950 7f38017fa700 10 RGWWatcher::handle_notify() notify_id 73014444067 cookie 94528849631936 notifier 4104 bl.length()=457
2017-09-22 11:20:11.876977 7f38017fa700 10 cache put: name=na-2.rgw.meta+root+BUCKET0 info.flags=0x17
2017-09-22 11:20:11.876969 7f3819d79700 1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 407 ==== osd_op_reply(353 notify.1 [notify cookie 139877555417600] v0'0 uv4 ondisk = 0) v8 ==== 152+0+8 (1149695467 0 3798333813) 0x7f380803a7f0 con 0x55f9367f58f0
2017-09-22 11:20:11.876983 7f38017fa700 10 moving na-2.rgw.meta+root+BUCKET0 to cache LRU end
2017-09-22 11:20:11.877070 7f38017fa700 1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:354 2.0 2:05bf5b68:::notify.1:head [notify-ack cookie 0] snapc 0=[] ondisk+read+known_if_redirected e17) v8 -- 0x7f37e800ead0 con 0
2017-09-22 11:20:11.877663 7f3819d79700 1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 408 ==== watch-notify(notify_complete (2) cookie 139877555417600 notify 73014444067 ret 0) v3 ==== 42+0+28 (4209997224 0 9641197) 0x7f3808039a40 con 0x55f9367f58f0
2017-09-22 11:20:11.877783 7f3819d79700 1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 409 ==== osd_op_reply(354 notify.1 [notify-ack cookie 0] v0'0 uv4 ondisk = 0) v8 ==== 152+0+0 (3339349030 0 0) 0x7f380803ad00 con 0x55f9367f58f0
The same RGWObjectCtx is then passed to store->get_bucket_info(), so get_system_obj_state() sees the cached 'exists=false' and returns -ENOENT immediately, without issuing another osd_op.
2017-09-22 11:20:11.877889 7f37e27fc700 20 get_system_obj_state: rctx=0x7f37e27f7a00 obj=na-2.rgw.meta:root:BUCKET0 state=0x7f37c8009280 s->prefetch_data=0
2017-09-22 11:20:11.877956 7f37e27fc700 0 meta sync: ERROR: can't store key: bucket:BUCKET0 ret=-2
2017-09-22 11:20:11.878037 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f372802af20:19RGWMetaStoreEntryCR: operate()
2017-09-22 11:20:11.878056 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f372802af20:19RGWMetaStoreEntryCR: operate() returned r=-2
2017-09-22 11:20:11.878079 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f3728071d80:24RGWMetaSyncSingleEntryCR: operate()
2017-09-22 11:20:11.878085 7f37427fc700 10 RGW-SYNC:meta:shard[0]:entry[bucket:BUCKET0]: failed, status=-2
2017-09-22 11:20:11.878087 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f3728071d80:24RGWMetaSyncSingleEntryCR: operate() returned r=-2
2017-09-22 11:20:11.878093 7f37427fc700 20 stack->operate() returned ret=-2
2017-09-22 11:20:11.878096 7f37427fc700 20 run: stack=0x7f3728071b90 is done
Updated by Casey Bodley over 6 years ago
- Status changed from 12 to Fix Under Review
Updated by Yuri Weinstein over 6 years ago
Updated by Casey Bodley over 6 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Casey Bodley over 6 years ago
- Related to Backport #21460: luminous: rgw: missing support for per storage-policy usage statistics added
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #21649: luminous: multisite: sync of bucket entrypoints fail with ENOENT added
Updated by Kefu Chai over 6 years ago
- Status changed from Pending Backport to Resolved