Project

General

Profile

Bug #21506

multisite: sync of bucket entrypoints fail with ENOENT

Added by Casey Bodley 3 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Start date:
09/22/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

A regression in https://github.com/ceph/ceph/pull/12704 is causing RGWMetaStoreEntryCR for new bucket entrypoint metadata to fail with ENOENT.

2017-09-22 11:20:11.578852 7f37427fc700 10 RGW-SYNC:meta:shard[0]:entry[bucket:BUCKET0]: storing local metadata entry
2017-09-22 11:20:11.578879 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f372802af20:19RGWMetaStoreEntryCR: operate()
2017-09-22 11:20:11.578886 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f372802af20:19RGWMetaStoreEntryCR: operate()
2017-09-22 11:20:11.578907 7f37427fc700 20 enqueued request req=0x7f372808dad0
2017-09-22 11:20:11.578910 7f37427fc700 20 RGWWQ:
2017-09-22 11:20:11.578911 7f37427fc700 20 req: 0x7f372808dad0
2017-09-22 11:20:11.578918 7f37427fc700 20 run: stack=0x7f3728071b90 is io blocked
2017-09-22 11:20:11.578944 7f37e27fc700 20 dequeued request req=0x7f372808dad0
2017-09-22 11:20:11.578951 7f37e27fc700 20 RGWWQ: empty
2017-09-22 11:20:11.579925 7f37e27fc700 20 get_system_obj_state: rctx=0x7f37e27f7a00 obj=na-2.rgw.meta:root:BUCKET0 state=0x7f37c8009280 s->prefetch_data=0
2017-09-22 11:20:11.579941 7f37e27fc700 10 cache get: name=na-2.rgw.meta+root+BUCKET0 : miss
2017-09-22 11:20:11.580068 7f37e27fc700  1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:349 3.1 3:8da203c3:root::BUCKET0:head [call version.read,getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e17) v8 -- 0x7f37c800d5d0 con 0
2017-09-22 11:20:11.580656 7f3819d79700  1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 402 ==== osd_op_reply(349 BUCKET0 [call,getxattrs,stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 235+0+0 (3490804246 0 0) 0x7f38080398f0 con 0x55f9367f58f0

This shows RGWMetaStoreEntryCR calling RGWBucketMetadataHandler::put() to write the bucket entrypoint metadata for a new BUCKET0. The osd_op is from store->get_bucket_entrypoint_info() trying to stat the bucket entrypoint object. Because the osd returns ENOENT, it updates 'RGWObjectCtx obj_ctx' with exists=false.

2017-09-22 11:20:11.580817 7f37e27fc700 10 cache put: name=na-2.rgw.meta+root+BUCKET0 info.flags=0x0
2017-09-22 11:20:11.580837 7f37e27fc700 10 adding na-2.rgw.meta+root+BUCKET0 to cache LRU end
2017-09-22 11:20:11.580945 7f37e27fc700  1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:350 3.1 3:8da203c3:root::BUCKET0:head [delete,create,call version.set,writefull 0~285] snapc 0=[] ondisk+write+known_if_redirected e17) v8 -- 0x7f37c8008ce0 con 0
2017-09-22 11:20:11.727255 7f3819d79700  1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 403 ==== osd_op_reply(347 gc.10 [call] v17'12 uv12 ondisk = 0) v8 ==== 149+0+0 (842746504 0 0) 0x7f3808039b00 con 0x55f9367f58f0
2017-09-22 11:20:11.727831 7f37e37fe700  1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:351 4.3 4:c8333aab:gc::gc.11:head [call lock.lock] snapc 0=[] ondisk+write+known_if_redirected e17) v8 -- 0x7f37d000c240 con 0
2017-09-22 11:20:11.875368 7f3819d79700  1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 404 ==== osd_op_reply(348 obj_delete_at_hint.0000000022 [call] v17'18 uv18 ondisk = 0) v8 ==== 173+0+0 (480712473 0 0) 0x7f380803a150 con 0x55f9367f58f0
2017-09-22 11:20:11.875625 7f3819d79700  1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 405 ==== osd_op_reply(350 BUCKET0 [delete,create,call,writefull 0~285] v17'3 uv3 ondisk = 0) v8 ==== 277+0+0 (2438057665 0 0) 0x7f380803a150 con 0x55f9367f58f0

This shows a successful call to store->put_bucket_entrypoint_info(). The function doesn't take an RGWObjectCtx argument, so we still have 'exists=false' in the cache.

2017-09-22 11:20:11.875754 7f37e2ffd700 20 proceeding shard = obj_delete_at_hint.0000000023
2017-09-22 11:20:11.875791 7f37e27fc700 10 cache put: name=na-2.rgw.meta+root+BUCKET0 info.flags=0x17
2017-09-22 11:20:11.875801 7f37e27fc700 10 moving na-2.rgw.meta+root+BUCKET0 to cache LRU end
2017-09-22 11:20:11.875832 7f37e27fc700 10 distributing notification oid=notify.1 bl.length()=457
2017-09-22 11:20:11.875893 7f37e2ffd700  1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:352 4.3 4:d201668e:::obj_delete_at_hint.0000000023:head [call lock.lock] snapc 0=[] ondisk+write+known_if_redirected e17) v8 -- 0x7f37d4003380 con 0
2017-09-22 11:20:11.876123 7f37e27fc700  1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:353 2.0 2:05bf5b68:::notify.1:head [notify cookie 139877555417600] snapc 0=[] ondisk+read+known_if_redirected e17) v8 -- 0x7f37c80123c0 con 0
2017-09-22 11:20:11.876877 7f3819d79700  1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 406 ==== watch-notify(notify (1) cookie 94528849631936 notify 73014444067 ret 0) v3 ==== 499+0+0 (220046761 0 0) 0x7f380803a3b0 con 0x55f9367f58f0
2017-09-22 11:20:11.876950 7f38017fa700 10 RGWWatcher::handle_notify()  notify_id 73014444067 cookie 94528849631936 notifier 4104 bl.length()=457
2017-09-22 11:20:11.876977 7f38017fa700 10 cache put: name=na-2.rgw.meta+root+BUCKET0 info.flags=0x17
2017-09-22 11:20:11.876969 7f3819d79700  1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 407 ==== osd_op_reply(353 notify.1 [notify cookie 139877555417600] v0'0 uv4 ondisk = 0) v8 ==== 152+0+8 (1149695467 0 3798333813) 0x7f380803a7f0 con 0x55f9367f58f0
2017-09-22 11:20:11.876983 7f38017fa700 10 moving na-2.rgw.meta+root+BUCKET0 to cache LRU end
2017-09-22 11:20:11.877070 7f38017fa700  1 -- 10.17.151.111:0/1502963065 --> 10.17.151.111:6804/30421 -- osd_op(unknown.0.0:354 2.0 2:05bf5b68:::notify.1:head [notify-ack cookie 0] snapc 0=[] ondisk+read+known_if_redirected e17) v8 -- 0x7f37e800ead0 con 0
2017-09-22 11:20:11.877663 7f3819d79700  1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 408 ==== watch-notify(notify_complete (2) cookie 139877555417600 notify 73014444067 ret 0) v3 ==== 42+0+28 (4209997224 0 9641197) 0x7f3808039a40 con 0x55f9367f58f0
2017-09-22 11:20:11.877783 7f3819d79700  1 -- 10.17.151.111:0/1502963065 <== osd.0 10.17.151.111:6804/30421 409 ==== osd_op_reply(354 notify.1 [notify-ack cookie 0] v0'0 uv4 ondisk = 0) v8 ==== 152+0+0 (3339349030 0 0) 0x7f380803ad00 con 0x55f9367f58f0

The same RGWObjectCtx is passed to store->get_bucket_info(), so get_system_obj_state() sees 'exists=false' and returns -ENOENT immediately.

2017-09-22 11:20:11.877889 7f37e27fc700 20 get_system_obj_state: rctx=0x7f37e27f7a00 obj=na-2.rgw.meta:root:BUCKET0 state=0x7f37c8009280 s->prefetch_data=0
2017-09-22 11:20:11.877956 7f37e27fc700  0 meta sync: ERROR: can't store key: bucket:BUCKET0 ret=-2
2017-09-22 11:20:11.878037 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f372802af20:19RGWMetaStoreEntryCR: operate()
2017-09-22 11:20:11.878056 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f372802af20:19RGWMetaStoreEntryCR: operate() returned r=-2
2017-09-22 11:20:11.878079 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f3728071d80:24RGWMetaSyncSingleEntryCR: operate()
2017-09-22 11:20:11.878085 7f37427fc700 10 RGW-SYNC:meta:shard[0]:entry[bucket:BUCKET0]: failed, status=-2
2017-09-22 11:20:11.878087 7f37427fc700 20 cr:s=0x7f3728071b90:op=0x7f3728071d80:24RGWMetaSyncSingleEntryCR: operate() returned r=-2
2017-09-22 11:20:11.878093 7f37427fc700 20 stack->operate() returned ret=-2
2017-09-22 11:20:11.878096 7f37427fc700 20 run: stack=0x7f3728071b90 is done

Related issues

Related to rgw - Backport #21460: luminous: rgw: missing support for per storage-policy usage statistics Resolved
Copied to rgw - Backport #21649: luminous: multisite: sync of bucket entrypoints fail with ENOENT Resolved

History

#1 Updated by Casey Bodley 3 months ago

  • Status changed from Verified to Need Review

#2 Updated by Yuri Weinstein 3 months ago

Casey Bodley wrote:

https://github.com/ceph/ceph/pull/17916

merged

#3 Updated by Casey Bodley 3 months ago

  • Status changed from Need Review to Pending Backport

#4 Updated by Casey Bodley 3 months ago

  • Related to Backport #21460: luminous: rgw: missing support for per storage-policy usage statistics added

#5 Updated by Nathan Cutler 2 months ago

  • Copied to Backport #21649: luminous: multisite: sync of bucket entrypoints fail with ENOENT added

#6 Updated by Kefu Chai 2 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF