Bug #40721
backfill caught in loop from block
% Done: 0%
Description
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] recover_replicas(10)
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] recover_primary recovering 0 in pg, missing missing(0 may_include_deletes = 1)
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] recover_primary {}
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] recover_backfill (10) bft=7(2) last_backfill_started 2:f7b829c8:::smithi16613012-16:2db
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] peer osd.7(2) info 2.fs2( v 874'1021 lc 856'1013 (464'631,874'1021] lb 2:f7b829c8:::smithi16613012-16:2db local-lis/les=833/834 n=6 ec=20/20 lis/c 833/680 les/c/f 834/681/0 833/833/619)
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] update_range: bi is current
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] my backfill interval BackfillInfo(2:f7b829c8:::smithi16613012-16:head-MAX 1 objects {2:f7b829c8:::smithi16613012-16:head=824'1009})
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] peer shard 7(2) backfill BackfillInfo(MAX-MAX 0 objects)
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] get_object_context: found obc in cache: 0x55b491e44940
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] get_object_context: 0x55b491e44940 2:f7b829c8:::smithi16613012-16:head rwstate(write n=1 w=0) oi: 2:f7b829c8:::smithi16613012-16:head(824'1009 client.4308.0:8771 dirty|data_digest s 1386
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] backfill blocking on 2:f7b829c8:::smithi16613012-16:head; could not get rw_manager lock
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] backfill_pos is 2:f7b829c8:::smithi16613012-16:head
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] starting new_last_backfill at 2:f7b829c8:::smithi16613012-16:2db
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] possible new_last_backfill at 2:f7b829c8:::smithi16613012-16:2db
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] final new_last_backfill at 2:f7b829c8:::smithi16613012-16:2db
ing mbc={0={},1={},2={}} trimq=[2d7~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] started 0
...
~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] recover_replicas(10)
~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] recover_primary recovering 0 in pg, missing missing(0 may_include_deletes = 1)
~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] recover_primary {}
~1,2da~1,2dc~1,2e1~2,2e4~3,2e8~1,2ea~1,2ed~2,2f1~2,2f4~2,2f7~1,2f9~2,2fd~2,301~1,307~2,30b~2]] recover_backfill (10) bft=7(2) last_backfill_started 2:f7b829c8:::smithi16613012-16:2db
and it loops.
/a/sage-2019-07-10_01:52:27-rados-wip-sage-testing-2019-07-09-1801-distro-basic-smithi/4107563 osd.4 pg 2.f
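The shape of the loop in the log above can be modeled as a toy sketch (this is illustrative Python, not Ceph code; all class and function names are invented): recover_backfill reaches the head object, cannot take its rw lock because a write (the copy-from) is still in flight, and re-blocks at the same last_backfill position on every retry, so the position never advances past ...:2db.

```python
# Toy model of the observed retry loop (NOT Ceph's actual API).
# The log shows rwstate(write n=1 w=0): one writer holds the object,
# so backfill's lock attempt fails and backfill_pos stays put.

class RWState:
    """Minimal stand-in for the per-object rw_manager state."""
    def __init__(self):
        self.writers = 0
    def try_get_backfill_read(self):
        # Backfill can only proceed if no write is in flight.
        return self.writers == 0

def recover_backfill(obc, attempts):
    """Each retry (one per osdmap, per comment #1) blocks at the same spot."""
    positions = []
    for _ in range(attempts):
        if obc.try_get_backfill_read():
            positions.append("head")   # would advance past the head object
        else:
            positions.append("2db")    # stuck at last_backfill_started
    return positions

obc = RWState()
obc.writers = 1                        # copy-from holds the write lock (n=1)
print(recover_backfill(obc, 3))        # position never advances
```

The point of the sketch is that nothing in the retry itself releases the lock, so retrying on every osdmap can only spin until the writer finishes or the interval changes.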
History
#1 Updated by Sage Weil over 4 years ago
actually, this retry is triggered on every osdmap.
#2 Updated by Sage Weil over 4 years ago
The original blocked request is:
2d7~1,2da~1,2dc~1,2e1~1] ps=36] get_object_context: creating obc from disk: 0x55b491e44940
2d7~1,2da~1,2dc~1,2e1~1] ps=36] get_object_context: 0x55b491e44940 2:f7b829c8:::smithi16613012-16:head rwstate(none n=0 w=0) oi: 2:f7b829c8:::smithi16613012-16:head(824'1009 client.4308.0:8771 dirty|data_digest s 13869056 uv 1220 dd 97d4a27b alloc_hint [0 0 0]) exists: 1 ssc: 0x55b492ade2c0 snapset: 2db=[]:{2db=[2d
2d7~1,2da~1,2dc~1,2e1~1] ps=36] find_object_context 2:f7b829c8:::smithi16613012-16:head @head oi=2:f7b829c8:::smithi16613012-16:head(824'1009 client.4308.0:8771 dirty|data_digest s 13869056 uv 1220 dd 97d4a27b alloc_hint [0 0 0])
2d7~1,2da~1,2dc~1,2e1~1] ps=36] do_op oi 2:f7b829c8:::smithi16613012-16:head(824'1009 client.4308.0:8771 dirty|data_digest s 13869056 uv 1220 dd 97d4a27b alloc_hint [0 0 0])
2d7~1,2da~1,2dc~1,2e1~1] ps=36] do_op obc obc(2:f7b829c8:::smithi16613012-16:head rwstate(write n=1 w=0))
2d7~1,2da~1,2dc~1,2e1~1] ps=36] execute_ctx 0x55b493246900
2d7~1,2da~1,2dc~1,2e1~1] ps=36] execute_ctx 2:f7b829c8:::smithi16613012-16:head [copy-from ver 1207] ov 824'1009 av 834'1012 snapc 2e2=[2e2] snapset 2db=[]:{2db=[2da,2d7]}
2d7~1,2da~1,2dc~1,2e1~1] ps=36] do_osd_op 2:f7b829c8:::smithi16613012-16:head [copy-from ver 1207]
2d7~1,2da~1,2dc~1,2e1~1] ps=36] do_osd_op copy-from ver 1207
2d7~1,2da~1,2dc~1,2e1~1] ps=36] start_copy 2:f7b829c8:::smithi16613012-16:head from 2:4854f1b1:::smithi16613012-25:head @2 v1207 flags 0
2d7~1,2da~1,2dc~1,2e1~1] ps=36] _copy_some obc(2:f7b829c8:::smithi16613012-16:head rwstate(write n=1 w=0)) 0x55b49482ed10
#3 Updated by Greg Farnum over 4 years ago
- Assignee set to Samuel Just
#4 Updated by Samuel Just over 4 years ago
Unfortunately, I think the job number is wrong -- I don't see that object in the log (smithi19817795-* objects are included instead).
#5 Updated by Samuel Just over 4 years ago
Or, I guess the directory is probably correct, in that the teuthology.log output is consistent with the above, but the ceph-osd.?.log files appear to possibly be from something else.
#6 Updated by Samuel Just over 4 years ago
The copy_from on that object lasted until the end of the test. It did succeed, but presumably only during shutdown, once the interval changed. We need to find how lb=2:f7b829c8:::smithi16613012-16:2db could block the completion of the copy_from.
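The stall being hypothesized here is circular: backfill cannot pass the head object because the copy-from holds its write lock, while the copy-from's completion is (somehow, this is the open question) gated on a replica whose last_backfill still excludes head. A hypothetical wait-for graph consistent with the comments so far can be checked mechanically (the edges below are an assumption for illustration, not an established diagnosis):

```python
# Hypothetical wait-for cycle (illustrative labels, not Ceph state names).
# If all three edges hold, no participant can make progress until the
# interval changes and the graph is torn down -- matching the observation
# that the copy_from only completed at shutdown.
waits_on = {
    "backfill@head": "copy-from write lock",
    "copy-from write lock": "replica ack (lb=...:2db excludes head)",
    "replica ack (lb=...:2db excludes head)": "backfill@head",
}

def has_cycle(graph, start):
    """Follow single wait-for edges from start; detect a revisit."""
    seen, node = set(), start
    while node in graph:
        if node in seen:
            return True
        seen.add(node)
        node = graph[node]
    return False

print(has_cycle(waits_on, "backfill@head"))  # cycle -> nothing progresses
```

Breaking any one edge (e.g. letting backfill skip a write-locked object, or not gating the write's completion on that replica) would dissolve the cycle.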
#7 Updated by Samuel Just over 4 years ago
Based on the snapcontext, make_writeable should have created a clone.
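The snap arithmetic behind that expectation can be checked directly from the values in the log (a simplified sketch; the real make_writeable in the OSD does far more than this one comparison): a write should clone head when the write's snap context seq is newer than the object's snapset seq.

```python
# Worked check of comment #7 using the logged values
# "snapc 2e2=[2e2]" and "snapset 2db=[]:{2db=[2da,2d7]}".
snapc_seq = 0x2e2     # snap context carried by the copy-from write
snapset_seq = 0x2db   # most recent clone recorded in the object's snapset

# New snaps exist since the last clone, so a clone of head is expected.
should_clone = snapc_seq > snapset_seq
print(should_clone)
```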
#8 Updated by Samuel Just over 4 years ago
- Priority changed from Urgent to High
I don't think I can make further progress without more logs, so I'm marking this Need More Info for the time being. As it's a problem with copy_from and apparently not occurring all that often, I'm also dropping it to High.
#9 Updated by Samuel Just over 4 years ago
- Status changed from 12 to Need More Info
#10 Updated by Neha Ojha over 3 years ago
- Status changed from Need More Info to Can't reproduce