Project

General

Profile

Actions

Bug #19552

closed

LibRadosMiscPool.PoolCreationRace occasionally hangs

Added by Sage Weil about 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2017-04-07_17:36:56-rados:thrash-wip-snapset---basic-smithi/997255

2017-04-07 18:29:11.963241 7f5e94575740 15 client.4198.objecter _session_op_assign -1 510
2017-04-07 18:29:11.963268 7f5e94575740  1 -- 172.21.15.137:0/1893951626 --> 172.21.15.137:6789/0 -- mon_get_version(what=osdmap handle=513) v1 -- ?+0 0x7f5e95b2db90 con 0x7f5e54002ac0
2017-04-07 18:29:11.963285 7f5e94575740  5 client.4198.objecter 510 in flight
2017-04-07 18:29:11.963326 7f5e94575740 10 client.4198.objecter _op_submit op 0x7f5e95b2e440
2017-04-07 18:29:11.963350 7f5e94575740 20 client.4198.objecter _calc_target epoch 477 base PoolCreationRaceObj @9 precalc_pgid 0 pgid 0.0 is_read
2017-04-07 18:29:11.963355 7f5e94575740 20 client.4198.objecter _get_session osd=-1 returning homeless
2017-04-07 18:29:11.963581 7f5e69ffb700  1 -- 172.21.15.137:0/1893951626 <== mon.0 172.21.15.137:6789/0 435 ==== mon_get_version_reply(handle=513 version=477) v2 ==== 24+0+0 (1027138385 0 0) 0x7f5e5c000a00 con 0x7f5e54002ac0
2017-04-07 18:29:11.963623 7f5e637fe700 10 op_map_latest r=0 tid=510 latest 477
2017-04-07 18:29:11.963629 7f5e637fe700 20 op_map_latest op 0x7f5e95b2e120
2017-04-07 18:29:11.963630 7f5e637fe700 10 client.4198.objecter check_op_pool_dne tid 510 current 477 map_dne_bound 477
2017-04-07 18:29:11.963632 7f5e637fe700 10 client.4198.objecter check_op_pool_dne tid 510 concluding pool 0 dne
2017-04-07 18:29:11.963635 7f5e637fe700 15 client.4198.objecter finish_op 510
2017-04-07 18:29:11.963638 7f5e637fe700 15 client.4198.objecter _session_op_remove -1 510
2017-04-07 18:29:12.251577 7f5e69ffb700  1 -- 172.21.15.137:0/1893951626 <== mon.0 172.21.15.137:6789/0 436 ==== osd_map(478..478 src has 1..478) v3 ==== 222+0+0 (2142394961 0 0) 0x7f5e5c001830 con 0x7f5e54002ac0
2017-04-07 18:29:12.251619 7f5e69ffb700 10 client.4198.objecter ms_dispatch 0x7f5e95a95a40 osd_map(478..478 src has 1..478) v3
2017-04-07 18:29:12.251640 7f5e69ffb700  3 client.4198.objecter handle_osd_map got epochs [478,478] > 477
2017-04-07 18:29:12.251645 7f5e69ffb700  3 client.4198.objecter handle_osd_map decoding incremental epoch 478
2017-04-07 18:29:12.251685 7f5e69ffb700 20 client.4198.objecter dump_active .. 0 homeless
2017-04-07 18:29:12.963462 7f5e94575740 10 client.4198.objecter _op_submit relock raced with osdmap, recalc target
2017-04-07 18:29:12.963482 7f5e94575740 20 client.4198.objecter _calc_target epoch 478 base PoolCreationRaceObj @9 precalc_pgid 0 pgid 0.0 is_read
2017-04-07 18:29:12.963488 7f5e94575740 20 client.4198.objecter _get_session osd=-1 returning homeless
2017-04-07 18:29:12.963492 7f5e94575740 10 client.4198.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-04-07 18:29:12.963516 7f5e94575740  1 -- 172.21.15.137:0/1893951626 --> 172.21.15.137:6789/0 -- mon_subscribe({osdmap=479}) v2 -- ?+0 0x7f5e95b2e0a0 con 0x7f5e54002ac0
2017-04-07 18:29:12.963533 7f5e94575740 10 client.4198.objecter _op_submit oid PoolCreationRaceObj '@9' '@-1' [read 0~100] tid 511 osd.-1
2017-04-07 18:29:12.963539 7f5e94575740 15 client.4198.objecter _session_op_assign -1 511
2017-04-07 18:29:12.963549 7f5e94575740  1 -- 172.21.15.137:0/1893951626 --> 172.21.15.137:6789/0 -- mon_get_version(what=osdmap handle=514) v1 -- ?+0 0x7f5e95b2db90 con 0x7f5e54002ac0
2017-04-07 18:29:12.963557 7f5e94575740  5 client.4198.objecter 511 in flight
2017-04-07 18:29:12.963585 7f5e94575740 10 client.4198.objecter _op_submit op 0x7f5e95b2e680
2017-04-07 18:29:12.963592 7f5e94575740 20 client.4198.objecter _calc_target epoch 478 base PoolCreationRaceObj @9 precalc_pgid 0 pgid 0.0 is_read
2017-04-07 18:29:12.963595 7f5e94575740 20 client.4198.objecter _get_session osd=-1 returning homeless
2017-04-07 18:29:12.963836 7f5e69ffb700  1 -- 172.21.15.137:0/1893951626 <== mon.0 172.21.15.137:6789/0 437 ==== mon_get_version_reply(handle=514 version=478) v2 ==== 24+0+0 (3858543297 0 0) 0x7f5e5c000a00 con 0x7f5e54002ac0
2017-04-07 18:29:12.963882 7f5e637fe700 10 op_map_latest r=0 tid=511 latest 478
2017-04-07 18:29:12.963891 7f5e637fe700 20 op_map_latest op 0x7f5e95b2e440
2017-04-07 18:29:12.963892 7f5e637fe700 10 client.4198.objecter check_op_pool_dne tid 511 current 478 map_dne_bound 478
2017-04-07 18:29:12.963894 7f5e637fe700 10 client.4198.objecter check_op_pool_dne tid 511 concluding pool 0 dne
2017-04-07 18:29:12.963897 7f5e637fe700 15 client.4198.objecter finish_op 511
2017-04-07 18:29:12.963901 7f5e637fe700 15 client.4198.objecter _session_op_remove -1 511
2017-04-07 18:29:13.963677 7f5e94575740 10 client.4198.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-04-07 18:29:13.963701 7f5e94575740 10 client.4198.objecter _op_submit oid PoolCreationRaceObj '@9' '@-1' [read 0~100] tid 512 osd.-1
2017-04-07 18:29:13.963712 7f5e94575740 15 client.4198.objecter _session_op_assign -1 512
2017-04-07 18:29:13.963736 7f5e94575740  1 -- 172.21.15.137:0/1893951626 --> 172.21.15.137:6789/0 -- mon_get_version(what=osdmap handle=515) v1 -- ?+0 0x7f5e95b2db90 con 0x7f5e54002ac0
2017-04-07 18:29:13.963752 7f5e94575740  5 client.4198.objecter 512 in flight
2017-04-07 18:29:13.964117 7f5e69ffb700  1 -- 172.21.15.137:0/1893951626 <== mon.0 172.21.15.137:6789/0 438 ==== mon_get_version_reply(handle=515 version=478) v2 ==== 24+0+0 (1605595215 0 0) 0x7f5e5c000a00 con 0x7f5e54002ac0
2017-04-07 18:29:13.964152 7f5e637fe700 10 op_map_latest r=0 tid=512 latest 478
2017-04-07 18:29:13.964169 7f5e637fe700 20 op_map_latest op 0x7f5e95b2e680
2017-04-07 18:29:13.964171 7f5e637fe700 10 client.4198.objecter check_op_pool_dne tid 512 current 478 map_dne_bound 478
2017-04-07 18:29:13.964175 7f5e637fe700 10 client.4198.objecter check_op_pool_dne tid 512 concluding pool 0 dne
2017-04-07 18:29:13.964179 7f5e637fe700 15 client.4198.objecter finish_op 512
2017-04-07 18:29:13.964185 7f5e637fe700 15 client.4198.objecter _session_op_remove -1 512
2017-04-07 18:29:14.344098 7f5e84f74700 10 client.4189.objecter tick
2017-04-07 18:29:14.364349 7f5e6a7fc700 10 client.4198.objecter tick
2017-04-07 18:29:14.726543 7f5e69ffb700  1 -- 172.21.15.137:0/1893951626 <== mon.0 172.21.15.137:6789/0 439 ==== osd_map(479..479 src has 1..479) v3 ==== 222+0+0 (92996278 0 0) 0x7f5e5c001830 con 0x7f5e54002ac0
2017-04-07 18:29:14.726565 7f5e69ffb700 10 client.4198.objecter ms_dispatch 0x7f5e95a95a40 osd_map(479..479 src has 1..479) v3
2017-04-07 18:29:14.726576 7f5e69ffb700  3 client.4198.objecter handle_osd_map got epochs [479,479] > 478
2017-04-07 18:29:14.726578 7f5e69ffb700  3 client.4198.objecter handle_osd_map decoding incremental epoch 479
2017-04-07 18:29:14.726604 7f5e69ffb700 20 client.4198.objecter dump_active .. 0 homeless
2017-04-07 18:29:19.344216 7f5e84f74700 10 client.4189.objecter tick
2017-04-07 18:29:19.364438 7f5e6a7fc700 10 client.4198.objecter tick
2017-04-07 18:29:24.344297 7f5e84f74700 10 client.4189.objecter tick
2017-04-07 18:29:24.364530 7f5e6a7fc700 10 client.4198.objecter tick
2017-04-07 18:29:29.344388 7f5e84f74700 10 client.4189.objecter tick
2017-04-07 18:29:29.364612 7f5e6a7fc700 10 client.4198.objecter tick
2017-04-07 18:29:34.344503 7f5e84f74700 10 client.4189.objecter tick
2017-04-07 18:29:34.364714 7f5e6a7fc700 10 client.4198.objecter tick
Actions #1

Updated by Sage Weil about 7 years ago

ceph-client.admin.2616.log

Actions #2

Updated by Sage Weil about 7 years ago

  • Priority changed from Urgent to Immediate

/a/sage-2017-04-09_20:30:53-rados-wip-sage-testing---basic-smithi/1005460

Actions #3

Updated by Sage Weil almost 7 years ago

/a/sage-2017-04-20_18:45:07-upgrade:jewel-x-master---basic-smithi/1049796

Actions #4

Updated by Sage Weil almost 7 years ago

/a/sage-2017-04-21_23:56:14-rados-wip-sage-testing---basic-smithi/1054287

Actions #5

Updated by Sage Weil almost 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil

This happens when there is another pool created between the poolrace pools. Updating the test to skip when that happens.

Actions #6

Updated by Sage Weil almost 7 years ago

  • Status changed from In Progress to Fix Under Review
Actions #7

Updated by Sage Weil almost 7 years ago

  • Status changed from Fix Under Review to Resolved
Actions

Also available in: Atom PDF