Bug #20465
Race seen between pool creation and wait_for_clean(): seen in test-erasure-eio.sh
Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
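I suspect that if creating a pool takes long enough that its initial PGs do not yet exist (not even in the "creating" state) when wait_for_clean() is called, then wait_for_clean() returns immediately without actually waiting for those PGs to be created. In the trace below, wait_for_clean() counts 4 PGs, finds all 4 of them active+clean, and returns 0 on its first pass.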
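To see why the check can pass too early, here is a simplified sketch of the wait_for_clean() control flow as it appears in the trace (ceph-helpers.sh lines 1251-1278). It is a reconstruction for illustration, not the helper's actual source: the name wait_for_clean_sketch and the max_loops retry bound are hypothetical, and it assumes ceph-helpers.sh has been sourced so that get_num_pgs and get_num_active_clean are defined.

```shell
#!/usr/bin/env bash
# Simplified sketch of the wait_for_clean() flow seen in the xtrace
# (ceph-helpers.sh:1251-1278). Illustrative only, not the real helper.
# Assumes ceph-helpers.sh is sourced (get_num_pgs, get_num_active_clean).
wait_for_clean_sketch() {
    local -i max_loops=${1:-3000}   # hypothetical bound: ~300 s at 0.1 s per try
    local -i loop=0
    local cur_active_clean

    # Cf. trace line 1256: nothing to wait for until some PG is reported.
    while test "$(get_num_pgs)" = 0; do
        (( loop++ >= max_loops )) && return 1
        sleep 0.1
    done
    loop=0

    while true; do
        cur_active_clean=$(get_num_active_clean)
        # Cf. trace line 1265: stop once every PG counted so far is
        # active+clean. PGs of a pool that are not reported yet are absent
        # from both counts, so this can succeed before the new pool is ready.
        test "$cur_active_clean" = "$(get_num_pgs)" && break
        (( loop++ >= max_loops )) && return 1
        sleep 0.1
    done
    return 0
}
```

Because get_num_pgs reads the cluster-wide pgmap.num_pgs from ceph status, a PG that the new pool has not reported yet is missing from both sides of the comparison, so the loop can break on its first pass exactly as in the trace (test 4 = 4).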
The error "Invalid poolpool-jerasure" (printed without a space) means that the injectdataerr admin-socket command on osd.1 did not find the pool named "pool-jerasure". The whole purpose of calling wait_for_clean() here is to ensure that the pool exists and that all of its PGs are ready.
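One possible mitigation, sketched under stated assumptions: record the PG count before creating the pool and wait until it has grown by the new pool's pg_num before calling wait_for_clean(). wait_for_pg_count is a hypothetical helper, not part of ceph-helpers.sh. It assumes get_num_pgs is already defined (for example by sourcing ceph-helpers.sh) and that no other pool is created or removed concurrently, because get_num_pgs counts PGs across the whole cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in ceph-helpers.sh): wait until get_num_pgs
# reports at least $1 PGs, polling every 0.1 s for at most $2 attempts.
# Assumes get_num_pgs is already defined, e.g. by ceph-helpers.sh.
wait_for_pg_count() {
    local -i expected=$1
    local -i tries=${2:-3000}
    local -i i
    local cur
    for (( i = 0; i < tries; i++ )); do
        cur=$(get_num_pgs)
        # Ignore non-numeric output such as "null" from jq.
        if [[ $cur =~ ^[0-9]+$ ]] && (( cur >= expected )); then
            return 0
        fi
        sleep 0.1
    done
    return 1
}

# Intended use (the pool has pg_num 1, as in the test):
#   before=$(get_num_pgs)
#   ceph osd pool create pool-jerasure 1 1 erasure myprofile
#   wait_for_pg_count $((before + 1)) && wait_for_clean
```

Waiting for the count first means the subsequent wait_for_clean() comparison includes the new pool's PG, so it cannot succeed until that PG is active+clean as well.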
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:261: TEST_rados_get_subread_eio_shard_1: local poolname=pool-jerasure
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:262: TEST_rados_get_subread_eio_shard_1: create_erasure_coded_pool pool-jerasure
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:56: create_erasure_coded_pool: local poolname=pool-jerasure
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:58: create_erasure_coded_pool: ceph osd erasure-code-profile set myprofile plugin=jerasure k=2 m=1 ruleset-failure-domain=osd
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:62: create_erasure_coded_pool: ceph osd pool create pool-jerasure 1 1 erasure myprofile
pool 'pool-jerasure' created
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:64: create_erasure_coded_pool: wait_for_clean
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1251: wait_for_clean: local num_active_clean=-1
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1252: wait_for_clean: local cur_active_clean
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1253: wait_for_clean: delays=($(get_timeout_delays $TIMEOUT .1))
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1253: wait_for_clean: get_timeout_delays 300 .1
///home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1207: get_timeout_delays: shopt -q -o xtrace
///home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1207: get_timeout_delays: echo true
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1207: get_timeout_delays: local trace=true
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1208: get_timeout_delays: true
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1208: get_timeout_delays: shopt -u -o xtrace
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1253: wait_for_clean: local -a delays
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1254: wait_for_clean: local -i loop=0
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1256: wait_for_clean: get_num_pgs
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1121: get_num_pgs: ceph --format json status
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1121: get_num_pgs: jq .pgmap.num_pgs
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1256: wait_for_clean: test 4 == 0
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1260: wait_for_clean: true
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1264: wait_for_clean: get_num_active_clean
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1091: get_num_active_clean: local expression
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1092: get_num_active_clean: expression+='select(contains("active") and contains("clean")) | '
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1093: get_num_active_clean: expression+='select(contains("stale") | not)'
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1094: get_num_active_clean: ceph --format json pg dump pgs
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1095: get_num_active_clean: jq '[.[] | .state | select(contains("active") and contains("clean")) | select(contains("stale") | not)] | length'
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1264: wait_for_clean: cur_active_clean=4
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1265: wait_for_clean: get_num_pgs
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1121: get_num_pgs: ceph --format json status
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1121: get_num_pgs: jq .pgmap.num_pgs
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1265: wait_for_clean: test 4 = 4
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1265: wait_for_clean: break
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:1278: wait_for_clean: return 0
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:264: TEST_rados_get_subread_eio_shard_1: local shard_id=1
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:265: TEST_rados_get_subread_eio_shard_1: rados_get_data_eio td/test-erasure-eio 1
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:161: rados_get_data_eio: local dir=td/test-erasure-eio
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:162: rados_get_data_eio: shift
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:163: rados_get_data_eio: local shard_id=1
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:164: rados_get_data_eio: shift
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:165: rados_get_data_eio: local recovery=
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:166: rados_get_data_eio: shift
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:170: rados_get_data_eio: local poolname=pool-jerasure
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:171: rados_get_data_eio: local objname=obj-eio-32508-1
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:172: rados_get_data_eio: inject_eio obj-eio-32508-1 td/test-erasure-eio 1
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:145: inject_eio: local objname=obj-eio-32508-1
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:146: inject_eio: shift
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:147: inject_eio: local dir=td/test-erasure-eio
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:148: inject_eio: shift
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:149: inject_eio: local shard_id=1
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:150: inject_eio: shift
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:152: inject_eio: local poolname=pool-jerasure
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:153: inject_eio: initial_osds=($(get_osds $poolname $objname))
//home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:153: inject_eio: get_osds pool-jerasure obj-eio-32508-1
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:765: get_osds: local poolname=pool-jerasure
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:766: get_osds: local objectname=obj-eio-32508-1
///home/dzafman/ceph/qa/workunits/ceph-helpers.sh:769: get_osds: ceph --format json osd map pool-jerasure obj-eio-32508-1
///home/dzafman/ceph/qa/workunits/ceph-helpers.sh:769: get_osds: jq '.acting | .[]'
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:769: get_osds: local 'osds=3 1 2'
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:771: get_osds: echo 3 1 2
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:153: inject_eio: local -a initial_osds
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:154: inject_eio: local osd_id=1
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:155: inject_eio: set_config osd 1 filestore_debug_inject_read_err true
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:902: set_config: local daemon=osd
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:903: set_config: local id=1
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:904: set_config: local config=filestore_debug_inject_read_err
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:905: set_config: local value=true
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:909: set_config: env CEPH_ARGS= ceph --format json daemon td/test-erasure-eio/ceph-osd.1.asok config set filestore_debug_inject_read_err true
//home/dzafman/ceph/qa/workunits/ceph-helpers.sh:909: set_config: jq 'has("success")'
/home/dzafman/ceph/qa/workunits/ceph-helpers.sh:909: set_config: test true == true
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:156: inject_eio: CEPH_ARGS=
/home/dzafman/ceph/src/test/erasure-code/test-erasure-eio.sh:156: inject_eio: ceph --admin-daemon td/test-erasure-eio/ceph-osd.1.asok injectdataerr pool-jerasure obj-eio-32508-1 1
Invalid poolpool-jerasure
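The two counters that wait_for_clean() compares can be read off the trace. The following reconstructs get_num_pgs and get_num_active_clean from ceph-helpers.sh lines 1121 and 1091-1095 as they appear in the xtrace. Redirections in the real helpers would not show up in an xtrace, so treat this as an approximation rather than the helpers' exact source. It requires the ceph CLI and jq on PATH.

```shell
#!/usr/bin/env bash
# Approximate reconstruction from the xtrace (ceph-helpers.sh:1121, 1091-1095).
# Requires the ceph CLI and jq.

# Total number of PGs in the cluster, from the status pgmap.
get_num_pgs() {
    ceph --format json status | jq .pgmap.num_pgs
}

# Number of PGs whose state contains "active" and "clean" but not "stale".
get_num_active_clean() {
    local expression
    expression+='select(contains("active") and contains("clean")) | '
    expression+='select(contains("stale") | not)'
    ceph --format json pg dump pgs | \
        jq "[.[] | .state | $expression] | length"
}
```

With this filter, a PG in the "creating" state is excluded from the active+clean total, so once such a PG is reported, the two counts differ and wait_for_clean() keeps looping. That is consistent with the suspicion in the description that the problem window is before the new pool's PG appears at all.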