Bug #56034
closedqa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()
100%
Description
/a/yuriw-2022-06-13_16:36:31-rados-wip-yuri7-testing-2022-06-13-0706-distro-default-smithi/6876516
Also historically seen in:
/a/yuriw-2022-03-18_00:42:20-rados-wip-yuri6-testing-2022-03-17-1547-distro-default-smithi/6743671
Failure Reason:
Command failed (workunit test osd/divergent-priors.sh) on smithi121 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=8f8ee34a603512f9baef4f6eff0e1fc5deb2cda4 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh'
Logs of Interest:
During the beginning of test TEST_divergent_3(), after the pool 'test' is created and the 'wait_for_clean' stage, the 'pg dump' output shows a bunch of PGs in unknown state with 'up_primary' field is set to -1. The current test code just picks the first PG from the pg dump output which turns out to be a PG in 'unknown' state and with 'up_primary' set to -1. The test subsequently fails because of this invalid osd id (-1). The logic in the test perhaps needs to pick the correct PG in the active+clean state i.e. pg 1.0 in this case. See logs below:
2022-06-13T17:15:18.882 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1667: wait_for_clean: get_num_pgs 2022-06-13T17:15:18.883 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1443: get_num_pgs: ceph --format json status 2022-06-13T17:15:18.883 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1443: get_num_pgs: jq .pgmap.num_pgs 2022-06-13T17:15:19.392 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1667: wait_for_clean: test 1 == 0 2022-06-13T17:15:19.393 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1671: wait_for_clean: true 2022-06-13T17:15:19.394 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1675: wait_for_clean: get_num_active_clean 2022-06-13T17:15:19.394 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1382: get_num_active_clean: local expression 2022-06-13T17:15:19.394 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1383: get_num_active_clean: expression+='select(contains("active") and contains("clean")) | ' 2022-06-13T17:15:19.395 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1384: get_num_active_clean: expression+='select(contains("stale") | not)' 2022-06-13T17:15:19.395 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1385: get_num_active_clean: ceph --format json pg dump pgs 2022-06-13T17:15:19.395 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1386: get_num_active_clean: jq '.pg_stats | [.[] | .state | select(contains("active") and contains("clean")) | select(contains("stale") | not)] | length' 2022-06-13T17:15:19.820 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1675: wait_for_clean: cur_active_clean=1 2022-06-13T17:15:19.822 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1676: wait_for_clean: get_num_pgs 2022-06-13T17:15:19.823 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1443: get_num_pgs: ceph --format json status 2022-06-13T17:15:19.823 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1443: get_num_pgs: jq .pgmap.num_pgs 2022-06-13T17:15:20.348 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1676: wait_for_clean: test 1 = 1 2022-06-13T17:15:20.349 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1676: wait_for_clean: break 2022-06-13T17:15:20.349 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1691: wait_for_clean: return 0 2022-06-13T17:15:20.351 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:657: TEST_divergent_3: ceph pg dump pgs --format=json 2022-06-13T17:15:20.351 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:657: TEST_divergent_3: jq '.pg_stats[0].up_primary' 2022-06-13T17:15:20.738 INFO:tasks.workunit.client.0.smithi121.stderr:dumped pgs 2022-06-13T17:15:20.747 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:657: TEST_divergent_3: local divergent=-1 2022-06-13T17:15:20.748 INFO:tasks.workunit.client.0.smithi121.stdout:primary and soon to be divergent is -1 2022-06-13T17:15:20.748 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:658: TEST_divergent_3: echo 'primary and soon to be divergent is -1' 2022-06-13T17:15:20.749 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:659: TEST_divergent_3: ceph pg dump pgs 2022-06-13T17:15:21.142 INFO:tasks.workunit.client.0.smithi121.stdout:PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN LAST_SCRUB_DURATION SCRUB_SCHEDULING OBJECTS_SCRUBBED OBJECTS_TRIMMED 2022-06-13T17:15:21.142 INFO:tasks.workunit.client.0.smithi121.stdout:1.1f 0 0 0 0 0 0 0 0 0 0 unknown 2022-06-13T17:15:19.879154+0000 0'0 0:0 [] -1 [] -1 0'0 2022-06-13T17:15:19.879154+0000 0'0 2022-06-13T17:15:19.879154+0000 0 0 -- 0 0 ... ... 2022-06-13T17:15:21.149 INFO:tasks.workunit.client.0.smithi121.stdout:1.0 0 0 0 0 0 0 0 0 0 0 active+clean 2022-06-13T17:15:08.240142+0000 0'0 28:12 [1,0,2] 1 [1,0,2] 1 0'0 2022-06-13T17:15:07.094983+0000 0'0 2022-06-13T17:15:07.094983+0000 0 0 periodic scrub scheduled @ 2022-06-15T01:14:09.779610+0000 0 0 2022-06-13T17:15:21.149 INFO:tasks.workunit.client.0.smithi121.stdout:1.d 0 0 0 0 0 0 0 0 0 0 unknown 2022-06-13T17:15:19.879154+0000 0'0 0:0 [] -1 [] -1 0'0 2022-06-13T17:15:19.879154+0000 0'0 2022-06-13T17:15:19.879154+0000 0 0 -- 0 0 ...
Other Observations:
During the creation of pools in the tests, the logs indicate that the pool already exists:
2022-06-13T17:05:55.189 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:75: TEST_divergent: create_pool test 1 1 2022-06-13T17:05:55.190 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:541: create_pool: ceph osd pool create test 1 1 2022-06-13T17:05:55.858 INFO:tasks.workunit.client.0.smithi121.stderr:pool 'test' already exists ...
But at the end of the test when the same pool is attempted to be deleted as part of the clean-up, the test reports that the pool doesn't exist!
2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:221: TEST_divergent: echo success 2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:223: TEST_divergent: delete_pool test 2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:546: delete_pool: local poolname=test 2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:547: delete_pool: ceph osd pool delete test test --yes-i-really-really-mean-it 2022-06-13T17:08:10.829 INFO:tasks.workunit.client.0.smithi121.stderr:pool 'test' does not exist ...