Project

General

Profile

Actions

Bug #56034

closed

qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()

Added by Sridhar Seshasayee almost 2 years ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Category:
Tests
Target version:
% Done:

100%

Source:
Tags:
backport_processed
Backport:
reef,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-06-13_16:36:31-rados-wip-yuri7-testing-2022-06-13-0706-distro-default-smithi/6876516

Also historically seen in:
/a/yuriw-2022-03-18_00:42:20-rados-wip-yuri6-testing-2022-03-17-1547-distro-default-smithi/6743671

Failure Reason:

Command failed (workunit test osd/divergent-priors.sh) on smithi121 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=8f8ee34a603512f9baef4f6eff0e1fc5deb2cda4 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh'


Logs of Interest:

During the beginning of test TEST_divergent_3(), after the pool 'test' is created and the 'wait_for_clean' stage, the 'pg dump' output shows a bunch of PGs in unknown state with 'up_primary' field is set to -1. The current test code just picks the first PG from the pg dump output which turns out to be a PG in 'unknown' state and with 'up_primary' set to -1. The test subsequently fails because of this invalid osd id (-1). The logic in the test perhaps needs to pick the correct PG in the active+clean state i.e. pg 1.0 in this case. See logs below:

2022-06-13T17:15:18.882 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1667: wait_for_clean:  get_num_pgs
2022-06-13T17:15:18.883 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1443: get_num_pgs:  ceph --format json status
2022-06-13T17:15:18.883 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1443: get_num_pgs:  jq .pgmap.num_pgs
2022-06-13T17:15:19.392 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1667: wait_for_clean:  test 1 == 0
2022-06-13T17:15:19.393 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1671: wait_for_clean:  true
2022-06-13T17:15:19.394 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1675: wait_for_clean:  get_num_active_clean
2022-06-13T17:15:19.394 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1382: get_num_active_clean:  local expression
2022-06-13T17:15:19.394 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1383: get_num_active_clean:  expression+='select(contains("active") and contains("clean")) | '
2022-06-13T17:15:19.395 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1384: get_num_active_clean:  expression+='select(contains("stale") | not)'
2022-06-13T17:15:19.395 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1385: get_num_active_clean:  ceph --format json pg dump pgs
2022-06-13T17:15:19.395 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1386: get_num_active_clean:  jq '.pg_stats | [.[] | .state | select(contains("active") and contains("clean")) | select(contains("stale") | not)] | length'
2022-06-13T17:15:19.820 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1675: wait_for_clean:  cur_active_clean=1
2022-06-13T17:15:19.822 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1676: wait_for_clean:  get_num_pgs
2022-06-13T17:15:19.823 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1443: get_num_pgs:  ceph --format json status
2022-06-13T17:15:19.823 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1443: get_num_pgs:  jq .pgmap.num_pgs
2022-06-13T17:15:20.348 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1676: wait_for_clean:  test 1 = 1
2022-06-13T17:15:20.349 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1676: wait_for_clean:  break
2022-06-13T17:15:20.349 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:1691: wait_for_clean:  return 0
2022-06-13T17:15:20.351 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:657: TEST_divergent_3:  ceph pg dump pgs --format=json
2022-06-13T17:15:20.351 INFO:tasks.workunit.client.0.smithi121.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:657: TEST_divergent_3:  jq '.pg_stats[0].up_primary'
2022-06-13T17:15:20.738 INFO:tasks.workunit.client.0.smithi121.stderr:dumped pgs
2022-06-13T17:15:20.747 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:657: TEST_divergent_3:  local divergent=-1
2022-06-13T17:15:20.748 INFO:tasks.workunit.client.0.smithi121.stdout:primary and soon to be divergent is -1
2022-06-13T17:15:20.748 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:658: TEST_divergent_3:  echo 'primary and soon to be divergent is -1'
2022-06-13T17:15:20.749 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:659: TEST_divergent_3:  ceph pg dump pgs
2022-06-13T17:15:21.142 INFO:tasks.workunit.client.0.smithi121.stdout:PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE         STATE_STAMP                      VERSION  REPORTED  UP       UP_PRIMARY  ACTING   ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP                      LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN  LAST_SCRUB_DURATION  SCRUB_SCHEDULING                                            OBJECTS_SCRUBBED  OBJECTS_TRIMMED
2022-06-13T17:15:21.142 INFO:tasks.workunit.client.0.smithi121.stdout:1.1f           0                   0         0          0        0      0            0           0    0         0       unknown  2022-06-13T17:15:19.879154+0000      0'0       0:0       []          -1       []              -1         0'0  2022-06-13T17:15:19.879154+0000              0'0  2022-06-13T17:15:19.879154+0000              0                    0  --                                                                         0                0

...
...

2022-06-13T17:15:21.149 INFO:tasks.workunit.client.0.smithi121.stdout:1.0            0                   0         0          0        0      0            0           0    0         0  active+clean  2022-06-13T17:15:08.240142+0000      0'0     28:12  [1,0,2]           1  [1,0,2]               1         0'0  2022-06-13T17:15:07.094983+0000              0'0  2022-06-13T17:15:07.094983+0000              0                    0  periodic scrub scheduled @ 2022-06-15T01:14:09.779610+0000                 0                0
2022-06-13T17:15:21.149 INFO:tasks.workunit.client.0.smithi121.stdout:1.d            0                   0         0          0        0      0            0           0    0         0       unknown  2022-06-13T17:15:19.879154+0000      0'0       0:0       []          -1       []              -1         0'0  2022-06-13T17:15:19.879154+0000              0'0  2022-06-13T17:15:19.879154+0000              0                    0  --                                                                         0                0

...

Other Observations:

During the creation of pools in the tests, the logs indicate that the pool already exists:

2022-06-13T17:05:55.189 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:75: TEST_divergent:  create_pool test 1 1
2022-06-13T17:05:55.190 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:541: create_pool:  ceph osd pool create test 1 1
2022-06-13T17:05:55.858 INFO:tasks.workunit.client.0.smithi121.stderr:pool 'test' already exists
...

But at the end of the test when the same pool is attempted to be deleted as part of the clean-up, the test reports that the pool doesn't exist!

2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:221: TEST_divergent:  echo success
2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/divergent-priors.sh:223: TEST_divergent:  delete_pool test
2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:546: delete_pool:  local poolname=test
2022-06-13T17:08:10.273 INFO:tasks.workunit.client.0.smithi121.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:547: delete_pool:  ceph osd pool delete test test --yes-i-really-really-mean-it
2022-06-13T17:08:10.829 INFO:tasks.workunit.client.0.smithi121.stderr:pool 'test' does not exist
...

Related issues 2 (0 open2 closed)

Copied to RADOS - Backport #62251: reef: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()ResolvedNitzan MordechaiActions
Copied to RADOS - Backport #62252: quincy: qa/standalone/osd/divergent-priors.sh fails in test TEST_divergent_3()ResolvedNitzan MordechaiActions
Actions

Also available in: Atom PDF