Bug #38359

Updated by Sage Weil about 5 years ago

<pre>
2019-02-16T12:22:09.693 INFO:teuthology.orchestra.run.smithi138:> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=597cd0800d5525c39d588f536bfb01afed545bdb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/ec-error-rollforward.sh
...
2019-02-16T14:19:28.360 INFO:tasks.workunit.client.0.smithi138.stderr:osd.0 is already down.
2019-02-16T14:19:28.382 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:49: markdown_N_impl: sleep 10
2019-02-16T14:19:38.384 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:97: TEST_markdown_boot: sleep 15
2019-02-16T14:19:53.386 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot: ceph osd tree
2019-02-16T14:19:53.386 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot: grep up
2019-02-16T14:19:53.387 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot: grep osd.0
2019-02-16T14:19:53.998 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot: return 1
</pre>
On the osd:
<pre>
2019-02-16 14:19:28.767 7fcafeb29700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.0 down, but it is still running
2019-02-16 14:19:28.767 7fcafeb29700 0 log_channel(cluster) log [DBG] : map e36 wrongly marked me down at e34
2019-02-16 14:19:28.767 7fcafeb29700 0 osd.0 36 _committed_osd_maps marked down 4 > osd_max_markdown_count 3 in last 120.000000 seconds, shutting down
</pre>
This happened because the client sent *2* down commands (due to the dup), the mon wasn't able to combine them (race condition), and the osd was marked down 4 times instead of 3.
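
For context, a minimal Python sketch of the sliding-window check that the last osd log line describes: count the mark-downs seen in the recent window and shut down once the count exceeds osd_max_markdown_count. The real logic lives in the OSD's C++ _committed_osd_maps path; the names and structure here are illustrative only.
<pre>
from collections import deque
import time

OSD_MAX_MARKDOWN_COUNT = 3    # osd_max_markdown_count
OSD_MARKDOWN_PERIOD = 120.0   # window in seconds, per the log message

markdown_times = deque()

def on_marked_down(now=None):
    """Record one mark-down; return True if the OSD should shut down."""
    now = time.monotonic() if now is None else now
    markdown_times.append(now)
    # Expire mark-downs that fall outside the window.
    while markdown_times and now - markdown_times[0] > OSD_MARKDOWN_PERIOD:
        markdown_times.popleft()
    if len(markdown_times) > OSD_MAX_MARKDOWN_COUNT:
        print("marked down %d > osd_max_markdown_count %d, shutting down"
              % (len(markdown_times), OSD_MAX_MARKDOWN_COUNT))
        return True
    return False

# With the dup, one extra mark-down lands in the window: 4 > 3.
for _ in range(4):
    on_marked_down()
</pre>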

workunit.py unconditionally sets the dup flag:
<pre>
run.Raw('CEPH_CLI_TEST_DUP_COMMAND=1'),
</pre>
but that will occasionally make the markdown test fail.
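
The flag makes the CLI replay each command to exercise idempotency. A rough sketch of that behavior, assuming a hypothetical run_cli() dispatcher standing in for the ceph tool's internal command path:
<pre>
import os
import subprocess

def run_cli(args):
    # Hypothetical stand-in for the CLI's command dispatch.
    return subprocess.run(["ceph"] + args, capture_output=True, text=True)

def send_command(args):
    result = run_cli(args)
    if "CEPH_CLI_TEST_DUP_COMMAND" in os.environ:
        # Replay the same command. For `ceph osd down 0` this is a second
        # markdown request: if the mon races and applies it against the
        # newer map rather than treating it as a no-op, osd.0 is marked
        # down twice for a single test step.
        run_cli(args)
    return result
</pre>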

/a/kchai-2019-02-16_11:36:29-rados-wip-sage-testing-2019-02-16-1748-distro-basic-smithi/3601212
