Bug #38359
osd-markdown.sh can fail with CLI_DUP_COMMAND=1
Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous,mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2019-02-16T12:22:09.693 INFO:teuthology.orchestra.run.smithi138:> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=597cd0800d5525c39d588f536bfb01afed545bdb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/ec-error-rollforward.sh
...
2019-02-16T14:19:28.360 INFO:tasks.workunit.client.0.smithi138.stderr:osd.0 is already down.
2019-02-16T14:19:28.382 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:49: markdown_N_impl: sleep 10
2019-02-16T14:19:38.384 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:97: TEST_markdown_boot: sleep 15
2019-02-16T14:19:53.386 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot: ceph osd tree
2019-02-16T14:19:53.386 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot: grep up
2019-02-16T14:19:53.387 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot: grep osd.0
2019-02-16T14:19:53.998 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot: return 1
on the osd,
2019-02-16 14:19:28.767 7fcafeb29700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.0 down, but it is still running
2019-02-16 14:19:28.767 7fcafeb29700 0 log_channel(cluster) log [DBG] : map e36 wrongly marked me down at e34
2019-02-16 14:19:28.767 7fcafeb29700 0 osd.0 36 _committed_osd_maps marked down 4 > osd_max_markdown_count 3 in last 120.000000 seconds, shutting down
because the client sent 2 down commands (due to the dup) and the mon wasn't able to combine them (race condition), so the osd was marked down 4 times instead of 3.
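To illustrate the counting involved, here is a minimal Python sketch of the markdown limit as the OSD log message above describes it. The class and function names are hypothetical and this is a toy model, not the OSD's actual code; only the limit of 3 and the 120-second window come from the log.

```python
from collections import deque


class MarkdownTracker:
    """Toy model of an OSD's markdown accounting (hypothetical names)."""

    def __init__(self, max_count=3, period=120.0):
        self.max_count = max_count  # osd_max_markdown_count in the log
        self.period = period        # the 120-second window in the log
        self.log = deque()          # timestamps of recent markdowns

    def record_markdown(self, now):
        """Record one markdown at time `now` (seconds).

        Returns True if the OSD would shut down instead of rebooting.
        """
        self.log.append(now)
        # Forget markdowns that have aged out of the window.
        while self.log and now - self.log[0] > self.period:
            self.log.popleft()
        return len(self.log) > self.max_count


def run(markdown_times):
    """Feed a sequence of markdown timestamps to a fresh tracker."""
    tracker = MarkdownTracker()
    return [tracker.record_markdown(t) for t in markdown_times]


# Three markdowns, as the test intends: the OSD keeps rebooting.
expected = run([0, 10, 20])
# A duplicated "down" that the mon fails to combine adds a fourth markdown.
with_dup = run([0, 10, 20, 20.5])
```

In this model `expected` never trips the limit, while the extra markdown in `with_dup` does, which matches the "marked down 4 > osd_max_markdown_count 3" message.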
workunit.py unconditionally sets the dup flag:
run.Raw('CEPH_CLI_TEST_DUP_COMMAND=1'),
but that will occasionally make the markdown test fail.
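One way to picture making the flag conditional is sketched below. The helper `build_env_prefix` and its `dup_commands` parameter are hypothetical and do not exist in workunit.py; the other environment values are copied from the log above. This is only an illustration of the idea, not a description of the fix that was merged.

```python
def build_env_prefix(dup_commands=True):
    """Return env assignments to prepend to a workunit command.

    Hypothetical helper: `dup_commands` lets a caller opt out of
    CEPH_CLI_TEST_DUP_COMMAND for tests that count commands exactly.
    """
    prefix = [
        'TESTDIR="/home/ubuntu/cephtest"',
        'CEPH_ARGS="--cluster ceph"',
    ]
    if dup_commands:
        prefix.insert(0, "CEPH_CLI_TEST_DUP_COMMAND=1")
    return prefix


default_prefix = build_env_prefix()
markdown_prefix = build_env_prefix(dup_commands=False)
```

With `dup_commands=False` the duplicate-command flag is simply left out of the environment, so a test such as osd-markdown.sh would send each `ceph osd down` only once.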
/a/kchai-2019-02-16_11:36:29-rados-wip-sage-testing-2019-02-16-1748-distro-basic-smithi/3601212
Related issues
History
#1 Updated by Sage Weil about 5 years ago
- Description updated (diff)
#2 Updated by Sage Weil about 5 years ago
- Status changed from 12 to Fix Under Review
#3 Updated by Sage Weil about 5 years ago
/a/sage-2019-02-18_23:58:27-rados-wip-sage-testing-2019-02-18-1341-distro-basic-smithi/3609330
#4 Updated by Sage Weil about 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Priority changed from Urgent to High
- Backport set to luminous,mimic
#5 Updated by Greg Farnum about 5 years ago
- Duplicated by Bug #38356: standalone/osd/osd-markdown.sh fails in TEST_markdown_boot added
#6 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added
#7 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38443: mimic: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added
#8 Updated by Sage Weil almost 5 years ago
- Backport changed from luminous,mimic to luminous,mimic,nautilus
nope, that didn't fix it:
/a/sage-2019-04-10_15:25:57-rados-wip-sage4-testing-2019-04-10-0709-distro-basic-smithi/3830871
this should:
https://github.com/ceph/ceph/pull/27499
#9 Updated by Sage Weil almost 5 years ago
- Status changed from Pending Backport to Fix Under Review
#10 Updated by Sage Weil almost 5 years ago
- Status changed from Fix Under Review to Pending Backport
#11 Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #39275: nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added
#12 Updated by Greg Farnum over 4 years ago
- Status changed from Pending Backport to Resolved