Bug #38359

osd-markdown.sh can fail with CLI_DUP_COMMAND=1

Added by Sage Weil about 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous,mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-02-16T12:22:09.693 INFO:teuthology.orchestra.run.smithi138:> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=597cd0800d5525c39d588f536bfb01afed545bdb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/ec-error-rollforward.sh
...
2019-02-16T14:19:28.360 INFO:tasks.workunit.client.0.smithi138.stderr:osd.0 is already down.
2019-02-16T14:19:28.382 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:49: markdown_N_impl:  sleep 10
2019-02-16T14:19:38.384 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:97: TEST_markdown_boot:  sleep 15
2019-02-16T14:19:53.386 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot:  ceph osd tree
2019-02-16T14:19:53.386 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot:  grep up
2019-02-16T14:19:53.387 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot:  grep osd.0
2019-02-16T14:19:53.998 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot:  return 1

on the osd,
2019-02-16 14:19:28.767 7fcafeb29700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.0 down, but it is still running
2019-02-16 14:19:28.767 7fcafeb29700  0 log_channel(cluster) log [DBG] : map e36 wrongly marked me down at e34
2019-02-16 14:19:28.767 7fcafeb29700  0 osd.0 36 _committed_osd_maps marked down 4 > osd_max_markdown_count 3 in last 120.000000 seconds, shutting down

because the client sent 2 down commands (due to the dup) and the mon wasn't able to combine them (race condition), so the osd was marked down 4 times instead of 3.
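
The failure mode can be illustrated with a simplified model of the markdown limit. This is only a sketch, not Ceph's OSD code: the class MarkdownTracker, its methods, and the helper run() are hypothetical, and the timestamps are made up. The only values taken from the log above are the limit of 3 and the 120-second window, plus the "count > limit" comparison.

```python
from collections import deque


class MarkdownTracker:
    """Toy model of an OSD counting how often it was marked down recently."""

    def __init__(self, max_count=3, period=120.0):
        self.max_count = max_count  # mirrors "osd_max_markdown_count 3" in the log
        self.period = period        # mirrors "in last 120.000000 seconds" in the log
        self.stamps = deque()

    def marked_down(self, now):
        """Record one markdown; return True if the OSD would shut down."""
        self.stamps.append(now)
        # Forget markdowns that have fallen out of the sliding window.
        while self.stamps and now - self.stamps[0] > self.period:
            self.stamps.popleft()
        # The log shows a strict comparison: "4 > osd_max_markdown_count 3".
        return len(self.stamps) > self.max_count


def run(markdown_times):
    """Return True if any markdown in the sequence triggers a shutdown."""
    tracker = MarkdownTracker()
    shutdown = False
    for t in markdown_times:
        if tracker.marked_down(t):
            shutdown = True
    return shutdown
```

In this model, run([0, 10, 20]) stays within the limit, while run([0, 10, 20, 20.1]) does not: the extra entry stands for a duplicated down command that the mon did not merge with the original one.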

workunit.py unconditionally sets the dup flag:

                    run.Raw('CEPH_CLI_TEST_DUP_COMMAND=1'),

but that will occasionally make the markdown test fail.
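
One way to picture the problem is to make the flag an explicit choice when the command prefix is built. The sketch below is hypothetical and is not necessarily how this issue was fixed: build_env_prefix and its dup_commands parameter are invented for illustration, and plain strings stand in for the run.Raw(...) wrappers used by workunit.py.

```python
def build_env_prefix(dup_commands=True):
    """Return the environment assignments placed before the workunit command.

    Hypothetical helper: workunit.py has no such function.
    """
    prefix = ['CEPH_ARGS="--cluster ceph"']
    if dup_commands:
        # Same assignment that workunit.py currently emits unconditionally.
        prefix.insert(0, 'CEPH_CLI_TEST_DUP_COMMAND=1')
    return prefix
```

A test that counts mon commands, such as the markdown test, would then be run with build_env_prefix(dup_commands=False) so that each down command reaches the mon exactly once.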

/a/kchai-2019-02-16_11:36:29-rados-wip-sage-testing-2019-02-16-1748-distro-basic-smithi/3601212


Related issues

Duplicated by RADOS - Bug #38356: standalone/osd/osd-markdown.sh fails in TEST_markdown_boot Duplicate 02/16/2019
Copied to RADOS - Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 Resolved
Copied to RADOS - Backport #38443: mimic: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 Resolved
Copied to RADOS - Backport #39275: nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 Resolved

History

#1 Updated by Sage Weil about 5 years ago

  • Description updated (diff)

#2 Updated by Sage Weil about 5 years ago

  • Status changed from 12 to Fix Under Review

#3 Updated by Sage Weil about 5 years ago

/a/sage-2019-02-18_23:58:27-rados-wip-sage-testing-2019-02-18-1341-distro-basic-smithi/3609330

#4 Updated by Sage Weil about 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Priority changed from Urgent to High
  • Backport set to luminous,mimic

#5 Updated by Greg Farnum about 5 years ago

  • Duplicated by Bug #38356: standalone/osd/osd-markdown.sh fails in TEST_markdown_boot added

#6 Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added

#7 Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38443: mimic: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added

#8 Updated by Sage Weil almost 5 years ago

  • Backport changed from luminous,mimic to luminous,mimic,nautilus

nope, that didn't fix it:
/a/sage-2019-04-10_15:25:57-rados-wip-sage4-testing-2019-04-10-0709-distro-basic-smithi/3830871

this should:
https://github.com/ceph/ceph/pull/27499

#9 Updated by Sage Weil almost 5 years ago

  • Status changed from Pending Backport to Fix Under Review

#10 Updated by Sage Weil almost 5 years ago

  • Status changed from Fix Under Review to Pending Backport

#11 Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #39275: nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added

#12 Updated by Greg Farnum over 4 years ago

  • Status changed from Pending Backport to Resolved
