Bug #38359

closed

osd-markdown.sh can fail with CLI_DUP_COMMAND=1

Added by Sage Weil about 5 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
luminous,mimic,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-02-16T12:22:09.693 INFO:teuthology.orchestra.run.smithi138:> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=597cd0800d5525c39d588f536bfb01afed545bdb TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/ec-error-rollforward.sh
...
2019-02-16T14:19:28.360 INFO:tasks.workunit.client.0.smithi138.stderr:osd.0 is already down.
2019-02-16T14:19:28.382 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:49: markdown_N_impl:  sleep 10
2019-02-16T14:19:38.384 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:97: TEST_markdown_boot:  sleep 15
2019-02-16T14:19:53.386 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot:  ceph osd tree
2019-02-16T14:19:53.386 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot:  grep up
2019-02-16T14:19:53.387 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot:  grep osd.0
2019-02-16T14:19:53.998 INFO:tasks.workunit.client.0.smithi138.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-markdown.sh:98: TEST_markdown_boot:  return 1

on the osd,
2019-02-16 14:19:28.767 7fcafeb29700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.0 down, but it is still running
2019-02-16 14:19:28.767 7fcafeb29700  0 log_channel(cluster) log [DBG] : map e36 wrongly marked me down at e34
2019-02-16 14:19:28.767 7fcafeb29700  0 osd.0 36 _committed_osd_maps marked down 4 > osd_max_markdown_count 3 in last 120.000000 seconds, shutting down

because the client sent 2 down commands (due to the dup) and the mon wasn't able to combine them (race condition), so the osd was marked down 4 times instead of 3, which exceeds osd_max_markdown_count.
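To illustrate why one extra markdown is enough to stop the OSD, here is a minimal sketch of a sliding-window counter that matches the numbers in the log line above (limit 3, window 120 seconds). The class `MarkdownTracker`, the helper `simulate`, and the timestamps are hypothetical and only model the behaviour described in the log; this is not the OSD's actual implementation.

```python
from collections import deque


class MarkdownTracker:
    """Hypothetical model of the markdown limit reported in the OSD log."""

    def __init__(self, max_count=3, period=120.0):
        self.max_count = max_count  # mirrors osd_max_markdown_count 3 in the log
        self.period = period        # mirrors "in last 120.000000 seconds"
        self.times = deque()

    def record(self, now):
        """Record a markdown at time `now`; return True if the limit is exceeded."""
        self.times.append(now)
        # Drop markdowns that fall outside the sliding window.
        while self.times and now - self.times[0] > self.period:
            self.times.popleft()
        return len(self.times) > self.max_count


def simulate(markdown_times):
    """Feed a list of markdown timestamps (seconds) through a fresh tracker."""
    tracker = MarkdownTracker()
    return [tracker.record(t) for t in markdown_times]


# Three markdowns inside the window stay within the limit.
three_markdowns = simulate([0, 10, 20])
# A duplicated command that the mon does not combine adds a fourth markdown.
with_duplicate = simulate([0, 10, 20, 20.4])
```

In this model the fourth entry of `with_duplicate` is the only one that crosses the limit, which corresponds to the "marked down 4 > osd_max_markdown_count 3" shutdown in the log.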

workunit.py unconditionally sets the dup flag:

                    run.Raw('CEPH_CLI_TEST_DUP_COMMAND=1'),

but that will occasionally make the markdown test fail.
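One possible mitigation, sketched here purely as an assumption and not taken from workunit.py or from any fix referenced in this ticket, is to make the flag conditional per workunit. The function `build_env_args` and the `dup_command_exempt` parameter are hypothetical names introduced for illustration.

```python
def build_env_args(workunit, dup_command_exempt=()):
    """Return the environment assignments to place before a workunit command.

    `dup_command_exempt` is a hypothetical collection of workunit path
    suffixes that should run without CEPH_CLI_TEST_DUP_COMMAND.
    """
    args = []
    # Only set the dup flag when the workunit is not on the exempt list.
    if not any(workunit.endswith(name) for name in dup_command_exempt):
        args.append('CEPH_CLI_TEST_DUP_COMMAND=1')
    return args


# Hypothetical usage: exempt the markdown test, keep the flag elsewhere.
exempt = ('osd/osd-markdown.sh',)
markdown_env = build_env_args('qa/standalone/osd/osd-markdown.sh', exempt)
other_env = build_env_args('qa/standalone/osd/ec-error-rollforward.sh', exempt)
```

The trade-off in this sketch is that exempted workunits lose the duplicate-command coverage entirely, so the exempt list would need to stay short.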

/a/kchai-2019-02-16_11:36:29-rados-wip-sage-testing-2019-02-16-1748-distro-basic-smithi/3601212


Related issues 4 (0 open, 4 closed)

Has duplicate RADOS - Bug #38356: standalone/osd/osd-markdown.sh fails in TEST_markdown_boot (Duplicate, 02/16/2019)
Copied to RADOS - Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 (Resolved, Nathan Cutler)
Copied to RADOS - Backport #38443: mimic: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 (Resolved, Nathan Cutler)
Copied to RADOS - Backport #39275: nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 (Resolved, Nathan Cutler)
Actions #1

Updated by Sage Weil about 5 years ago

  • Description updated (diff)
Actions #2

Updated by Sage Weil about 5 years ago

  • Status changed from 12 to Fix Under Review
Actions #3

Updated by Sage Weil about 5 years ago

/a/sage-2019-02-18_23:58:27-rados-wip-sage-testing-2019-02-18-1341-distro-basic-smithi/3609330

Actions #4

Updated by Sage Weil about 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Priority changed from Urgent to High
  • Backport set to luminous,mimic
Actions #5

Updated by Greg Farnum about 5 years ago

  • Has duplicate Bug #38356: standalone/osd/osd-markdown.sh fails in TEST_markdown_boot added
Actions #6

Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38442: luminous: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added
Actions #7

Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38443: mimic: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added
Actions #8

Updated by Sage Weil about 5 years ago

  • Backport changed from luminous,mimic to luminous,mimic,nautilus

nope, that didn't fix it:
/a/sage-2019-04-10_15:25:57-rados-wip-sage4-testing-2019-04-10-0709-distro-basic-smithi/3830871

this should:
https://github.com/ceph/ceph/pull/27499

Actions #9

Updated by Sage Weil about 5 years ago

  • Status changed from Pending Backport to Fix Under Review
Actions #10

Updated by Sage Weil about 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #11

Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #39275: nautilus: osd-markdown.sh can fail with CLI_DUP_COMMAND=1 added
Actions #12

Updated by Greg Farnum over 4 years ago

  • Status changed from Pending Backport to Resolved