Bug #9620

closed

tests: qa/workunits/cephtool/test.sh race condition

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: qa
Target version: -
% Done: 100%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OSDs are marked down, and a loop that checks that no OSD is down immediately follows, using ceph osd dump. The following happened:

 test_mon_osd: 600: ceph osd dump
 test_mon_osd: 600: grep 'osd.0 up'
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0 up   in  weight 1 up_from 143 up_thru 143 down_at 140 last_clean_interval [6,142) 127.0.0.1:6800/17838 127.0.0.1:6815/1017838 127.0.0.1:6816/1017838 127.0.0.1:6817/1017838 exists,up 16d58ecc-f79f-43cd-ad7f-074cc384e12b
 test_mon_osd: 602: ceph osd thrash 10
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
will thrash map for 10 epochs
  test_mon_osd: 603: seq 0 31
 test_mon_osd: 603: ceph osd down 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
marked down osd.0. osd.1 is already down. osd.2 is already down. osd.3 does not exist. osd.4 does not exist. osd.5 does not exist. osd.6 does not exist. osd.7 does not exist. osd.8 does not exist. osd.9 does not exist. osd.10 does not exist. osd.11 does not exist. osd.12 does not exist. osd.13 does not exist. osd.14 does not exist. osd.15 does not exist. osd.16 does not exist. osd.17 does not exist. osd.18 does not exist. osd.19 does not exist. osd.20 does not exist. osd.21 does not exist. osd.22 does not exist. osd.23 does not exist. osd.24 does not exist. osd.25 does not exist. osd.26 does not exist. osd.27 does not exist. osd.28 does not exist. osd.29 does not exist. osd.30 does not exist. osd.31 does not exist. 
 test_mon_osd: 604: wait_no_osd_down
  wait_no_osd_down: 15: seq 1 300
 wait_no_osd_down: 15: for i in '$(seq 1 300)'
 wait_no_osd_down: 16: check_no_osd_down
 check_no_osd_down: 10: ceph osd dump
 check_no_osd_down: 10: grep ' down '
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0 down out weight 0 up_from 143 up_thru 145 down_at 147 last_clean_interval [6,142) 127.0.0.1:6800/17838 127.0.0.1:6815/1017838 127.0.0.1:6816/1017838 127.0.0.1:6817/1017838 exists 16d58ecc-f79f-43cd-ad7f-074cc384e12b
osd.2 down in  weight 1 up_from 12 up_thru 143 down_at 146 last_clean_interval [0,0) 127.0.0.1:6810/18282 127.0.0.1:6811/18282 127.0.0.1:6812/18282 127.0.0.1:6813/18282 exists c9d035f4-f848-45fd-8f56-16d5935d2d49
 wait_no_osd_down: 17: echo 'waiting for osd(s) to come back up'
waiting for osd(s) to come back up
 wait_no_osd_down: 18: sleep 1
 wait_no_osd_down: 15: for i in '$(seq 1 300)'
 wait_no_osd_down: 16: check_no_osd_down
 check_no_osd_down: 10: ceph osd dump
 check_no_osd_down: 10: grep ' down '
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0 down out weight 0 up_from 143 up_thru 145 down_at 147 last_clean_interval [6,142) 127.0.0.1:6800/17838 127.0.0.1:6815/1017838 127.0.0.1:6816/1017838 127.0.0.1:6817/1017838 exists 16d58ecc-f79f-43cd-ad7f-074cc384e12b
osd.1 down in  weight 1 up_from 148 up_thru 148 down_at 150 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists 4d383cb1-db68-4fa1-a94b-3f8a9931943c
osd.2 down out weight 0 up_from 149 up_thru 149 down_at 150 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists c9d035f4-f848-45fd-8f56-16d5935d2d49
 wait_no_osd_down: 17: echo 'waiting for osd(s) to come back up'
waiting for osd(s) to come back up
 wait_no_osd_down: 18: sleep 1
 wait_no_osd_down: 15: for i in '$(seq 1 300)'
 wait_no_osd_down: 16: check_no_osd_down
 check_no_osd_down: 10: ceph osd dump
 check_no_osd_down: 10: grep ' down '
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 wait_no_osd_down: 20: break
 wait_no_osd_down: 23: check_no_osd_down
 check_no_osd_down: 10: ceph osd dump
 check_no_osd_down: 10: grep ' down '
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.2 down in  weight 1 up_from 151 up_thru 151 down_at 155 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists c9d035f4-f848-45fd-8f56-16d5935d2d49
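Reconstructed from the trace above, the two helpers involved look roughly like this (a sketch: the retry count of 300 matches the seq 1 300 visible in the trace, the rest is inferred from the numbered lines):

```shell
# Fails (non-zero exit) when 'ceph osd dump' reports any OSD as down.
check_no_osd_down() {
    ! ceph osd dump | grep ' down '
}

# Polls up to 300 times, one second apart, then re-checks once more;
# that final check is the one that raced with 'ceph osd thrash'.
wait_no_osd_down() {
    for i in $(seq 1 300); do
        if ! check_no_osd_down; then
            echo 'waiting for osd(s) to come back up'
            sleep 1
        else
            break
        fi
    done
    check_no_osd_down
}
```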

Actions #1

Updated by Loïc Dachary over 9 years ago

The following sequence happens:

  • ceph osd dump finds three OSDs "down"
  • ceph osd dump finds no OSD "down"
  • ceph osd dump finds one OSD "down"

Could it be a side effect of the ceph osd thrash 10 that happened a few lines above?

Actions #2

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to 12
  • Assignee set to Loïc Dachary

The ceph osd thrash command randomly marks OSDs down and up, which explains the above.
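One way to harden the wait against a transient clean sample taken mid-thrash (a sketch only, not necessarily the fix that was merged; the threshold of 3 consecutive clean checks is an arbitrary assumption):

```shell
# Same helper as in the test script: fails when any OSD is reported down.
check_no_osd_down() {
    ! ceph osd dump | grep ' down '
}

# Sketch: only declare success after several consecutive clean samples,
# so a single lucky 'ceph osd dump' taken mid-thrash does not end the
# wait early.
wait_no_osd_down() {
    local consecutive=0
    for i in $(seq 1 300); do
        if check_no_osd_down; then
            consecutive=$((consecutive + 1))
            [ "$consecutive" -ge 3 ] && return 0
        else
            consecutive=0
            echo 'waiting for osd(s) to come back up'
        fi
        sleep 1
    done
    return 1
}
```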

Actions #3

Updated by Loïc Dachary over 9 years ago

  • Status changed from 12 to Fix Under Review
  • % Done changed from 0 to 80
Actions #4

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #5

Updated by Loïc Dachary over 9 years ago

  • Status changed from Pending Backport to Fix Under Review
Actions #6

Updated by Loïc Dachary over 9 years ago

gitbuilder running

Actions #7

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved

I jumped the gun and merged, oops!

Actions #8

Updated by Loïc Dachary over 9 years ago

  • Status changed from Resolved to 7
Actions #9

Updated by Loïc Dachary over 9 years ago

I will verify the results when they are ready, but I'm not too concerned ;-)

Actions #10

Updated by Loïc Dachary over 9 years ago

  • Status changed from 7 to Resolved
  • % Done changed from 80 to 100