Bug #9620

tests: qa/workunits/cephtool/test.sh race condition

Added by Loïc Dachary almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: qa
Target version: -
% Done: 100%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OSDs are marked down, and a loop that checks no OSDs are down immediately follows, using ceph osd dump (a sketch of the helpers involved appears after the trace below). The following happened:

 test_mon_osd: 600: ceph osd dump
 test_mon_osd: 600: grep 'osd.0 up'
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0 up   in  weight 1 up_from 143 up_thru 143 down_at 140 last_clean_interval [6,142) 127.0.0.1:6800/17838 127.0.0.1:6815/1017838 127.0.0.1:6816/1017838 127.0.0.1:6817/1017838 exists,up 16d58ecc-f79f-43cd-ad7f-074cc384e12b
 test_mon_osd: 602: ceph osd thrash 10
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
will thrash map for 10 epochs
  test_mon_osd: 603: seq 0 31
 test_mon_osd: 603: ceph osd down 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
marked down osd.0. osd.1 is already down. osd.2 is already down. osd.3 does not exist. osd.4 does not exist. osd.5 does not exist. osd.6 does not exist. osd.7 does not exist. osd.8 does not exist. osd.9 does not exist. osd.10 does not exist. osd.11 does not exist. osd.12 does not exist. osd.13 does not exist. osd.14 does not exist. osd.15 does not exist. osd.16 does not exist. osd.17 does not exist. osd.18 does not exist. osd.19 does not exist. osd.20 does not exist. osd.21 does not exist. osd.22 does not exist. osd.23 does not exist. osd.24 does not exist. osd.25 does not exist. osd.26 does not exist. osd.27 does not exist. osd.28 does not exist. osd.29 does not exist. osd.30 does not exist. osd.31 does not exist. 
 test_mon_osd: 604: wait_no_osd_down
  wait_no_osd_down: 15: seq 1 300
 wait_no_osd_down: 15: for i in '$(seq 1 300)'
 wait_no_osd_down: 16: check_no_osd_down
 check_no_osd_down: 10: ceph osd dump
 check_no_osd_down: 10: grep ' down '
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0 down out weight 0 up_from 143 up_thru 145 down_at 147 last_clean_interval [6,142) 127.0.0.1:6800/17838 127.0.0.1:6815/1017838 127.0.0.1:6816/1017838 127.0.0.1:6817/1017838 exists 16d58ecc-f79f-43cd-ad7f-074cc384e12b
osd.2 down in  weight 1 up_from 12 up_thru 143 down_at 146 last_clean_interval [0,0) 127.0.0.1:6810/18282 127.0.0.1:6811/18282 127.0.0.1:6812/18282 127.0.0.1:6813/18282 exists c9d035f4-f848-45fd-8f56-16d5935d2d49
 wait_no_osd_down: 17: echo 'waiting for osd(s) to come back up'
waiting for osd(s) to come back up
 wait_no_osd_down: 18: sleep 1
 wait_no_osd_down: 15: for i in '$(seq 1 300)'
 wait_no_osd_down: 16: check_no_osd_down
 check_no_osd_down: 10: ceph osd dump
 check_no_osd_down: 10: grep ' down '
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0 down out weight 0 up_from 143 up_thru 145 down_at 147 last_clean_interval [6,142) 127.0.0.1:6800/17838 127.0.0.1:6815/1017838 127.0.0.1:6816/1017838 127.0.0.1:6817/1017838 exists 16d58ecc-f79f-43cd-ad7f-074cc384e12b
osd.1 down in  weight 1 up_from 148 up_thru 148 down_at 150 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists 4d383cb1-db68-4fa1-a94b-3f8a9931943c
osd.2 down out weight 0 up_from 149 up_thru 149 down_at 150 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists c9d035f4-f848-45fd-8f56-16d5935d2d49
 wait_no_osd_down: 17: echo 'waiting for osd(s) to come back up'
waiting for osd(s) to come back up
 wait_no_osd_down: 18: sleep 1
 wait_no_osd_down: 15: for i in '$(seq 1 300)'
 wait_no_osd_down: 16: check_no_osd_down
 check_no_osd_down: 10: ceph osd dump
 check_no_osd_down: 10: grep ' down '
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 wait_no_osd_down: 20: break
 wait_no_osd_down: 23: check_no_osd_down
 check_no_osd_down: 10: ceph osd dump
 check_no_osd_down: 10: grep ' down '
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.2 down in  weight 1 up_from 151 up_thru 151 down_at 155 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists c9d035f4-f848-45fd-8f56-16d5935d2d49
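
For reference, a minimal sketch of the two helpers named in the trace, reconstructed from the line numbers and commands shown above; the actual bodies in qa/workunits/cephtool/test.sh may differ:

    # Reconstructed sketch, not the verbatim helpers from test.sh.
    function check_no_osd_down()
    {
        # succeeds only if no osd is reported ' down ' in the osd dump
        ! ceph osd dump | grep ' down '
    }

    function wait_no_osd_down()
    {
        for i in $(seq 1 300) ; do
            if ! check_no_osd_down ; then
                echo "waiting for osd(s) to come back up"
                sleep 1
            else
                break
            fi
        done
        # final check: this is the step that still saw osd.2 down in the trace
        check_no_osd_down
    }

The likely race: ceph osd thrash keeps publishing new OSDMap epochs after the loop breaks, so the final check_no_osd_down can still find an OSD marked down.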

Associated revisions

Revision beade63a (diff)
Added by Loic Dachary almost 9 years ago

qa/workunits/cephtool/test.sh: fix thrash (ultimate)

Keep the osd thrash test to ensure it is a valid command but make it a
noop by giving it a zero argument (meaning thrash 0 OSD maps).

Remove the loops that were added after the command in an attempt to wait
for the cluster to recover and not pollute the rest of the tests. Actual
testing of osd thrash would require a dedicated cluster because the side
effects are random and it is unnecessarily difficult to ensure they are
finished.

http://tracker.ceph.com/issues/9620 Fixes: #9620

Signed-off-by: Loic Dachary <>
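
A rough before/after sketch of the change the commit message describes (illustrative only, not the actual diff; see revision beade63a for the real change):

    # before (as in the trace): random thrashing followed by a wait loop
    ceph osd thrash 10
    ceph osd down $(seq 0 31)
    wait_no_osd_down

    # after (per the commit message): keep the command but make it a noop,
    # so no recovery wait is needed
    ceph osd thrash 0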

Revision 76341b0b (diff)
Added by Loic Dachary almost 9 years ago

qa/workunits/cephtool/test.sh: fix thrash (ultimate)

Keep the osd thrash test to ensure it is a valid command but make it a
noop by giving it a zero argument (meaning thrash 0 OSD maps).

Remove the loops that were added after the command in an attempt to wait
for the cluster to recover and not pollute the rest of the tests. Actual
testing of osd thrash would require a dedicated cluster because the side
effects are random and it is unnecessarily difficult to ensure they are
finished.

http://tracker.ceph.com/issues/9620 Fixes: #9620

Signed-off-by: Loic Dachary <>
(cherry picked from commit beade63a17db2e6fc68d1f55332d602f8f7cb93a)

Conflicts:
qa/workunits/cephtool/test.sh

History

#1 Updated by Loïc Dachary almost 9 years ago

The following sequence happens:

  • ceph osd dump finds 3 OSDs "down"
  • ceph osd dump finds no OSDs "down"
  • ceph osd dump finds one OSD "down"

Could it be a side effect of the ceph osd thrash 10 that happened a few lines above?

#2 Updated by Loïc Dachary almost 9 years ago

  • Status changed from New to 12
  • Assignee set to Loïc Dachary

The ceph osd thrash command randomly marks OSDs down and up, which explains the above.
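
To see this by hand, a loop like the following (purely illustrative), run right after ceph osd thrash 10, shows the down set changing from one dump to the next as the thrashed map epochs take effect:

    # Illustrative only: repeated dumps disagree while thrash side effects land.
    for i in $(seq 1 10) ; do
        ceph osd dump | grep ' down ' || echo "no osd down"
        sleep 1
    done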

#3 Updated by Loïc Dachary almost 9 years ago

  • Status changed from 12 to Fix Under Review
  • % Done changed from 0 to 80

#4 Updated by Sage Weil almost 9 years ago

  • Status changed from Fix Under Review to Pending Backport

#5 Updated by Loïc Dachary almost 9 years ago

  • Status changed from Pending Backport to Fix Under Review

#6 Updated by Loïc Dachary almost 9 years ago

gitbuilder is running

#7 Updated by Sage Weil almost 9 years ago

  • Status changed from Fix Under Review to Resolved

I jumped the gun and merged, oops!

#8 Updated by Loïc Dachary almost 9 years ago

  • Status changed from Resolved to 7

#9 Updated by Loïc Dachary almost 9 years ago

I will verify the results when they are ready, but I'm not too concerned ;-)

#10 Updated by Loïc Dachary almost 9 years ago

  • Status changed from 7 to Resolved
  • % Done changed from 80 to 100
