Bug #9074 (closed): gitbuilder: make check does not complete, sometimes

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status: Duplicate
Priority: High
% Done: 0%
Source: other
Severity: 3 - minor

Description

It looks like the i386 build fails because a timeout interrupts it before it gets a chance to complete.

It could be that the timeout is too short. If the i386 build machines are slower than the others, that would explain why it happens more often on this build.
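
A quick way to test this hypothesis locally (a sketch only; the 3-hour limit below is an assumed placeholder, not gitbuilder's actual setting) is to run make check against an explicit timeout on an i386 box and see whether it gets killed:

# Sketch: run "make check" against an assumed 3h limit (10800 seconds).
time timeout 10800 make check
echo "exit status: $?"   # 124 means the timeout fired before completion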

I've experienced this locally: on master, every now and then, test.sh gets stuck somewhere around the "# make sure everything gets back up+in" step.

http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=5808d6a6a514a7c7e9cd094a0e047585ac66a161


Files

osd.0.log (448 KB) GOOD - Loïc Dachary, 08/13/2014 01:02 AM
mon.a.log (3.78 MB) GOOD - Loïc Dachary, 08/13/2014 01:03 AM
osd.0.log (403 KB) BAD - Loïc Dachary, 08/13/2014 01:03 AM
mon.a.log (3.54 MB) BAD - Loïc Dachary, 08/13/2014 01:03 AM
Actions #2

Updated by Loïc Dachary over 9 years ago

  • File osd.0.log osd.0.log added
  • Status changed from New to 12
  • Assignee set to Loïc Dachary
  • Priority changed from Normal to High

test.sh fails to complete (~50% of the time) when testing "noup":https://github.com/ceph/ceph/blob/ea731ae14216bb479eff1f86ed6bd4a7cb71fb56/qa/workunits/cephtool/test.sh#L517 with the following trace:

....
pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1 flags hashpspool stripe_width 0
max_osd 3
osd.0 down in  weight 1 up_from 4 up_thru 108 down_at 140 last_clean_interval [0,0) 127.0.0.1:6800/31456 127.0.0.1:6801/31456 127.0.0.1:6802/31456 127.0.0.1:6803/31456 exists 5141c944-afcb-42b8-90d3-e7344a6fb169
osd.1 up   in  weight 1 up_from 8 up_thru 140 down_at 0 last_clean_interval [0,0) 127.0.0.1:6805/31667 127.0.0.1:6806/31667 127.0.0.1:6807/31667 127.0.0.1:6808/31667 exists,up 30553181-6a93-466b-9372-08baf202abd5
osd.2 up   in  weight 1 up_from 13 up_thru 140 down_at 0 last_clean_interval [0,0) 127.0.0.1:6810/31901 127.0.0.1:6811/31901 127.0.0.1:6812/31901 127.0.0.1:6813/31901 exists,up 23ab6473-d56c-4b9e-91f0-4f237e2bb7d0
 test_mon_osd: 519: ceph osd dump
 test_mon_osd: 519: grep 'osd.0 down'
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0 down in  weight 1 up_from 4 up_thru 108 down_at 140 last_clean_interval [0,0) 127.0.0.1:6800/31456 127.0.0.1:6801/31456 127.0.0.1:6802/31456 127.0.0.1:6803/31456 exists 5141c944-afcb-42b8-90d3-e7344a6fb169
 test_mon_osd: 520: ceph osd unset noup
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
unset noup
 test_mon_osd: 521: (( i=0 ))
 test_mon_osd: 521: (( i < 100 ))
 test_mon_osd: 522: grep 'osd.0 up'
 test_mon_osd: 522: ceph osd dump
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
 test_mon_osd: 523: echo 'waiting for osd.0 to come back up'
waiting for osd.0 to come back up
 test_mon_osd: 524: sleep 10
 test_mon_osd: 521: (( i++ ))
 test_mon_osd: 521: (( i < 100 ))
 test_mon_osd: 522: ceph osd dump
 test_mon_osd: 522: grep 'osd.0 up'
...

Attached are the mon and osd.0 logs for a run that is OK and one that is not, for comparison.
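
For reference, the failing section of test.sh around lines 519-524 boils down to a loop of roughly this shape (a reconstruction from the xtrace above, not the verbatim script):

ceph osd dump | grep 'osd.0 down'
ceph osd unset noup
for ((i=0; i < 100; i++)); do
    if ceph osd dump | grep 'osd.0 up'; then
        break
    fi
    echo "waiting for osd.0 to come back up"
    sleep 10
done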

Actions #4

Updated by Loïc Dachary over 9 years ago

Wrong diagnosis: the error does not come from here. The test loops while waiting for the osds to come back up a few lines below; I was confused because the error messages are similar.

Actions #5

Updated by Loïc Dachary over 9 years ago

  • Status changed from 12 to Duplicate