Bug #9074
closed
gitbuilder: make check does not complete, sometimes
Description
It looks like i386 build fails because a timeout interrupts it before it gets a chance to complete.
It could be that the timeout is too short. If the i386 build machines are slower than the others, it would explain why it happens more on this build.
Locally, on master, I've occasionally seen test.sh get stuck somewhere around the "# make sure everything gets back up+in" step.
Updated by Loïc Dachary over 9 years ago
- File osd.0.log osd.0.log added
- Status changed from New to 12
- Assignee set to Loïc Dachary
- Priority changed from Normal to High
test.sh fails to complete (~50% of the time) when "testing noup":https://github.com/ceph/ceph/blob/ea731ae14216bb479eff1f86ed6bd4a7cb71fb56/qa/workunits/cephtool/test.sh#L517 with the following trace:
.... pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1 flags hashpspool stripe_width 0
max_osd 3
osd.0 down in weight 1 up_from 4 up_thru 108 down_at 140 last_clean_interval [0,0) 127.0.0.1:6800/31456 127.0.0.1:6801/31456 127.0.0.1:6802/31456 127.0.0.1:6803/31456 exists 5141c944-afcb-42b8-90d3-e7344a6fb169
osd.1 up in weight 1 up_from 8 up_thru 140 down_at 0 last_clean_interval [0,0) 127.0.0.1:6805/31667 127.0.0.1:6806/31667 127.0.0.1:6807/31667 127.0.0.1:6808/31667 exists,up 30553181-6a93-466b-9372-08baf202abd5
osd.2 up in weight 1 up_from 13 up_thru 140 down_at 0 last_clean_interval [0,0) 127.0.0.1:6810/31901 127.0.0.1:6811/31901 127.0.0.1:6812/31901 127.0.0.1:6813/31901 exists,up 23ab6473-d56c-4b9e-91f0-4f237e2bb7d0
test_mon_osd: 519: ceph osd dump
test_mon_osd: 519: grep 'osd.0 down'
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0 down in weight 1 up_from 4 up_thru 108 down_at 140 last_clean_interval [0,0) 127.0.0.1:6800/31456 127.0.0.1:6801/31456 127.0.0.1:6802/31456 127.0.0.1:6803/31456 exists 5141c944-afcb-42b8-90d3-e7344a6fb169
test_mon_osd: 520: ceph osd unset noup
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
unset noup
test_mon_osd: 521: (( i=0 ))
test_mon_osd: 521: (( i < 100 ))
test_mon_osd: 522: grep 'osd.0 up'
test_mon_osd: 522: ceph osd dump
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
test_mon_osd: 523: echo 'waiting for osd.0 to come back up'
waiting for osd.0 to come back up
test_mon_osd: 524: sleep 10
test_mon_osd: 521: (( i++ ))
test_mon_osd: 521: (( i < 100 ))
test_mon_osd: 522: ceph osd dump
test_mon_osd: 522: grep 'osd.0 up'
...
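The xtrace above corresponds to a polling loop of roughly this shape (a sketch reconstructed from the trace, not copied verbatim from test.sh; the `osd_is_up` stub is hypothetical and stands in for `ceph osd dump | grep 'osd.0 up'` so the sketch is self-contained and runnable):

```shell
#!/bin/sh
# Sketch of the wait-for-osd loop seen in the trace (test_mon_osd lines 521-524).

# Stub standing in for `ceph osd dump | grep 'osd.0 up'`; here it succeeds on
# the third poll so the sketch terminates quickly. In the failing runs the real
# check never succeeds, so the loop burns through all 100 iterations.
polls=0
osd_is_up() {
    polls=$((polls + 1))
    [ "$polls" -ge 3 ]
}

i=0
while [ "$i" -lt 100 ]; do
    if osd_is_up; then
        break
    fi
    echo 'waiting for osd.0 to come back up'
    sleep 0   # the real test sleeps 10s per iteration; shortened here
    i=$((i + 1))
done

if [ "$i" -ge 100 ]; then
    echo 'osd.0 never came back up' >&2
    exit 1
fi
echo "osd.0 up after $((i + 1)) polls"
```

With a 10-second sleep per iteration, a run where osd.0 never comes back up spends ~1000 seconds in this loop alone, which is consistent with the overall `make check` timeout being hit.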
Attached are logs of the mon and osd.0 from a run where it is ok and from one where it is not, for comparison.
Updated by Loïc Dachary over 9 years ago
Wrong diagnostic: the error is not from here. It loops while waiting for OSDs to come back up a few lines below; I was confused because the error messages are similar.
Updated by Loïc Dachary over 9 years ago
- Status changed from 12 to Duplicate
It happens because of http://tracker.ceph.com/issues/9096