Bug #7356
closed
Kill all while loops that will never end
Description
OK, maybe with one exception: one of my own loops, the one for VPS creation. If the host machine is down, that loop runs forever until someone fixes the host, and I think a hung job is the right outcome in that situation (it forces someone to fix the issue) rather than having runs keep failing while trying to use a VPS that will never come up. This doesn't happen often, but it has on occasion. I just wanted to preface with that so that loop doesn't get 'fixed' =)
Example of current problem:
2014-02-05T15:03:54.747 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-02-05T15:03:56.789 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 13 pgs down; 13 pgs peering; 13 pgs stuck inactive; 13 pgs stuck unclean; 3 requests are blocked > 32 sec; mds cluster is degraded
2014-02-05T15:03:57.789 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-02-05T15:04:09.061 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 13 pgs down; 13 pgs peering; 13 pgs stuck inactive; 13 pgs stuck unclean; 3 requests are blocked > 32 sec; mds cluster is degraded
2014-02-05T15:04:10.062 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-02-05T15:04:17.581 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 13 pgs down; 13 pgs peering; 13 pgs stuck inactive; 13 pgs stuck unclean; 3 requests are blocked > 32 sec; mds cluster is degraded
2014-02-05T15:04:18.581 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-02-05T15:04:37.215 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 13 pgs down; 13 pgs peering; 13 pgs stuck inactive; 13 pgs stuck unclean; 3 requests are blocked > 32 sec; mds cluster is degraded
2014-02-05T15:04:38.215 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
...and so on, for hours and hours in the logs, because the cluster never recovers. The loop should eventually give up and fail the job.
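One way to get that behaviour is to give every polling loop a deadline. The following is only a rough sketch of the idea, not the actual teuthology code: `wait_until`, `MaxWaitExceeded`, and the default timeout and interval are all made-up names and values for illustration.

```python
import time


class MaxWaitExceeded(Exception):
    """Raised when a polled condition does not become true in time."""


def wait_until(check, timeout=300.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds elapse.

    Returns the number of attempts made. `clock` and `sleep` are
    injectable only so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        # Give up instead of looping forever once the deadline has passed.
        if clock() >= deadline:
            raise MaxWaitExceeded(
                "condition not met after %d attempts (%.0fs timeout)"
                % (attempts, timeout))
        sleep(interval)
```

A health wait could then look like `wait_until(lambda: get_health().startswith('HEALTH_OK'), timeout=900)`, where `get_health` is a hypothetical helper that runs `ceph health` and returns its output. A loop that is meant to hang, such as the VPS creation one, would simply not use the wrapper.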
Logs:
/var/lib/teuthworker/archive/teuthology-2014-02-04_02:30:01-upgrade:fs-next-testing-basic-vps/66668
/var/lib/teuthworker/archive/teuthology-2014-02-04_02:30:01-upgrade:fs-next-testing-basic-vps/66737