Bug #7356

Kill all while loops that will never end...

Added by Sandon Van Ness about 10 years ago. Updated almost 6 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OK, maybe with one exception: one of my own loops, the one used for VPS creation. If the host machine is down, that loop keeps going until someone fixes the host, and I think a hung job is the right outcome in that situation (it forces someone to fix the issue), rather than having runs keep failing while trying to use a VPS that will never come up. This doesn't happen often, but it has happened on occasion. I just wanted to mention it up front so that loop doesn't get 'fixed' =)

Example of the current problem:

2014-02-05T15:03:54.747 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-02-05T15:03:56.789 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 13 pgs down; 13 pgs peering; 13 pgs stuck inactive; 13 pgs stuck unclean; 3 requests are blocked > 32 sec; mds cluster is degraded
2014-02-05T15:03:57.789 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-02-05T15:04:09.061 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 13 pgs down; 13 pgs peering; 13 pgs stuck inactive; 13 pgs stuck unclean; 3 requests are blocked > 32 sec; mds cluster is degraded
2014-02-05T15:04:10.062 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-02-05T15:04:17.581 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 13 pgs down; 13 pgs peering; 13 pgs stuck inactive; 13 pgs stuck unclean; 3 requests are blocked > 32 sec; mds cluster is degraded
2014-02-05T15:04:18.581 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'
2014-02-05T15:04:37.215 DEBUG:teuthology.misc:Ceph health: HEALTH_WARN 13 pgs down; 13 pgs peering; 13 pgs stuck inactive; 13 pgs stuck unclean; 3 requests are blocked > 32 sec; mds cluster is degraded
2014-02-05T15:04:38.215 DEBUG:teuthology.orchestra.run:Running [10.214.138.168]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph health'

This goes on for hours and hours in the logs because the cluster never recovers. The loop should eventually give up and fail the job.
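A minimal sketch of the bounded alternative, assuming a caller-supplied `get_health` callable that returns the `ceph health` output as a string. The names `wait_until_healthy` and `HealthTimeout`, and the 900-second default timeout, are hypothetical choices for illustration, not teuthology's actual API. The loop keeps the one-second poll interval visible in the log above, but raises once a deadline passes, so a cluster that never recovers fails the job instead of hanging it.

```python
import time


class HealthTimeout(Exception):
    """Hypothetical error raised when the cluster never reaches HEALTH_OK."""


def wait_until_healthy(get_health, timeout=900.0, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll get_health() until it reports HEALTH_OK or `timeout` seconds pass.

    get_health: hypothetical callable returning the `ceph health` output.
    clock/sleep: injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        health = get_health()
        if health.startswith("HEALTH_OK"):
            return health
        if clock() >= deadline:
            # Give up instead of polling for hours on a cluster that is stuck.
            raise HealthTimeout(
                "cluster not healthy after %.0f s; last status: %s"
                % (timeout, health))
        sleep(interval)
```

In a real run, `get_health` would wrap the remote `ceph health` command; injecting `clock` and `sleep` keeps the function easy to exercise without actually waiting.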

Logs:

/var/lib/teuthworker/archive/teuthology-2014-02-04_02:30:01-upgrade:fs-next-testing-basic-vps/66668
/var/lib/teuthworker/archive/teuthology-2014-02-04_02:30:01-upgrade:fs-next-testing-basic-vps/66737

History

#1 Updated by Alfredo Deza about 10 years ago

  • Status changed from New to In Progress

#2 Updated by Alfredo Deza about 10 years ago

This is going to take some effort because there are over 100 while loops in teuthology that look dangerous.

The first step is to add a helper that removes the boilerplate and makes it easier to refactor all of these loops.

Pull request for that helper: https://github.com/ceph/teuthology/pull/207
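As a rough illustration of what such a helper can look like, here is a minimal sketch of a context manager that hands back a `proceed()` callable to use as the `while` condition and raises after a fixed number of tries. `BoundedWhile`, `MaxTriesExceeded`, and the default `tries`/`sleep` values are hypothetical names and numbers chosen for this sketch. teuthology's `contextutil` module does provide a `safe_while` context manager with a similar calling pattern, but the code below is an independent sketch, not that implementation or the contents of the pull request.

```python
import time


class MaxTriesExceeded(Exception):
    """Hypothetical error raised when a bounded loop runs out of attempts."""


class BoundedWhile:
    """Hypothetical context manager yielding a proceed() loop condition.

    Each call to proceed() uses up one attempt (sleeping between attempts,
    but not before the first); once `tries` attempts are spent it raises
    instead of letting the loop spin forever.
    """

    def __init__(self, tries=10, sleep=6.0, action="loop", sleeper=time.sleep):
        self.tries = tries
        self.sleep = sleep
        self.action = action
        self.sleeper = sleeper
        self.counter = 0

    def proceed(self):
        if self.counter >= self.tries:
            raise MaxTriesExceeded(
                "%s: gave up after %d tries" % (self.action, self.tries))
        if self.counter > 0:
            self.sleeper(self.sleep)
        self.counter += 1
        return True

    def __enter__(self):
        return self.proceed

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions raised inside the block
```

Converting an existing loop is then a mechanical change: replace `while True:` with `while proceed():` inside a `with BoundedWhile(...) as proceed:` block, picking `tries` and `sleep` per call site. A loop that genuinely should wait forever, like the VPS-creation loop mentioned in the description, can simply be left alone.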

#3 Updated by Zack Cerza almost 10 years ago

  • Status changed from In Progress to Need More Info

Is this still an issue?

#4 Updated by Alfredo Deza almost 10 years ago

  • Status changed from Need More Info to 12

This is still an issue.

#5 Updated by Ian Colle almost 10 years ago

  • Project changed from teuthology to devops

#6 Updated by Sage Weil almost 6 years ago

  • Status changed from 12 to Rejected