Bug #1989
teuthology: error in ceph.log didn't make teuthology return error code
% Done: 0%
Source: Development
Regression: No
Severity: 3 - minor
Description
in a bash while loop, I saw
INFO:teuthology.task.ceph:Checking cluster ceph.log for badness...
WARNING:teuthology.task.ceph:Found errors (ERR|WRN|SEC) in cluster log
INFO:teuthology.task.ceph:Cleaning ceph cluster...
INFO:teuthology.task.ceph:Removing ceph binaries...
INFO:teuthology.task.ceph:Removing shipped files: daemon-helper enable-coredump...
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.task.internal:Removing archive directory...
INFO:teuthology.task.internal:Tidying up after the test...
INFO:teuthology.run:Duration was 1216.594967 seconds
+ date
and the calling script blithely continued. That script was:
while bin/teuthology $job $2 $3 $4
do
    date
    N=$(($N+1))
    echo "$job: $N passes"
    title
done
echo "$job: $N passes, then failure."
Associated revisions
Use non-zero exit status if any tests failed
Fixes: #1989
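
For illustration only, here is a minimal sketch of the idea behind that commit message. It is not the actual teuthology change. It assumes the run keeps a summary dict with a boolean 'success' key and an optional 'failure_reason' key, as in the ctx.summary usage quoted in this ticket; the function names exit_status and report_and_get_status are hypothetical.

```python
import sys


def exit_status(summary):
    """Map a run summary dict to a process exit status.

    Assumption: the summary carries a boolean 'success' key. A missing
    key is treated as failure here, so an incomplete run is never
    reported as a pass. That choice is part of this sketch, not
    something the ticket specifies.
    """
    return 0 if summary.get('success', False) else 1


def report_and_get_status(summary):
    # Hypothetical helper: print the failure reason (if any) to stderr
    # and return the status the process should exit with.
    status = exit_status(summary)
    if status != 0:
        reason = summary.get('failure_reason', 'unknown')
        sys.stderr.write('FAIL: {reason}\n'.format(reason=reason))
    return status
```

A caller would finish with something like sys.exit(report_and_get_status(summary)). With a non-zero status, a shell loop of the form "while bin/teuthology ...; do ...; done" stops at the first failed run instead of counting it as a pass.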
History
#1 Updated by Greg Farnum almost 12 years ago
I thought we turned this off on purpose because thrashing always triggered it. Am I remembering incorrectly?
#2 Updated by Sage Weil almost 12 years ago
We whitelist log entries. It only prints that (and sets success=False) if it sees something unexpected:
log.info('Checking cluster ceph.log for badness...')
def first_in_ceph_log(pattern, excludes):
    args = [
        'egrep', pattern,
        '/tmp/cephtest/data/%s/log' % firstmon,
        ]
    for exclude in excludes:
        args.extend([run.Raw('|'), 'egrep', '-v', exclude])
    args.extend([
            run.Raw('|'), 'head', '-n', '1',
            ])
    r = mon0_remote.run(
        stdout=StringIO(),
        args=args,
        )
    stdout = r.stdout.getvalue()
    if stdout != '':
        return stdout
    return None

if first_in_ceph_log('\[ERR\]|\[WRN\]|\[SEC\]',
                     config['log_whitelist']) is not None:
    log.warning('Found errors (ERR|WRN|SEC) in cluster log')
    ctx.summary['success'] = False
    # use the most severe problem as the failure reason
    if 'failure_reason' not in ctx.summary:
        for pattern in ['\[SEC\]', '\[ERR\]', '\[WRN\]']:
            match = first_in_ceph_log(pattern, config['log_whitelist'])
            if match is not None:
                ctx.summary['failure_reason'] = \
                    '"{match}" in cluster log'.format(
                    match=match.rstrip('\n'),
                    )
                break
The problem I see, though, is that it prints the errors it found but doesn't return an error code.
#3 Updated by Josh Durgin almost 12 years ago
- Assignee set to Josh Durgin
#4 Updated by Josh Durgin almost 12 years ago
- Status changed from New to Resolved
- Source set to Development