Bug #57864

qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing

Added by Ilya Dryomov 4 months ago. Updated 4 months ago.

Status: In Progress
Priority: Urgent
Category: qa
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Discovered in https://github.com/ceph/ceph/pull/48288#discussion_r993883997:


It appears there's a case where the whitelist check fails silently.

All tests reported as "pass"
http://pulpito.front.sepia.ceph.com/teuthology-2022-10-07_14:23:03-upgrade:pacific-x-quincy-distro-default-smithi/

As seen in http://qa-proxy.ceph.com/teuthology/teuthology-2022-10-07_14:23:03-upgrade:pacific-x-quincy-distro-default-smithi/7058075/teuthology.log, RemoveFullTry runs as expected, but the badness check cannot find the cluster log:

2022-10-07T18:49:19.467 INFO:tasks.workunit.client.0.smithi110.stdout:[ RUN      ] TestLibRBD.RemoveFullTry
2022-10-07T18:49:41.562 INFO:tasks.workunit.client.0.smithi110.stdout:[       OK ] TestLibRBD.RemoveFullTry (22095 ms)
...
2022-10-07T19:10:54.268 INFO:tasks.cephadm:Checking cluster log for badness...
2022-10-07T19:10:54.269 DEBUG:teuthology.orchestra.run.smithi110:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/24ceeee2-466a-11ed-8436-001a4aab830c/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2022-10-07T19:10:54.296 INFO:teuthology.orchestra.run.smithi110.stderr:grep: /var/log/ceph/24ceeee2-466a-11ed-8436-001a4aab830c/ceph.log: No such file or directory

When `egrep '\[ERR\]|\[WRN\]|\[SEC\]'` is run on a nonexistent file, "No such file or directory" is written to stderr and stdout is empty. That empty stdout is piped through to the final `head` command, and a pipeline's exit status is that of its last command. `head` succeeds on empty input, so the sh/run method sees 0 and the check fails silently.

For example:

$ egrep "SOMETHING" /does/not/exist 
grep: /does/not/exist: No such file or directory
$ echo $?
2
$ egrep "SOMETHING" /does/not/exist | head -n 1
grep: /does/not/exist: No such file or directory
$ echo $?
0
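
A check that cannot fail silently has to treat a missing log as an error before it looks for matches. The following is a minimal Python sketch of that idea, not the actual teuthology code: `first_bad_line` and its `ignorelist` parameter are hypothetical names, and it reads a local file rather than running `egrep` on a remote host.

```python
import re
from pathlib import Path

# Same severity markers the egrep call in the log above looks for.
BADNESS = re.compile(r"\[ERR\]|\[WRN\]|\[SEC\]")


def first_bad_line(log_path, ignorelist=()):
    """Return the first non-ignored [ERR]/[WRN]/[SEC] line, or None.

    Raises FileNotFoundError when the log is missing, so that a missing
    log cannot be mistaken for a clean one.
    """
    path = Path(log_path)
    if not path.is_file():
        raise FileNotFoundError(f"cluster log is missing: {path}")

    # Hypothetical ignorelist: regexes playing the role of the
    # `egrep -v` stages in the original pipeline.
    ignored = [re.compile(pattern) for pattern in ignorelist]

    with path.open(errors="replace") as log:
        for line in log:
            if BADNESS.search(line) and not any(r.search(line) for r in ignored):
                return line.rstrip("\n")
    return None
```

With this shape, a caller would fail the job both when a line is returned and when the exception is raised, and pass only when the function returns `None`.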


Just to expand on the commit history a bit:

- this is coming from cephadm task (`qa/tasks/cephadm.py`) and was added in https://github.com/ceph/ceph/commit/65b402563547f8caf5e57b5f75324077df9c24d9 -- cut-and-paste from the ceph task
- ceph task (`qa/tasks/ceph.py`) has the same issue and that goes all the way back, through https://github.com/ceph/ceph/commit/bcded7f163570dd6563523957bb7240cefd534fd and https://github.com/ceph/ceph/commit/1cad309d6542697eb774ab5eed985270118631db, to https://github.com/ceph/ceph/commit/42318c57cbfd29c0654bf9701dd1093bd6e93154
- rook task (`qa/tasks/rook.py`) has the same issue, again inherited from the ceph task

        r = mon0_remote.run(args=[
                'if', run.Raw('!'),
                'egrep', '-q', '\[ERR\]|\[WRN\]|\[SEC\]',
                '/tmp/cephtest/data/%s/log' % firstmon,
                run.Raw(';'), 'then', 'echo', 'OK', run.Raw(';'),
                'fi',
                ],
                stdout=StringIO(),
                )

Inverting the `egrep -q` exit code (which is 2 for a nonexistent file) results in echoing OK, so a missing log is reported the same way as a clean one.
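
grep exits with 0 when a line matched, 1 when nothing matched, and 2 (or higher) on an error such as a missing file, so a check has to look at the actual value rather than negate it. A minimal sketch, with `classify_grep_status` as a hypothetical helper name that is not part of the qa tasks:

```python
def classify_grep_status(returncode):
    """Map a grep exit status to the outcome of the badness check.

    0  -> a matching line was found, i.e. the log contains badness
    1  -> no line matched, i.e. the log is clean
    2+ -> grep itself failed (for example, the file does not exist)
    """
    if returncode == 0:
        return "badness"
    if returncode == 1:
        return "clean"
    # Anything else must not be folded into "clean", which is what
    # `if ! egrep -q ...; then echo OK; fi` effectively does.
    raise RuntimeError(f"grep failed with exit status {returncode}")
```

Only the "clean" result should let the job pass; the exception covers the missing-file case that the inverted test currently swallows.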

History

#1 Updated by Ilya Dryomov 4 months ago

  • Status changed from New to In Progress
  • Assignee set to Christopher Hoffman
