Bug #57864
qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing
Description
Discovered in https://github.com/ceph/ceph/pull/48288#discussion_r993883997:
It appears there's a case where the whitelist check fails silently
All tests reported as "pass"
http://pulpito.front.sepia.ceph.com/teuthology-2022-10-07_14:23:03-upgrade:pacific-x-quincy-distro-default-smithi/
As seen in http://qa-proxy.ceph.com/teuthology/teuthology-2022-10-07_14:23:03-upgrade:pacific-x-quincy-distro-default-smithi/7058075/teuthology.log, RemoveFullTry runs as expected, but the badness check does not:
2022-10-07T18:49:19.467 INFO:tasks.workunit.client.0.smithi110.stdout:[ RUN ] TestLibRBD.RemoveFullTry
2022-10-07T18:49:41.562 INFO:tasks.workunit.client.0.smithi110.stdout:[ OK ] TestLibRBD.RemoveFullTry (22095 ms)
...
2022-10-07T19:10:54.268 INFO:tasks.cephadm:Checking cluster log for badness...
2022-10-07T19:10:54.269 DEBUG:teuthology.orchestra.run.smithi110:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/24ceeee2-466a-11ed-8436-001a4aab830c/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2022-10-07T19:10:54.296 INFO:teuthology.orchestra.run.smithi110.stderr:grep: /var/log/ceph/24ceeee2-466a-11ed-8436-001a4aab830c/ceph.log: No such file or directory
When `egrep '\[ERR\]|\[WRN\]|\[SEC\]'` runs on a nonexistent file, it prints "No such file or directory" to stderr, writes nothing to stdout, and exits with status 2. However, the exit status of a shell pipeline is that of its last command, so once the empty stdout reaches `head`, which exits 0, the sh/run method sees 0 and the check passes silently.
For example:
$ egrep "SOMETHING" /does/not/exist
grep: /does/not/exist: No such file or directory
$ echo $?
2
$ egrep "SOMETHING" /does/not/exist | head -n 1
grep: /does/not/exist: No such file or directory
$ echo $?
0
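One possible way to avoid the silent pass is to test for the log file explicitly before scanning it. The following is only a sketch under stated assumptions, not the fix that was merged: `first_bad_line` and its `ignorelist` parameter are hypothetical names, and the scan runs on a local path in pure Python rather than through teuthology's remote `run`/`sh` helpers. It mirrors the `egrep ... | egrep -v ... | head -n 1` pipeline but raises when the file is missing instead of returning an empty result.

```python
import re
from pathlib import Path

# Same alternation the tasks pass to egrep.
BADNESS = re.compile(r"\[ERR\]|\[WRN\]|\[SEC\]")


def first_bad_line(log_path, ignorelist=()):
    """Return the first badness line not covered by ignorelist, or None.

    Hypothetical helper: raises FileNotFoundError when the cluster log
    is missing, so a missing log can never be mistaken for a clean one.
    """
    path = Path(log_path)
    if not path.is_file():
        raise FileNotFoundError(f"cluster log is missing: {path}")

    # Each ignore pattern plays the role of one `egrep -v` stage.
    ignores = [re.compile(p) for p in ignorelist]
    with path.open(errors="replace") as log:
        for line in log:
            if BADNESS.search(line) and not any(i.search(line) for i in ignores):
                # Equivalent of `head -n 1`: stop at the first hit.
                return line.rstrip("\n")
    return None
```

A caller could then treat the exception as a job failure, treat a returned line as badness, and treat `None` as the only passing outcome.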
Just to expand on the commit history a bit:
- this is coming from cephadm task (`qa/tasks/cephadm.py`) and was added in https://github.com/ceph/ceph/commit/65b402563547f8caf5e57b5f75324077df9c24d9 -- cut-and-paste from the ceph task
- ceph task (`qa/tasks/ceph.py`) has the same issue and that goes all the way back, through https://github.com/ceph/ceph/commit/bcded7f163570dd6563523957bb7240cefd534fd and https://github.com/ceph/ceph/commit/1cad309d6542697eb774ab5eed985270118631db, to https://github.com/ceph/ceph/commit/42318c57cbfd29c0654bf9701dd1093bd6e93154
- rook task (`qa/tasks/rook.py`) has the same issue, again inherited from the ceph task
r = mon0_remote.run(args=[
    'if', run.Raw('!'),
    'egrep', '-q', '\[ERR\]|\[WRN\]|\[SEC\]',
    '/tmp/cephtest/data/%s/log' % firstmon,
    run.Raw(';'), 'then', 'echo', 'OK', run.Raw(';'),
    'fi',
    ],
    stdout=StringIO(),
    )
Inverting the `egrep -q` exit code (which is 2 for a nonexistent file, not the 1 that means "no match") results in echoing OK...
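The underlying problem is that `!` collapses two different non-zero statuses into one: grep exits 0 when a line matched, 1 when nothing matched, and a value greater than 1 when an error occurred. A minimal sketch of telling the three apart follows; `classify_grep_status` and `check_log` are hypothetical names used for illustration only, and the check runs `grep -E` locally through `subprocess` rather than through the remote object the qa tasks use.

```python
import subprocess


def classify_grep_status(returncode):
    """Map a grep exit status to a verdict instead of a plain boolean."""
    if returncode == 0:
        return "badness"   # at least one line matched
    if returncode == 1:
        return "clean"     # file was read, nothing matched
    return "error"         # e.g. 2: file missing or unreadable


def check_log(path, pattern=r"\[ERR\]|\[WRN\]|\[SEC\]"):
    """Run `grep -E -q` on path and classify its exit status."""
    proc = subprocess.run(
        ["grep", "-E", "-q", pattern, path],
        stderr=subprocess.DEVNULL,  # the status carries the information we need
    )
    return classify_grep_status(proc.returncode)
```

With this split, only "clean" would justify echoing OK; "error" can be reported as a failure in its own right.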
History
#1 Updated by Ilya Dryomov 4 months ago
- Status changed from New to In Progress
- Assignee set to Christopher Hoffman