test_selftest_cluster_log (tasks.mgr.test_module_selftest.TestModuleSelftest) fails in vstart
The QA cluster log test (https://github.com/ceph/ceph/blob/master/qa/tasks/mgr/test_module_selftest.py#L292) fails because the command 'ceph -w --watch-channel cluster|audit' does not display the events immediately. The QA backend defines a timeout of 15 seconds (https://github.com/ceph/ceph/blob/master/qa/tasks/ceph_test_case.py#L52) for the command to display the event; if the event does not appear within that time, the test fails.
If you do the tests manually, it sometimes takes 10 seconds until the event is displayed.
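To make the timeout behaviour concrete, here is a minimal sketch of a poll-until-timeout helper. It is not the code from ceph_test_case.py; the name wait_until and its parameters are made up for illustration. The point is only that an event which takes 10 seconds to show up leaves very little margin against a 15 second deadline.

```python
import time


def wait_until(condition, timeout, period=1.0):
    """Poll condition() every `period` seconds until it returns True.

    Hypothetical helper (not the real QA code): raises TimeoutError if
    the condition is still false once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(period)
```

With timeout=15, a condition such as "the expected log line has been seen by the watcher" only passes if the event is displayed before the deadline expires.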
Start a vstart cluster and execute the following command:
$ ceph --watch --watch-channel "*"
Run the following commands in a separate shell:
$ ceph mgr module enable selftest
$ ceph mgr self-test cluster-log audit info "foo bar info"
$ ceph mgr self-test cluster-log cluster error "foo bar err"
$ ceph mgr self-test cluster-log ...
In Ceph Mimic the events are displayed much faster, and the QA test does not fail. Increasing the timeout to 20 seconds or more does not make sense IMHO.
#3 Updated by Alfonso MH 6 months ago
The fact is that if you run vstart_runner.py with this watcher:
bin/ceph -w --watch-channel "*"
the tests pass (the default timeout is enough).
But if you run:
bin/ceph -w --watch-channel cluster
and in another terminal:
bin/ceph mgr self-test cluster-log cluster error "foo bar err"
You don't see the notification.
Is this expected behaviour?
#4 Updated by Alfonso MH 6 months ago
@Sage Weil pointed out that in Python 3 the "watch_channel" ceph arg was not being parsed correctly (not even its default value).
This PR fixes it:
With the fix, the related tests pass.
So this issue can now focus on analyzing the increased delay in displaying events.
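For comparing the display delay between releases, a rough measurement sketch could look like the one below. The function measure_event_latency and its parameters are hypothetical, not part of Ceph or the QA suite. It assumes the logged message text appears verbatim in the watcher's output, and that a short settle delay is enough for the watcher to subscribe before the event is emitted.

```python
import subprocess
import threading
import time


def measure_event_latency(watch_cmd, emit_cmd, marker, timeout=30.0, settle=1.0):
    """Return the seconds between running emit_cmd and seeing `marker`
    in the output of watch_cmd, or None if it does not appear in time.

    Hypothetical helper: both commands are argument lists for subprocess.
    """
    watcher = subprocess.Popen(
        watch_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    seen = threading.Event()

    def scan():
        # Read the watcher output line by line until the marker shows up.
        for line in watcher.stdout:
            if marker in line:
                seen.set()
                return

    threading.Thread(target=scan, daemon=True).start()
    try:
        time.sleep(settle)  # assumption: enough time for the watcher to subscribe
        start = time.monotonic()
        subprocess.run(emit_cmd, check=True)
        if seen.wait(timeout):
            return time.monotonic() - start
        return None
    finally:
        # Always stop the watcher, whether or not the marker was seen.
        watcher.terminate()
        watcher.wait()


# Possible use against a vstart cluster (commands taken from this issue):
# measure_event_latency(
#     ["bin/ceph", "-w", "--watch-channel", "*"],
#     ["bin/ceph", "mgr", "self-test", "cluster-log", "cluster", "error", "foo bar err"],
#     "foo bar err",
# )
```

Running this a few times on each release, with a unique message per run, would give numbers to compare instead of impressions from watching the terminal.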