Fix #58758
openqa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid'
Description
http://pulpito.front.sepia.ceph.com/dparmar-2023-02-15_20:03:50-orch:cephadm-wip-58228-distro-default-smithi/7175071/
http://pulpito.front.sepia.ceph.com/dparmar-2023-02-16_20:38:24-orch:cephadm-wip-58228-distro-default-smithi/7176742/
While working on https://github.com/ceph/ceph/pull/49460, I found that the testcase `test_cluster_set_user_config_with_non_existing_clusterid` fails intermittently. This testcase runs before the testcases I wrote as part of the PR, and judging from the logs, the failure does not appear to be caused by my code.
Looking at the logs, I see:
2023-02-16T06:29:17.250 DEBUG:teuthology.orchestra.run.smithi049:> ceph nfs cluster config set test -i -
2023-02-16T06:29:17.628 INFO:teuthology.orchestra.run.smithi049.stderr:Error EINVAL: Invalid service name "nfs.test". View currently running services using "ceph orch ls"
2023-02-16T06:29:17.629 DEBUG:teuthology.orchestra.run:got remote process result: 22
I think the command `ceph nfs cluster config set test -i -` tries to communicate with the daemon nfs.test (nfs.test.0.0.<machine_name>.<random_string>), and because the daemon is just starting or the cluster is recovering from a bad state, the command fails. This could be solved by retrying the command up to three times with a sleep of 2-3 seconds between attempts, so that we give the daemon enough time before concluding that it doesn't exist.
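A minimal sketch of the retry idea above, not the actual fix. The helper name `retry` and its parameters are hypothetical; in the real test the `action` callable would wrap the `self.ctx.cluster.run(...)` call and `retry_on` would be `CommandFailedError`.

```python
import time


def retry(action, attempts=3, delay=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call action() up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the first successful result; re-raises the last error if every
    attempt fails. `sleep` is injectable so tests do not have to wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as exc:
            last_exc = exc
            # Only sleep if another attempt is still coming.
            if attempt < attempts:
                sleep(delay)
    raise last_exc
```

With `attempts=3` and `delay=2.0` this waits at most about 4 seconds in total before giving up, which matches the "2-3 secs, three iterations" proposal.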
Another issue is with the usage of variable 'cluster_id':
def test_cluster_set_user_config_with_non_existing_clusterid(self):
    '''
    Test setting user config for non-existing nfs cluster.
    '''
    try:
        cluster_id = 'invalidtest'
        self.ctx.cluster.run(args=['ceph', 'nfs', 'cluster',
            'config', 'set', self.cluster_id, '-i', '-'], stdin='testing')
        self.fail(f"User config set for non-existing cluster {cluster_id}")
    except CommandFailedError as e:
        # Command should fail for test to pass
        if e.exitstatus != errno.ENOENT:
            raise
Here, ctx.cluster.run() uses `self.cluster_id`, whose value is "test", while the local variable `cluster_id` is used only in fail() to report the testcase failure. So the command is actually run against the existing cluster id "test" rather than the non-existing one. I think this is a mistake: ctx.cluster.run() should also use the local variable `cluster_id`, or the local variable should be removed and `self.cluster_id` used in both calls.
I'll fix both of these issues in a PR.
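The first option (use the local `cluster_id` in both places) could look roughly like the sketch below. It is self-contained for illustration only: `CommandFailedError` and `FakeCluster` are stand-ins written for this example, not teuthology's real classes, and the stub's ENOENT behaviour is an assumption taken from what the testcase expects.

```python
import errno


class CommandFailedError(Exception):
    """Stand-in for teuthology's exception; only `exitstatus` is modelled."""

    def __init__(self, exitstatus):
        super().__init__(f"command failed with status {exitstatus}")
        self.exitstatus = exitstatus


class FakeCluster:
    """Hypothetical stub for ctx.cluster: fails with ENOENT for unknown ids."""

    def __init__(self, known_ids):
        self.known_ids = set(known_ids)
        self.calls = []

    def run(self, args, stdin=None):
        self.calls.append(args)
        cluster_id = args[5]  # position of the cluster id in the command line
        if cluster_id not in self.known_ids:
            raise CommandFailedError(errno.ENOENT)


def check_set_config_fails(cluster, cluster_id='invalidtest'):
    """Mirror of the testcase body, using the same cluster_id in both places."""
    try:
        cluster.run(args=['ceph', 'nfs', 'cluster',
                          'config', 'set', cluster_id, '-i', '-'],
                    stdin='testing')
    except CommandFailedError as e:
        # The command should fail with ENOENT for the check to pass.
        if e.exitstatus != errno.ENOENT:
            raise
    else:
        raise AssertionError(
            f"User config set for non-existing cluster {cluster_id}")
```

With this shape the id passed to the command and the id named in the failure message can no longer drift apart.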