Fix #58758: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' - CephFS - Ceph

Fix #58758

open

qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid'

Added by Dhairya Parmar about 1 year ago. Updated about 1 year ago.

Status:

Pending Backport

Priority:

Normal

Assignee:

Dhairya Parmar

Category:

Testing

Target version:

Ceph - v18.0.0

% Done:

Source:

Development

Tags:

backport_processed

Backport:

reef,quincy,pacific

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

mgr/nfs

Labels (FS):

qa, qa-failure

Pull request ID:

49460

Crash signature (v1):

Crash signature (v2):

Description

http://pulpito.front.sepia.ceph.com/dparmar-2023-02-15_20:03:50-orch:cephadm-wip-58228-distro-default-smithi/7175071/
http://pulpito.front.sepia.ceph.com/dparmar-2023-02-16_20:38:24-orch:cephadm-wip-58228-distro-default-smithi/7176742/

While working on https://github.com/ceph/ceph/pull/49460, I found that the testcase `test_cluster_set_user_config_with_non_existing_clusterid` fails intermittently. This testcase runs before the testcases I wrote as part of the PR, and from the logs it doesn't look like my code is causing the failure.

Looking at logs I see:

2023-02-16T06:29:17.250 DEBUG:teuthology.orchestra.run.smithi049:> ceph nfs cluster config set test -i -
2023-02-16T06:29:17.628 INFO:teuthology.orchestra.run.smithi049.stderr:Error EINVAL: Invalid service name "nfs.test". View currently running services using "ceph orch ls" 
2023-02-16T06:29:17.629 DEBUG:teuthology.orchestra.run:got remote process result: 22

I think the command 'ceph nfs cluster config set test -i -' tries to communicate with the daemon nfs.test (nfs.test.0.0.<machine_name>.<random_string>), and it fails because the daemon is still starting or the cluster is recovering from a bad state. This could be solved by retrying up to three times with a sleep of 2-3 secs between attempts, so that we give the daemon enough time before concluding that it doesn't exist.
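The retry idea can be sketched roughly as below. This is only an illustration, not the change that went into the PR: `run_with_retries`, its parameters, and the injectable `sleep` argument are hypothetical names, and in the real test the caught exception would presumably be `CommandFailedError` rather than a bare `Exception`.

```python
import time


def run_with_retries(run_cmd, attempts=3, delay=2.0, sleep=time.sleep):
    """Call run_cmd() up to `attempts` times, pausing `delay` seconds between tries.

    Hypothetical helper: returns the result of the first successful call and
    re-raises the last exception if every attempt fails.
    """
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return run_cmd()
        except Exception as exc:  # assumption: the real test would catch CommandFailedError
            last_exc = exc
            # Only sleep if another attempt is still to come.
            if attempt < attempts:
                sleep(delay)
    raise last_exc
```

Passing `sleep` as a parameter keeps the helper testable without real delays; a caller that wants the behaviour described above would leave it at the default `time.sleep`.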

Another issue is with the usage of variable 'cluster_id':

    def test_cluster_set_user_config_with_non_existing_clusterid(self):
        '''
        Test setting user config for non-existing nfs cluster.
        '''
        try:
            cluster_id = 'invalidtest'
            self.ctx.cluster.run(args=['ceph', 'nfs', 'cluster',
                'config', 'set', self.cluster_id, '-i', '-'], stdin='testing')
            self.fail(f"User config set for non-existing cluster {cluster_id}")
        except CommandFailedError as e:
            # Command should fail for test to pass
            if e.exitstatus != errno.ENOENT:
                raise

Here, ctx.cluster.run() uses 'self.cluster_id', whose value is "test", while the local var 'cluster_id' is used only in fail() to report the testcase failure. I think this is a mistake: ctx.cluster.run() should also use the local var 'cluster_id', or the local var should be removed and 'self.cluster_id' used in both calls.
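A minimal sketch of the first option (using the local variable in both places) might look like this. To keep it runnable outside teuthology, `CommandFailedError` is a stand-in class that models only `exitstatus`, and `set_user_config_expect_enoent` and its `run` argument are hypothetical names standing in for the test method and `self.ctx.cluster.run`.

```python
import errno


class CommandFailedError(Exception):
    """Stand-in for teuthology's CommandFailedError; only exitstatus is modelled."""

    def __init__(self, exitstatus):
        super().__init__(f"command failed with status {exitstatus}")
        self.exitstatus = exitstatus


def set_user_config_expect_enoent(run, cluster_id='invalidtest'):
    """Run 'config set' against cluster_id and require that it fails with ENOENT.

    `run` is assumed to behave like ctx.cluster.run(): it raises
    CommandFailedError when the command exits non-zero.
    """
    try:
        # The same cluster_id is used for the command and for the failure message.
        run(args=['ceph', 'nfs', 'cluster', 'config', 'set', cluster_id, '-i', '-'],
            stdin='testing')
    except CommandFailedError as e:
        # Command should fail with ENOENT for the test to pass.
        if e.exitstatus != errno.ENOENT:
            raise
        return
    raise AssertionError(f"User config set for non-existing cluster {cluster_id}")
```

With this shape, the id passed to the command and the id named in the failure message cannot drift apart.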

I'll fix both of these issues in a PR.

Related issues 3 (2 open — 1 closed)