Fix #58758

open

qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid'

Added by Dhairya Parmar about 1 year ago. Updated about 1 year ago.

Status: Pending Backport
Priority: Normal
Category: Testing
Target version:
% Done: 0%
Source: Development
Tags: backport_processed
Backport: reef,quincy,pacific
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/nfs
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.front.sepia.ceph.com/dparmar-2023-02-15_20:03:50-orch:cephadm-wip-58228-distro-default-smithi/7175071/
http://pulpito.front.sepia.ceph.com/dparmar-2023-02-16_20:38:24-orch:cephadm-wip-58228-distro-default-smithi/7176742/

While working on https://github.com/ceph/ceph/pull/49460, I found that the testcase `test_cluster_set_user_config_with_non_existing_clusterid` fails intermittently. It runs before the testcases I wrote as part of that PR, and nothing in the logs suggests that my code is causing the failure.

Looking at the logs, I see:

2023-02-16T06:29:17.250 DEBUG:teuthology.orchestra.run.smithi049:> ceph nfs cluster config set test -i -
2023-02-16T06:29:17.628 INFO:teuthology.orchestra.run.smithi049.stderr:Error EINVAL: Invalid service name "nfs.test". View currently running services using "ceph orch ls" 
2023-02-16T06:29:17.629 DEBUG:teuthology.orchestra.run:got remote process result: 22

I think the command 'ceph nfs cluster config set test -i -' tries to communicate with the daemon nfs.test (nfs.test.0.0.<machine_name>.<random_string>), and it fails because the daemon is just starting or the cluster is recovering from a bad state. This could be solved by retrying the command up to three times with a sleep of 2-3 seconds between attempts, so that the daemon gets enough time before we conclude that it doesn't exist.
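The retry idea could look roughly like the sketch below. This is only an illustration under assumptions: `run_with_retries` is a hypothetical helper, not something that exists in the qa suite, and the `CommandFailedError` class here is a local stand-in that models only the `exitstatus` attribute used in the testcase.

```python
import time


class CommandFailedError(Exception):
    """Local stand-in for the real exception; only exitstatus is modelled."""

    def __init__(self, exitstatus):
        super().__init__(f"command failed with status {exitstatus}")
        self.exitstatus = exitstatus


def run_with_retries(run_cmd, attempts=3, delay=2.0, sleep=time.sleep):
    """Call run_cmd() up to `attempts` times, sleeping `delay` seconds
    between tries (hypothetical helper).

    Returns the result of the first call that succeeds. If every attempt
    raises CommandFailedError, the last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return run_cmd()
        except CommandFailedError as e:
            last_error = e
            # Do not sleep after the final attempt; just report the failure.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

In a testcase, `run_cmd` would be a small callable wrapping the actual command invocation, so the helper itself stays independent of how the command is run.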

Another issue is with the usage of variable 'cluster_id':

    def test_cluster_set_user_config_with_non_existing_clusterid(self):
        '''
        Test setting user config for non-existing nfs cluster.
        '''
        try:
            cluster_id = 'invalidtest'
            self.ctx.cluster.run(args=['ceph', 'nfs', 'cluster',
                'config', 'set', self.cluster_id, '-i', '-'], stdin='testing')
            self.fail(f"User config set for non-existing cluster {cluster_id}")
        except CommandFailedError as e:
            # Command should fail for test to pass
            if e.exitstatus != errno.ENOENT:
                raise

Here, ctx.cluster.run() uses 'self.cluster_id', whose value is "test", while the local variable 'cluster_id' is used only in fail() to report the testcase failure. I think this is a mistake: ctx.cluster.run() should also use the local variable 'cluster_id', or the local variable should be removed and 'self.cluster_id' used in both places.
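As a rough sketch of the first option (using the local 'cluster_id' consistently), the check could be written as below. The names here are assumptions for illustration: `check_config_set_rejected` is a hypothetical standalone function, `run` is any callable that accepts `args=` and `stdin=` the way the testcase calls ctx.cluster.run(), and `CommandFailedError` is a local stand-in that models only `exitstatus`.

```python
import errno


class CommandFailedError(Exception):
    """Local stand-in for the real exception; only exitstatus is modelled."""

    def __init__(self, exitstatus):
        super().__init__(f"command failed with status {exitstatus}")
        self.exitstatus = exitstatus


def check_config_set_rejected(run, cluster_id='invalidtest'):
    """Expect 'ceph nfs cluster config set <cluster_id>' to fail with ENOENT.

    The same cluster_id is used both in the command and in the failure
    message, so the two can no longer disagree.
    """
    args = ['ceph', 'nfs', 'cluster', 'config', 'set', cluster_id, '-i', '-']
    try:
        run(args=args, stdin='testing')
    except CommandFailedError as e:
        # The command should fail; any status other than ENOENT is unexpected.
        if e.exitstatus != errno.ENOENT:
            raise
        return
    raise AssertionError(
        f"User config set for non-existing cluster {cluster_id}")
```

With this shape, an EINVAL like the one in the logs above would still propagate as an unexpected error rather than being silently accepted.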

I'll fix both of these issues in a PR.


Related issues 3 (2 open, 1 closed)

Copied to CephFS - Backport #59244: quincy: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' (In Progress, Dhairya Parmar)
Copied to CephFS - Backport #59245: reef: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' (In Progress, Dhairya Parmar)
Copied to CephFS - Backport #59246: pacific: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' (Resolved, Dhairya Parmar)