Fix #58758

qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid'

Added by Dhairya Parmar about 1 year ago. Updated 10 months ago.

Status: Pending Backport
Priority: Normal
Category: Testing
Target version:
% Done: 0%
Source: Development
Tags: backport_processed
Backport: reef,quincy,pacific
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/nfs
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.front.sepia.ceph.com/dparmar-2023-02-15_20:03:50-orch:cephadm-wip-58228-distro-default-smithi/7175071/
http://pulpito.front.sepia.ceph.com/dparmar-2023-02-16_20:38:24-orch:cephadm-wip-58228-distro-default-smithi/7176742/

While working on https://github.com/ceph/ceph/pull/49460, I found that the testcase `test_cluster_set_user_config_with_non_existing_clusterid` fails intermittently. It runs before the testcases I wrote as part of the PR, and the logs give no indication that my code is causing the failure.

Looking at logs I see:

2023-02-16T06:29:17.250 DEBUG:teuthology.orchestra.run.smithi049:> ceph nfs cluster config set test -i -
2023-02-16T06:29:17.628 INFO:teuthology.orchestra.run.smithi049.stderr:Error EINVAL: Invalid service name "nfs.test". View currently running services using "ceph orch ls" 
2023-02-16T06:29:17.629 DEBUG:teuthology.orchestra.run:got remote process result: 22

I think the command 'ceph nfs cluster config set test -i -' tries to communicate with the daemon nfs.test (nfs.test.0.0.<machine_name>.<random_string>), and it fails because the daemon is still starting or the cluster is recovering from a bad state. This could be solved by retrying the command up to three times with a sleep of 2-3 secs between attempts, so that we give it enough time before concluding that the daemon doesn't exist.
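The retry idea could look roughly like the sketch below. It is only an illustration: `retry_until_success` and its parameters are hypothetical names, not existing helpers in the qa suite, and the flaky callable stands in for the real `ceph nfs cluster config set` invocation.

```python
import time


def retry_until_success(action, attempts=3, delay=2.0, sleep=time.sleep):
    """Call action() up to `attempts` times, sleeping `delay` seconds between tries.

    Hypothetical helper: returns the result of the first successful call and
    re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as e:  # a real test would catch CommandFailedError only
            last_error = e
            if attempt < attempts:
                sleep(delay)  # give the daemon time to come up
    raise last_error


# Stand-in for a command that fails while the daemon is still starting.
calls = {"count": 0}


def flaky_command():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("Invalid service name")
    return "ok"


# sleep is replaced by a no-op here so the example finishes immediately.
result = retry_until_success(flaky_command, attempts=3, delay=2.0, sleep=lambda s: None)
```

Making `sleep` injectable is only a convenience for trying the helper without waiting; in a testcase the default `time.sleep` would be used.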

Another issue is with the usage of variable 'cluster_id':

    def test_cluster_set_user_config_with_non_existing_clusterid(self):
        '''
        Test setting user config for non-existing nfs cluster.
        '''
        try:
            cluster_id = 'invalidtest'
            self.ctx.cluster.run(args=['ceph', 'nfs', 'cluster',
                'config', 'set', self.cluster_id, '-i', '-'], stdin='testing')
            self.fail(f"User config set for non-existing cluster {cluster_id}")
        except CommandFailedError as e:
            # Command should fail for test to pass
            if e.exitstatus != errno.ENOENT:
                raise

Here, ctx.cluster.run() uses 'self.cluster_id', whose value is "test", while the local var 'cluster_id' is used only in the fail() message. The command therefore runs against the cluster "test" rather than the non-existing 'invalidtest'. I think this is a mistake: either ctx.cluster.run() should also use the local var 'cluster_id', or the local var should be removed and 'self.cluster_id' used in both places.
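A minimal sketch of the first option, with the local 'cluster_id' used in both places, is shown below. To keep it runnable outside teuthology, `CommandFailedError` and `fake_run` are stand-ins defined here (assumptions, not the real teuthology objects), and the check is written as a plain function rather than a test method.

```python
import errno


class CommandFailedError(Exception):
    """Stand-in for teuthology's CommandFailedError; only exitstatus is modelled."""

    def __init__(self, exitstatus):
        super().__init__(f"command failed with status {exitstatus}")
        self.exitstatus = exitstatus


def fake_run(args, stdin=None):
    """Hypothetical replacement for self.ctx.cluster.run(): only cluster 'test' exists."""
    cluster_id = args[5]
    if cluster_id != 'test':
        raise CommandFailedError(errno.ENOENT)


def check_set_user_config_with_non_existing_clusterid(run):
    """Setting user config for a non-existing nfs cluster must fail with ENOENT."""
    cluster_id = 'invalidtest'
    try:
        # The same local cluster_id is passed to the command and used in the message.
        run(args=['ceph', 'nfs', 'cluster',
                  'config', 'set', cluster_id, '-i', '-'], stdin='testing')
    except CommandFailedError as e:
        # Command should fail for the check to pass
        if e.exitstatus != errno.ENOENT:
            raise
    else:
        raise AssertionError(
            f"User config set for non-existing cluster {cluster_id}")


check_set_user_config_with_non_existing_clusterid(fake_run)
```

With the original code, the stand-in runner would have received "test" and the command would have succeeded, which is the mismatch described above.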

I'll fix both of these issues in a PR.


Related issues

Copied to CephFS - Backport #59244: quincy: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' In Progress
Copied to CephFS - Backport #59245: reef: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' In Progress
Copied to CephFS - Backport #59246: pacific: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' Resolved

History

#1 Updated by Venky Shankar about 1 year ago

  • Category set to Testing
  • Target version set to v18.0.0
  • Backport set to pacific,quincy

#2 Updated by Dhairya Parmar 12 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 49460

#3 Updated by Laura Flores 12 months ago

  • Tags set to test-failure

/a/yuriw-2023-02-24_17:50:19-rados-main-distro-default-smithi/7186744

#4 Updated by Laura Flores 11 months ago

/a/lflores-2023-03-27_02:17:31-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7221061

#5 Updated by Venky Shankar 11 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from pacific,quincy to reef,quincy,pacific

#6 Updated by Backport Bot 11 months ago

  • Copied to Backport #59244: quincy: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' added

#7 Updated by Backport Bot 11 months ago

  • Copied to Backport #59245: reef: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' added

#8 Updated by Backport Bot 11 months ago

  • Copied to Backport #59246: pacific: qa: fix testcase 'test_cluster_set_user_config_with_non_existing_clusterid' added

#9 Updated by Backport Bot 11 months ago

  • Tags set to backport_processed

#10 Updated by Laura Flores 11 months ago

/a/yuriw-2023-03-30_21:29:24-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7227514

#11 Updated by Dhairya Parmar 10 months ago

Laura Flores wrote:

/a/yuriw-2023-03-30_21:29:24-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7227514

Branch [1] doesn't contain the fix [2]. Do let me know if the failure still persists with the patch applied.

[1] https://github.com/ceph/ceph-ci/blob/wip-yuri2-testing-2023-03-30-0826/qa/tasks/cephfs/test_nfs.py#L583-L595
[2] https://github.com/ceph/ceph/blob/main/qa/tasks/cephfs/test_nfs.py#L655-L672
