Bug #61732
pacific: test_cluster_info fails from "No daemons reported" (closed)
100%
Description
/a/yuriw-2023-06-15_19:41:47-rados-wip-yuri6-testing-2023-06-14-0754-pacific-distro-default-smithi/7305740
2023-06-15T22:03:58.417 INFO:teuthology.orchestra.run.smithi171.stdout:No daemons reported
2023-06-15T22:03:58.430 WARNING:teuthology.contextutil:reached maximum tries (11) after waiting for 60 seconds
2023-06-15T22:03:58.431 WARNING:tasks.cephfs.test_nfs:NFS Ganesha cluster deployment failed, retrying
2023-06-15T22:03:58.431 DEBUG:teuthology.orchestra.run.smithi171:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph log 'Ended test tasks.cephfs.test_nfs.TestNFS.test_cluster_info'
2023-06-15T22:03:58.991 INFO:tasks.cephfs_test_runner:test_cluster_info (tasks.cephfs.test_nfs.TestNFS) ... ERROR
2023-06-15T22:03:58.991 INFO:tasks.cephfs_test_runner:
2023-06-15T22:03:58.992 INFO:tasks.cephfs_test_runner:======================================================================
2023-06-15T22:03:58.992 INFO:tasks.cephfs_test_runner:ERROR: test_cluster_info (tasks.cephfs.test_nfs.TestNFS)
2023-06-15T22:03:58.992 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2023-06-15T22:03:58.992 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2023-06-15T22:03:58.993 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_a2a4ed2b4fbd1366687a5db6ac3695c86d95455f/qa/tasks/cephfs/test_nfs.py", line 574, in test_cluster_info
2023-06-15T22:03:58.993 INFO:tasks.cephfs_test_runner: self._test_create_cluster()
2023-06-15T22:03:58.993 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_a2a4ed2b4fbd1366687a5db6ac3695c86d95455f/qa/tasks/cephfs/test_nfs.py", line 149, in _test_create_cluster
2023-06-15T22:03:58.993 INFO:tasks.cephfs_test_runner: while proceed():
2023-06-15T22:03:58.994 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_961f4fb51318c373681de5844aadbc6dc0e58abc/teuthology/contextutil.py", line 134, in __call__
2023-06-15T22:03:58.994 INFO:tasks.cephfs_test_runner: raise MaxWhileTries(error_msg)
2023-06-15T22:03:58.994 INFO:tasks.cephfs_test_runner:teuthology.exceptions.MaxWhileTries: reached maximum tries (11) after waiting for 40 seconds
Updated by Venky Shankar 10 months ago
- Category set to Correctness/Safety
- Assignee set to Dhairya Parmar
- Target version set to v19.0.0
- Labels (FS) NFS-cluster added
Updated by Laura Flores 10 months ago
/a/yuriw-2023-06-23_20:51:14-rados-wip-yuri8-testing-2023-06-22-1309-pacific-distro-default-smithi/7314160
Updated by Laura Flores 10 months ago
- Priority changed from Normal to High
Occurs quite a bit. Perhaps from a recent regression?
See http://pulpito.front.sepia.ceph.com/lflores-2023-07-05_17:10:12-rados-wip-yuri8-testing-2023-06-22-1309-pacific-distro-default-smithi/ for example.
Updated by Dhairya Parmar 10 months ago
Laura Flores wrote:
Occurs quite a bit. Perhaps from a recent regression?
See http://pulpito.front.sepia.ceph.com/lflores-2023-07-05_17:10:12-rados-wip-yuri8-testing-2023-06-22-1309-pacific-distro-default-smithi/ for example.
I'm going through logs now, will update soon.
Updated by Dhairya Parmar 10 months ago
@laura this isn't seen in quincy or reef, is it?
Updated by Laura Flores 10 months ago
Dhairya Parmar wrote:
@laura this isn't seen in quincy or reef, is it?
Right. But since it occurs in pacific, it might be in quincy and reef too, so I will continue to update the tracker if I see anything.
Updated by Dhairya Parmar 10 months ago
Laura Flores wrote:
Dhairya Parmar wrote:
@laura this isn't seen in quincy or reef, is it?
Right. But since it occurs in pacific, it might be in quincy and reef too, so I will continue to update the tracker if I see anything.
Can you share the list of PRs included in the run? Also, is it the same set of PRs as in Yuri's run that you pasted above?
Updated by Dhairya Parmar 10 months ago
Laura Flores wrote:
This is a private workspace and I need access to it. I have already requested access; whom do I need to contact for permission? Yuri W?
Updated by Dhairya Parmar 10 months ago
The log is full of lines complaining that the NFS cluster daemon could not be found:
INFO:teuthology.orchestra.run.smithi164.stdout:No daemons reported
The issue lies here:
2023-07-05T17:43:26.341 DEBUG:teuthology.orchestra.run.smithi164:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph n f s ' ' c l u s t e r ' ' c r e a t e ' ' t e s t
i.e. the weird spaces and the single quotes make the command uninterpretable, so it fails with EINVAL:
2023-07-05T17:43:26.907 INFO:teuthology.orchestra.run.smithi164.stderr:no valid command found; 10 closest matches:
2023-07-05T17:43:26.907 INFO:teuthology.orchestra.run.smithi164.stderr:pg stat
2023-07-05T17:43:26.907 INFO:teuthology.orchestra.run.smithi164.stderr:pg getmap
2023-07-05T17:43:26.907 INFO:teuthology.orchestra.run.smithi164.stderr:pg dump [all|summary|sum|delta|pools|osds|pgs|pgs_brief...]
2023-07-05T17:43:26.907 INFO:teuthology.orchestra.run.smithi164.stderr:pg dump_json [all|summary|sum|pools|osds|pgs...]
2023-07-05T17:43:26.907 INFO:teuthology.orchestra.run.smithi164.stderr:pg dump_pools_json
2023-07-05T17:43:26.908 INFO:teuthology.orchestra.run.smithi164.stderr:pg ls-by-pool <poolstr> [<states>...]
2023-07-05T17:43:26.908 INFO:teuthology.orchestra.run.smithi164.stderr:pg ls-by-primary <id|osd.id> [<pool:int>] [<states>...]
2023-07-05T17:43:26.908 INFO:teuthology.orchestra.run.smithi164.stderr:pg ls-by-osd <id|osd.id> [<pool:int>] [<states>...]
2023-07-05T17:43:26.908 INFO:teuthology.orchestra.run.smithi164.stderr:pg ls [<pool:int>] [<states>...]
2023-07-05T17:43:26.908 INFO:teuthology.orchestra.run.smithi164.stderr:pg dump_stuck [inactive|unclean|stale|undersized|degraded...] [<threshold:int>]
2023-07-05T17:43:26.908 INFO:teuthology.orchestra.run.smithi164.stderr:Error EINVAL: invalid command
2023-07-05T17:43:26.909 DEBUG:teuthology.orchestra.run:got remote process result: 22
This repeats in a loop and eventually fails.
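To illustrate why the invalid command ends in MaxWhileTries, here is a minimal, self-contained sketch. It does not use teuthology: `poll_until`, `daemons_reported` and the local `MaxWhileTries` class are hypothetical stand-ins, and the retry count is arbitrary. The assumption is simply that a create command which returns 22 never deploys a daemon, so every poll sees "No daemons reported" until the tries run out.

```python
import time


class MaxWhileTries(Exception):
    """Local stand-in for teuthology.exceptions.MaxWhileTries (assumption)."""


def poll_until(check, tries=10, sleep=0.0):
    """Call check() until it returns True, at most `tries` times."""
    for attempt in range(1, tries + 1):
        if check():
            return attempt
        time.sleep(sleep)
    raise MaxWhileTries(f"reached maximum tries ({tries})")


# Hypothetical state: the mangled create command returned 22 (EINVAL),
# so no NFS daemon is ever deployed.
create_rc = 22


def daemons_reported():
    # Daemons can only show up if the create command succeeded.
    return create_rc == 0


try:
    poll_until(daemons_reported, tries=3)
except MaxWhileTries as exc:
    print(f"cluster deployment failed: {exc}")
```

Because the polled condition depends on a command that can never succeed, retrying only delays the failure; the fix has to be in how the command is built.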
Updated by Dhairya Parmar 10 months ago
_test_create_cluster() in test_nfs needed stderr to be inspected, so I created a new helper, _nfs_complete_cmd():
def _nfs_complete_cmd(self, cmd):
    # StringIO comes from `from io import StringIO` at module level.
    # The command is passed as a single string, not as a tuple of tokens.
    return self.mgr_cluster.mon_manager.run_cluster_cmd(args=f"nfs {cmd}",
                                                        stdout=StringIO(),
                                                        stderr=StringIO(),
                                                        check_status=False)
Which is being used here [0].
The difference is in how I call this helper: instead of passing a tuple, I pass a string because it is more readable, and the underlying code in the main branch allows it [1]. That code is missing from the pacific branch [2], which explains the weird spaces and unintended single quotes when the command `nfs cluster create test` is interpreted by pacific's run_cluster_cmd().
The commits that allow passing CLI commands as either a string or a tuple are [3] and [4], and they were never backported to pacific. So either I change [0] to pass a tuple, or we backport [3] and [4]. Either way works, but I'd recommend backporting, because this issue may recur when someone again passes a command as a string, only to find some unearthly command in the pacific teuthology logs :P
[0] https://github.com/ceph/ceph/pull/50809/files#diff-61b87b23c38fe121bbe5f110686a0cd1e5e338811b5fa1a9456c4548bd206055R153-R154
[1] https://github.com/ceph/ceph/blob/main/qa/tasks/ceph_manager.py#L1562-L1565
[2] https://github.com/ceph/ceph/blob/pacific/qa/tasks/ceph_manager.py#L1560-L1593
[3] https://github.com/ceph/ceph/commit/93677576c1fd6d0e4e2991a9ba6be6d222ea98ea
[4] https://github.com/ceph/ceph/commit/a1dc6b6c1964423158dcd7c930db5e3063ff210e
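The effect described above can be reproduced outside teuthology. The sketch below is not the code from ceph_manager.py: `build_cmd_pacific_like` and `build_cmd_tolerant` are hypothetical names, `shlex.quote` stands in for whatever quoting the remote runner applies, and splitting the string with `shlex.split` is only an assumed way of accepting both forms. It shows that iterating over a string yields single characters, so each space becomes its own quoted argument.

```python
import shlex

PREFIX = ["ceph", "--cluster", "ceph"]


def build_cmd_pacific_like(args):
    """Assumes `args` is a sequence of tokens; a str is iterated per character."""
    return " ".join(shlex.quote(a) for a in [*PREFIX, *args])


def build_cmd_tolerant(args):
    """Accepts either a str or a sequence of tokens (assumed approach)."""
    if isinstance(args, str):
        args = shlex.split(args)
    return " ".join(shlex.quote(a) for a in [*PREFIX, *args])


# A string is unpacked character by character, and each space is quoted.
print(build_cmd_pacific_like("nfs cluster create test"))
# ceph --cluster ceph n f s ' ' c l u s t e r ' ' c r e a t e ' ' t e s t

# Passing a tuple, or normalizing the string first, gives the intended command.
print(build_cmd_pacific_like(("nfs", "cluster", "create", "test")))
print(build_cmd_tolerant("nfs cluster create test"))
# both print: ceph --cluster ceph nfs cluster create test
```

This mirrors the two options: pass a tuple at the call site, or make the command builder normalize strings so both call styles work.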
Updated by Laura Flores 9 months ago
/a/yuriw-2023-07-19_14:33:14-rados-wip-yuri11-testing-2023-07-18-0927-pacific-distro-default-smithi/7343428
Updated by Sridhar Seshasayee 9 months ago
/a/yuriw-2023-07-26_15:54:22-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7353337
/a/yuriw-2023-07-26_15:54:22-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7353548
/a/yuriw-2023-07-26_15:54:22-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7353740
/a/yuriw-2023-07-26_15:54:22-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7353948
Updated by Venky Shankar 9 months ago
Dhairya Parmar wrote:
_test_create_cluster() in test_nfs demanded strerr to be looked at; therefore I had created a new helper _nfs_complete_cmd()
[...]
Which is being used here [0].
There is a difference in the way I've called this helper, instead of sending a tuple I sent a string because it is more readable and the underlying code in main branch does allow it [1] but this code is missing in pacific branch [2]; and this clearly explains why we see those weird spaces and unintended singles quotes when the cmd `nfs cluster create test` is interpreted by the pacific's run_cluster_cmd().
The commits that allowed usage of both string and tuple while passing cli cmds are [3] and [4] and obviously were never backported to pacific. So either I make changes to [0] and pass a tuple or we backport [3] and [4]. Either way is good but I'd recommend backporting because this issue may arise in future where someone again would pass a cmd as string only to find some unearthly command in pacific teuthology logs :P
[0] https://github.com/ceph/ceph/pull/50809/files#diff-61b87b23c38fe121bbe5f110686a0cd1e5e338811b5fa1a9456c4548bd206055R153-R154
[1] https://github.com/ceph/ceph/blob/main/qa/tasks/ceph_manager.py#L1562-L1565
[2] https://github.com/ceph/ceph/blob/pacific/qa/tasks/ceph_manager.py#L1560-L1593
[3] https://github.com/ceph/ceph/commit/93677576c1fd6d0e4e2991a9ba6be6d222ea98ea
[4] https://github.com/ceph/ceph/commit/a1dc6b6c1964423158dcd7c930db5e3063ff210e
Dhairya, could you try backporting the dependent PRs?
Updated by Dhairya Parmar 9 months ago
Venky Shankar wrote:
Dhairya Parmar wrote:
_test_create_cluster() in test_nfs demanded strerr to be looked at; therefore I had created a new helper _nfs_complete_cmd()
[...]
Which is being used here [0].
There is a difference in the way I've called this helper, instead of sending a tuple I sent a string because it is more readable and the underlying code in main branch does allow it [1] but this code is missing in pacific branch [2]; and this clearly explains why we see those weird spaces and unintended singles quotes when the cmd `nfs cluster create test` is interpreted by the pacific's run_cluster_cmd().
The commits that allowed usage of both string and tuple while passing cli cmds are [3] and [4] and obviously were never backported to pacific. So either I make changes to [0] and pass a tuple or we backport [3] and [4]. Either way is good but I'd recommend backporting because this issue may arise in future where someone again would pass a cmd as string only to find some unearthly command in pacific teuthology logs :P
[0] https://github.com/ceph/ceph/pull/50809/files#diff-61b87b23c38fe121bbe5f110686a0cd1e5e338811b5fa1a9456c4548bd206055R153-R154
[1] https://github.com/ceph/ceph/blob/main/qa/tasks/ceph_manager.py#L1562-L1565
[2] https://github.com/ceph/ceph/blob/pacific/qa/tasks/ceph_manager.py#L1560-L1593
[3] https://github.com/ceph/ceph/commit/93677576c1fd6d0e4e2991a9ba6be6d222ea98ea
[4] https://github.com/ceph/ceph/commit/a1dc6b6c1964423158dcd7c930db5e3063ff210e
Dhairya, could you try backporting the dependent PRs?
okay
Updated by Laura Flores 9 months ago
/a/yuriw-2023-08-02_20:21:03-rados-wip-yuri3-testing-2023-08-01-0825-pacific-distro-default-smithi/7358531
Updated by Venky Shankar 9 months ago
- Status changed from Triaged to Fix Under Review
- Pull request ID set to 52763
Updated by Venky Shankar 9 months ago
- Subject changed from test_cluster_info fails from "No daemons reported" to pacific: test_cluster_info fails from "No daemons reported"
Updated by Laura Flores 9 months ago
/a/yuriw-2023-08-08_14:45:33-rados-wip-yuri6-testing-2023-08-03-0807-pacific-distro-default-smithi/7362839
Updated by Laura Flores 9 months ago
/a/yuriw-2023-08-10_20:19:11-rados-wip-yuri2-testing-2023-08-08-0755-pacific-distro-default-smithi/7366072
Updated by Aishwarya Mathuria 9 months ago
/a/yuriw-2023-08-16_22:40:18-rados-wip-yuri2-testing-2023-08-16-1142-pacific-distro-default-smithi/7370706/
Updated by Laura Flores 8 months ago
/a/yuriw-2023-08-21_23:10:07-rados-pacific-release-distro-default-smithi/7375005
Updated by Laura Flores 8 months ago
/a/yuriw-2023-09-01_19:14:47-rados-wip-batrick-testing-20230831.124848-pacific-distro-default-smithi/7386551
Updated by Laura Flores 6 months ago
/a/lflores-2023-11-01_18:38:59-rados-wip-yuri5-testing-2023-10-24-0737-pacific-distro-default-smithi/7443306
Updated by Konstantin Shalygin 26 days ago
- Status changed from Fix Under Review to Resolved
- % Done changed from 0 to 100
- Backport deleted (pacific)