Bug #39651
openqa: test_kill_mdstable fails unexpectedly
Description
I get the following traceback while running test_kill_mdstable: https://github.com/ceph/ceph/blob/master/qa/tasks/cephfs/test_snapshots.py#L41
  File "/home/rishabh/repos/ceph/pr-27718/qa/tasks/cephfs/test_snapshots.py", line 76, in test_kill_mdstable
    self.delete_mds_coredump(rank0['name']);
  File "/home/rishabh/repos/ceph/pr-27718/qa/tasks/cephfs/cephfs_test_case.py", line 268, in delete_mds_coredump
    ], stdout=StringIO())
  File "../qa/tasks/vstart_runner.py", line 346, in run
    proc.wait()
  File "../qa/tasks/vstart_runner.py", line 179, in wait
    raise CommandFailedError(self.args, self.exitstatus)
CommandFailedError: Command failed with status 1: ['cd', '|/usr/lib/systemd', Raw('&&'), 'ls', Raw('|'), 'xargs', 'file']
core_dir (https://github.com/ceph/ceph/blob/master/qa/tasks/cephfs/cephfs_test_case.py#L257) does not contain a path to a directory at all. Its value is "|/usr/lib/systemd", which is odd: the string, which is supposed to be a path, begins with a vertical bar and, more importantly, "|/usr/lib/systemd" is not a directory. The code that follows tries to use it as the target directory of the "cd" command, and that is the command that fails in the traceback above.
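The odd core_dir value can be reproduced without a cluster. This is a minimal sketch, assuming the kernel.core_pattern value quoted later in this thread and assuming the test passed that raw value, pipe included, to os.path.dirname() (on Linux, os.path is posixpath):

```python
import posixpath

# kernel.core_pattern as printed by `sysctl -n kernel.core_pattern` later in this thread.
core_pattern = "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"

# dirname() splits at the last '/' and keeps everything before it,
# so the leading '|' survives and the handler's arguments are dropped.
core_dir = posixpath.dirname(core_pattern)
print(core_dir)  # |/usr/lib/systemd
```

The result matches the argument passed to "cd" in the failing command.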
Updated by Patrick Donnelly almost 5 years ago
- Assignee set to Rishabh Dave
- Target version set to v15.0.0
- Start date deleted (05/09/2019)
- Component(FS) qa-suite added
Updated by Patrick Donnelly almost 5 years ago
- Subject changed from test_kill_mdstable fails unexpectedly to qa: test_kill_mdstable fails unexpectedly
- Description updated (diff)
- Source set to Q/A
Updated by Rishabh Dave over 4 years ago
Part of the problem is that the pipe character isn't stripped from the output when the path is extracted:
$ sysctl -n kernel.core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
This is easy to fix; I've raised a PR for that - https://github.com/ceph/ceph/pull/31619.
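Stripping the pipe fixes the path syntax, but per core(5) a leading '|' means the kernel pipes the dump to a user-space program, so the remaining path names the handler (systemd-coredump), not a directory of core files. A minimal sketch contrasting the two readings; core_dir_from_pattern and the "/tmp/cores/core.%e.%p" pattern are hypothetical illustrations, not Ceph code:

```python
import posixpath
from typing import Optional


def strip_pipe_dirname(pattern: str) -> str:
    """Drop a leading '|' (like the path[1:] stripping) and take dirname()."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        pattern = pattern[1:]
    return posixpath.dirname(pattern)


def core_dir_from_pattern(pattern: str) -> Optional[str]:
    """Hypothetical helper: directory cores are written to, or None.

    A piped pattern names a handler program, so there is no core
    directory to inspect; a bare template such as "core" has none either.
    """
    pattern = pattern.strip()
    if pattern.startswith("|"):
        return None
    return posixpath.dirname(pattern) or None


systemd = "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
print(strip_pipe_dirname(systemd))                     # /usr/lib/systemd
print(core_dir_from_pattern(systemd))                  # None
print(core_dir_from_pattern("/tmp/cores/core.%e.%p"))  # /tmp/cores
print(core_dir_from_pattern("core"))                   # None
```

With only the pipe stripped, "cd" succeeds but lands in the handler's directory, where no ceph-mds core is expected to appear.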
The next part of the issue is that the assertion in the following code (from qa/tasks/cephfs/cephfs_test_case.py) fails:
def delete_mds_coredump(self, daemon_id):
    # delete coredump file, otherwise teuthology.internal.coredump will
    # catch it later and treat it as a failure.
    path = self.mds_cluster.mds_daemons[daemon_id].remote.run(args=[
        "sudo", "sysctl", "-n", "kernel.core_pattern"],
        stdout=StringIO()).stdout.getvalue().strip()
    if path[0] == '|':
        path = path[1:]
    core_dir = os.path.dirname(path)
    if core_dir:  # Non-default core_pattern with a directory in it
        # We have seen a core_pattern that looks like it's from teuthology's coredump
        # task, so proceed to clear out the core file
        log.info("Clearing core from directory: {0}".format(core_dir))

        # Verify that we see the expected single coredump
        ls_proc = self.mds_cluster.mds_daemons[daemon_id].remote.run(args=[
            "cd", core_dir, run.Raw('&&'),
            "sudo", "ls", run.Raw('|'), "sudo", "xargs", "file"
        ], stdout=StringIO())
        cores = [l.partition(":")[0]
                 for l in ls_proc.stdout.getvalue().strip().split("\n")
                 if re.match(r'.*ceph-mds.* -i +{0}'.format(daemon_id), l)]

        log.info("Enumerated cores: {0}".format(cores))
        self.assertEqual(len(cores), 1)
There's no "ceph-mds" in the output of ls_proc. I have no idea about the significance of the core file. @Patrick @Zheng, any suggestions or hints?
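The assertion fails because the filter keeps zero lines. A minimal sketch of the same filtering logic, pulled out into a hypothetical function find_mds_cores and run on two made-up `ls | xargs file` outputs. It assumes that `file` reports a core's originating command line with a "from '...'" suffix, which real `file` versions commonly do but whose exact wording varies:

```python
import re
from typing import List


def find_mds_cores(file_output: str, daemon_id: str) -> List[str]:
    """Same filter as delete_mds_coredump: keep `file` lines matching
    'ceph-mds ... -i <daemon_id>' and return the part before the ':'."""
    pattern = r'.*ceph-mds.* -i +{0}'.format(daemon_id)
    return [line.partition(":")[0]
            for line in file_output.strip().split("\n")
            if re.match(pattern, line)]


# Hypothetical output for /usr/lib/systemd: handler binaries, no core files.
handler_output = (
    "systemd-coredump: ELF 64-bit LSB pie executable, x86-64\n"
    "systemd-journald: ELF 64-bit LSB pie executable, x86-64\n"
)
# Hypothetical output for a directory holding one ceph-mds core.
core_output = (
    "core.12345: ELF 64-bit LSB core file, x86-64, from 'ceph-mds -i a'\n"
    "notes.txt: ASCII text\n"
)

print(find_mds_cores(handler_output, "a"))  # []
print(find_mds_cores(core_output, "a"))     # ['core.12345']
```

An empty list makes `self.assertEqual(len(cores), 1)` fail, which is consistent with ls_proc showing no "ceph-mds" line.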
Updated by Rishabh Dave over 4 years ago
I talked with Zheng. He told me that many tests cannot be executed successfully with a vstart cluster, and this is one of them.
Updated by Patrick Donnelly about 4 years ago
- Target version changed from v15.0.0 to v16.0.0
Updated by Patrick Donnelly over 3 years ago
- Target version changed from v16.0.0 to v17.0.0
- Backport set to pacific,octopus,nautilus