Bug #48125

qa: test_subvolume_snapshot_clone_cancel_in_progress failure

Added by Patrick Donnelly 3 months ago. Updated 6 days ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph, qa-suite
Labels (FS): qa-failure
Pull request ID:
Crash signature:

Description

2020-11-05T03:00:02.672 INFO:teuthology.orchestra.run.smithi079:> (cd /home/ubuntu/cephtest/mnt.0 && exec sudo bash -c 'stat -c %h /home/ubuntu/cephtest/mnt.0/./volumes/_deleting')
2020-11-05T03:00:02.706 INFO:teuthology.orchestra.run.smithi079.stdout:3
2020-11-05T03:00:07.707 INFO:teuthology.orchestra.run:Running command with timeout 900
2020-11-05T03:00:07.708 INFO:teuthology.orchestra.run.smithi079:> (cd /home/ubuntu/cephtest/mnt.0 && exec sudo bash -c 'stat -c %h /home/ubuntu/cephtest/mnt.0/./volumes/_deleting')
2020-11-05T03:00:07.741 INFO:teuthology.orchestra.run.smithi079.stdout:3
...
2020-11-05T03:00:12.917 INFO:tasks.cephfs_test_runner:======================================================================
2020-11-05T03:00:12.918 INFO:tasks.cephfs_test_runner:ERROR: test_subvolume_snapshot_clone_cancel_in_progress (tasks.cephfs.test_volumes.TestSubvolumeSnapshotClones)
2020-11-05T03:00:12.918 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-11-05T03:00:12.918 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-11-05T03:00:12.918 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201104.220526/qa/tasks/cephfs/test_volumes.py", line 2829, in test_subvolume_snapshot_clone_cancel_in_progress
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:    self._wait_for_trash_empty()
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201104.220526/qa/tasks/cephfs/test_volumes.py", line 291, in _wait_for_trash_empty
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:    self.mount_a.wait_for_dir_empty(trashdir, timeout=timeout)
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201104.220526/qa/tasks/cephfs/mount.py", line 810, in wait_for_dir_empty
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:    while proceed():
2020-11-05T03:00:12.920 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 133, in __call__
2020-11-05T03:00:12.920 INFO:tasks.cephfs_test_runner:    raise MaxWhileTries(error_msg)
2020-11-05T03:00:12.920 INFO:tasks.cephfs_test_runner:teuthology.exceptions.MaxWhileTries: reached maximum tries (6) after waiting for 30 seconds

From: /ceph/teuthology-archive/pdonnell-2020-11-05_00:20:13-fs-wip-pdonnell-testing-20201104.220526-distro-basic-smithi/5591293/teuthology.log

The mgr deleted the directory quickly:

2020-11-05T02:59:42.611+0000 7fc9edf0e700  8 client.10770 rmdir(#0x1000000021a/fc1b25a9-dc09-47f2-9257-8d7f7f61593b) = 0

From: /ceph/teuthology-archive/pdonnell-2020-11-05_00:20:13-fs-wip-pdonnell-testing-20201104.220526-distro-basic-smithi/5591293/remote/smithi079/log/ceph-mgr.y.log.gz

It looks like stat on the trash directory is returning the wrong result (3 hard links instead of 2).

History

#1 Updated by Patrick Donnelly 3 months ago

  • Labels (FS) qa-failure added

#2 Updated by Patrick Donnelly 2 months ago

  • Status changed from New to Triaged
  • Assignee set to Jeff Layton

#3 Updated by Jeff Layton 2 months ago

I assume that "_deleting" is a directory and that this test is expecting to see a particular link count on the directory when all of its entries are removed?

Note that POSIX doesn't place any specific meaning on the link count for directories, but it looks like ceph has some code for that in ceph_getattr:

                /*
                 * Some applications rely on the number of st_nlink
                 * value on directories to be either 0 (if unlinked)
                 * or 2 + number of subdirectories.
                 */
                if (stat->nlink == 1)
                        /* '.' + '..' + subdirs */
                        stat->nlink = 1 + 1 + ci->i_subdirs;

In any case, I think we only update i_subdirs when we have Fs caps, so I guess we need to include Fs caps when this is a directory and we're fetching the link count. I'll try to spin up a patch for it.
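Roughly what I have in mind (an untested sketch, not the eventual patch): teach statx_to_caps() to take the inode type into account and request Fs rather than Ls when userland asks for the link count on a directory, since i_subdirs is only kept current while we hold Fs. Passing the mode down as a umode_t argument is just my guess at the plumbing:

static int statx_to_caps(u32 want, umode_t mode)
{
        int mask = 0;

        if (want & STATX_NLINK) {
                /*
                 * Directory nlink is fabricated from i_subdirs in
                 * ceph_getattr(), and i_subdirs is only up to date
                 * while we hold Fs caps.
                 */
                if (S_ISDIR(mode))
                        mask |= CEPH_CAP_FILE_SHARED;
                else
                        mask |= CEPH_CAP_LINK_SHARED;
        }

        /* ... other statx fields map to caps as they do today ... */

        return mask;
}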

How reproducible is this?

#4 Updated by Patrick Donnelly 2 months ago

Jeff Layton wrote:

> I assume that "_deleting" is a directory and that this test is expecting to see a particular link count on the directory when all of its entries are removed?

Yes. (Assuming all of its entries are sub-directories.)

> Note that POSIX doesn't place any specific meaning on the link count for directories, but it looks like ceph has some code for that in ceph_getattr:
> 
> [...]

At least ext4 also does this:

pdonnell@aglarond ~$ mkdir foo
pdonnell@aglarond ~$ cd foo
pdonnell@aglarond ~/foo$ stat -c %h .
2
pdonnell@aglarond ~/foo$ mkdir bar
pdonnell@aglarond ~/foo$ stat -c %h .
3
pdonnell@aglarond ~/foo$ rmdir bar
pdonnell@aglarond ~/foo$ stat -c %h .
2
pdonnell@aglarond ~/foo$ 

> In any case, I think we only update i_subdirs when we have Fs caps, so I guess we need to include Fs caps when this is a directory and we're fetching the link count. I'll try to spin up a patch for it.

Thanks!

> How reproducible is this?

Not 100% of the time, but often. Here's the latest occurrence: https://pulpito.ceph.com/pdonnell-2020-11-06_23:20:14-fs-wip-pdonnell-testing-20201106.185908-distro-basic-smithi/5597723/

#5 Updated by Jeff Layton 2 months ago

  • Status changed from Triaged to Fix Under Review

#6 Updated by Patrick Donnelly 6 days ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus
