Bug #48125

closed

qa: test_subvolume_snapshot_clone_cancel_in_progress failure

Added by Patrick Donnelly over 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:
0%

Source:
Tags:
Backport:
pacific,octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph, qa-suite
Labels (FS):
qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-11-05T03:00:02.672 INFO:teuthology.orchestra.run.smithi079:> (cd /home/ubuntu/cephtest/mnt.0 && exec sudo bash -c 'stat -c %h /home/ubuntu/cephtest/mnt.0/./volumes/_deleting')
2020-11-05T03:00:02.706 INFO:teuthology.orchestra.run.smithi079.stdout:3
2020-11-05T03:00:07.707 INFO:teuthology.orchestra.run:Running command with timeout 900
2020-11-05T03:00:07.708 INFO:teuthology.orchestra.run.smithi079:> (cd /home/ubuntu/cephtest/mnt.0 && exec sudo bash -c 'stat -c %h /home/ubuntu/cephtest/mnt.0/./volumes/_deleting')
2020-11-05T03:00:07.741 INFO:teuthology.orchestra.run.smithi079.stdout:3
...
2020-11-05T03:00:12.917 INFO:tasks.cephfs_test_runner:======================================================================
2020-11-05T03:00:12.918 INFO:tasks.cephfs_test_runner:ERROR: test_subvolume_snapshot_clone_cancel_in_progress (tasks.cephfs.test_volumes.TestSubvolumeSnapshotClones)
2020-11-05T03:00:12.918 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-11-05T03:00:12.918 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-11-05T03:00:12.918 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201104.220526/qa/tasks/cephfs/test_volumes.py", line 2829, in test_subvolume_snapshot_clone_cancel_in_progress
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:    self._wait_for_trash_empty()
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201104.220526/qa/tasks/cephfs/test_volumes.py", line 291, in _wait_for_trash_empty
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:    self.mount_a.wait_for_dir_empty(trashdir, timeout=timeout)
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201104.220526/qa/tasks/cephfs/mount.py", line 810, in wait_for_dir_empty
2020-11-05T03:00:12.919 INFO:tasks.cephfs_test_runner:    while proceed():
2020-11-05T03:00:12.920 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 133, in __call__
2020-11-05T03:00:12.920 INFO:tasks.cephfs_test_runner:    raise MaxWhileTries(error_msg)
2020-11-05T03:00:12.920 INFO:tasks.cephfs_test_runner:teuthology.exceptions.MaxWhileTries: reached maximum tries (6) after waiting for 30 seconds

From: /ceph/teuthology-archive/pdonnell-2020-11-05_00:20:13-fs-wip-pdonnell-testing-20201104.220526-distro-basic-smithi/5591293/teuthology.log

The mgr deleted the directory quickly:

2020-11-05T02:59:42.611+0000 7fc9edf0e700  8 client.10770 rmdir(#0x1000000021a/fc1b25a9-dc09-47f2-9257-8d7f7f61593b) = 0

From: /ceph/teuthology-archive/pdonnell-2020-11-05_00:20:13-fs-wip-pdonnell-testing-20201104.220526-distro-basic-smithi/5591293/remote/smithi079/log/ceph-mgr.y.log.gz

It looks like stat is returning the wrong result (3 hard links instead of 2).
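
For reference, the test's wait loop effectively polls the link count of the trash directory and treats nlink == 2 ('.' plus '..', i.e. no subdirectories left) as empty. A minimal standalone sketch of that check (hypothetical path and retry values, not the actual teuthology helper):

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
            /* hypothetical mount point; the test uses /home/ubuntu/cephtest/mnt.0 */
            const char *trashdir = "/mnt/cephfs/volumes/_deleting";
            struct stat st;
            int tries;

            for (tries = 0; tries < 6; tries++) {   /* ~30s total, like the failing run */
                    if (stat(trashdir, &st) != 0) {
                            perror("stat");
                            return 1;
                    }
                    if (st.st_nlink == 2)           /* '.' + '..' only => trash is empty */
                            return 0;
                    sleep(5);
            }
            fprintf(stderr, "trash dir still has %lu links\n",
                    (unsigned long)st.st_nlink);
            return 1;
    }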

Actions #1

Updated by Patrick Donnelly over 3 years ago

  • Labels (FS) qa-failure added
Actions #2

Updated by Patrick Donnelly over 3 years ago

  • Status changed from New to Triaged
  • Assignee set to Jeff Layton
Actions #3

Updated by Jeff Layton over 3 years ago

I assume that "_deleting" is a directory and that this test is expecting to see a particular link count on the directory when all of its entries are removed?

Note that POSIX doesn't place any specific meaning on the link count for directories, but it looks like ceph has some code for that in ceph_getattr:

                /*
                 * Some applications rely on the number of st_nlink
                 * value on directories to be either 0 (if unlinked)
                 * or 2 + number of subdirectories.
                 */
                if (stat->nlink == 1)
                        /* '.' + '..' + subdirs */
                        stat->nlink = 1 + 1 + ci->i_subdirs;

In any case, I think we only update i_subdirs when we have Fs caps, so I guess we need to include Fs caps when this is a directory and we're fetching the link count. I'll try to spin up a patch for it.
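
A rough sketch of that direction, using the cap-mask translation in fs/ceph/inode.c (statx_to_caps() and the CEPH_CAP_*/STATX_* names are from that file; having the inode mode available in that helper is an assumption of this sketch, and the final patch may differ):

            /*
             * Sketch, not the merged fix: when the caller wants the link count
             * and the inode is a directory, ask for Fs caps so i_subdirs is up
             * to date; Ls is only sufficient for non-directories.
             */
            if (want & STATX_NLINK) {
                    if (S_ISDIR(mode))
                            mask |= CEPH_CAP_FILE_SHARED;   /* Fs keeps i_subdirs fresh */
                    else
                            mask |= CEPH_CAP_LINK_SHARED;   /* Ls is enough for files */
            }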

How reproducible is this?

Actions #4

Updated by Patrick Donnelly over 3 years ago

Jeff Layton wrote:

I assume that "_deleting" is a directory and that this test is expecting to see a particular link count on the directory when all of its entries are removed?

Yes. (Assuming all of its entries are sub-directories.)

Note that POSIX doesn't place any specific meaning on the link count for directories, but it looks like ceph has some code for that in ceph_getattr:

[...]

At least ext4 also does this:

pdonnell@aglarond ~$ mkdir foo
pdonnell@aglarond ~$ cd foo
pdonnell@aglarond ~/foo$ stat -c %h .
2
pdonnell@aglarond ~/foo$ mkdir bar
pdonnell@aglarond ~/foo$ stat -c %h .
3
pdonnell@aglarond ~/foo$ rmdir bar
pdonnell@aglarond ~/foo$ stat -c %h .
2
pdonnell@aglarond ~/foo$ 

In any case, I think we only update i_subdirs when we have Fs caps, so I guess we need to include Fs caps when this is a directory and we're fetching the link count. I'll try to spin up a patch for it.

Thanks!

How reproducible is this?

Not 100% but often. Here's the latest one: https://pulpito.ceph.com/pdonnell-2020-11-06_23:20:14-fs-wip-pdonnell-testing-20201106.185908-distro-basic-smithi/5597723/

Actions #5

Updated by Jeff Layton over 3 years ago

  • Status changed from Triaged to Fix Under Review
Actions #6

Updated by Patrick Donnelly over 3 years ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus
Actions #7

Updated by Jeff Layton over 2 years ago

  • Status changed from Fix Under Review to Resolved

Fixed in upstream commit 04fabb1199d1f995d6b9a1c42c046ac4bdac2d19.
