Bug #56524

xfstest-dev: generic/467 failed with "open_by_handle(/mnt/kcephfs.A/467-dir/file000006) opened an unlinked file!"

Added by Xiubo Li 7 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature (v1):
Crash signature (v2):

Description

I hit it only once by using the 'testing' branch and haven't reproduced it yet:

generic/467 73s ... - output mismatch (see /data/xfstests-dev/results//generic/467.out.bad)
--- tests/generic/467.out 2022-04-07 21:42:10.690048179 +0800
+++ /data/xfstests-dev/results//generic/467.out.bad 2022-07-07 16:34:34.682222265 +0800
@@ -1,5 +1,6 @@
QA output created by 467
test_file_handles TEST_DIR/467-dir -dp
+open_by_handle(/mnt/kcephfs.A/467-dir/file000006) opened an unlinked file!
test_file_handles TEST_DIR/467-dir -rp
test_file_handles TEST_DIR/467-dir -dkr
test_file_handles TEST_DIR/467-dir -lr
...
(Run 'diff -u /data/xfstests-dev/tests/generic/467.out /data/xfstests-dev/results//generic/467.out.bad' to see the entire diff)

History

#1 Updated by Xiubo Li 7 months ago

Hit it again:

generic/467 73s ... - output mismatch (see /data/xfstests-dev/results//generic/467.out.bad)
    --- tests/generic/467.out    2022-04-07 21:42:10.690048179 +0800
    +++ /data/xfstests-dev/results//generic/467.out.bad    2022-07-12 16:52:45.692472984 +0800
    @@ -1,5 +1,6 @@
     QA output created by 467
     test_file_handles TEST_DIR/467-dir -dp
    +open_by_handle(/mnt/kcephfs.A/467-dir/file000002) opened an unlinked file!
     test_file_handles TEST_DIR/467-dir -rp
     test_file_handles TEST_DIR/467-dir -dkr
     test_file_handles TEST_DIR/467-dir -lr
    ...
    (Run 'diff -u /data/xfstests-dev/tests/generic/467.out /data/xfstests-dev/results//generic/467.out.bad'  to see the entire diff)
Ran: generic/467
Failures: generic/467
Failed 1 of 1 tests

#2 Updated by Xiubo Li 6 months ago

  • Status changed from New to In Progress

#3 Updated by Xiubo Li 6 months ago

I have reproduced it locally. The root cause is that the open happens just after the file was deleted, when only the unsafe reply had been received and the dentry was still held by the unlink request->r_dentry.

So when the xfstests-dev test later tries to open the file with open_by_handle_at(), the kernel lets the open succeed:

static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
{
        struct inode *inode = __lookup_inode(sb, ino);
        int err;

        if (IS_ERR(inode))
                return ERR_CAST(inode);
        /* We need LINK caps to reliably check i_nlink */
        err = ceph_do_getattr(inode, CEPH_CAP_LINK_SHARED, false);
        if (err) {
                iput(inode);
                return ERR_PTR(err);
        }
        /* -ESTALE if inode as been unlinked and no file is open */
        if ((inode->i_nlink == 0) && (atomic_read(&inode->i_count) == 1)) {
                iput(inode);
                return ERR_PTR(-ESTALE);
        }
        return d_obtain_alias(inode);
}

To fix it we need to make sure the dentry is not unhashed.
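
To illustrate why the check in __fh_to_dentry() lets the open through, here is a minimal user-space model of it. This is only a sketch, not kernel code: struct model_inode and the model_* functions are hypothetical names invented for illustration. It also assumes that the dentry pinned by the unlink request->r_dentry keeps one extra reference on the inode, so that i_count is 2 rather than 1 once the handle lookup has taken its own reference. The real kernel may hold other references that this model ignores.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two inode fields the check reads. */
struct model_inode {
        unsigned int nlink;     /* models inode->i_nlink */
        int count;              /* models atomic_read(&inode->i_count) */
};

/* Mirrors the -ESTALE condition quoted from __fh_to_dentry(). */
static int model_fh_check(const struct model_inode *inode)
{
        if (inode->nlink == 0 && inode->count == 1)
                return -ESTALE;
        return 0;
}

/*
 * Models the handle lookup: take a reference (as __lookup_inode() does),
 * run the check, then drop the reference again. Simplification: the model
 * never keeps the file open afterwards.
 */
static int model_open_by_handle(struct model_inode *inode)
{
        int err;

        inode->count++;
        err = model_fh_check(inode);
        inode->count--;
        return err;
}

/* Walks through the two states described in this report. */
static void model_demo(void)
{
        /* Only the unsafe reply has arrived: assumed extra ref via r_dentry. */
        struct model_inode pending = { .nlink = 0, .count = 1 };
        /* Safe reply has arrived: the request's dentry reference is gone. */
        struct model_inode released = { .nlink = 0, .count = 0 };

        assert(model_open_by_handle(&pending) == 0);         /* open wrongly succeeds */
        assert(model_open_by_handle(&released) == -ESTALE);  /* open is rejected */
}
```

In this model, the unlinked file is rejected only once nothing else holds a reference on the inode, which matches the window described above between the unsafe reply and the safe reply.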

#4 Updated by Xiubo Li 6 months ago

  • Status changed from In Progress to Fix Under Review

The patchwork link: https://patchwork.kernel.org/project/ceph-devel/patch/20220804080624.14768-1-xiubli@redhat.com/

commit 575c8218fa3e9399ca03c9e3840c6eab59749532 (HEAD -> lxb-testing37)
Author: Xiubo Li <xiubli@redhat.com>
Date:   Thu Aug 4 15:21:44 2022 +0800

    ceph: fail the open_by_handle_at() if the dentry is being unlinked

    When unlinking a file the kclient will send a unlink request to MDS
    by holding the dentry reference, and then the MDS will return 2 replies,
    which are unsafe reply and a deferred safe reply.

    After the unsafe reply received the kernel will return and succeed
    the unlink request to user space apps.

    Only when the safe reply received the dentry's reference will be
    released. Or the dentry will only be unhashed from dcache. But when
    the open_by_handle_at() begins to open the unlinked files it will
    succeed.

    URL: https://tracker.ceph.com/issues/56524
    Signed-off-by: Xiubo Li <xiubli@redhat.com>

#5 Updated by Xiubo Li 4 months ago

  • Status changed from Fix Under Review to Resolved
