Bug #23097: Stale directories and files in CentOS (release <= 7.3 or kernel version < 3.19) cannot get refreshed in time. - Linux kernel client - Ceph

Added by yupeng chen about 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: vfs
Target version:
% Done:
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: kcephfs
Crash signature (v1):
Crash signature (v2):

Description

In our CephFS environment (CentOS 7.2, kernel 3.10.0-514.10.2.el7.x86_64), we have a one-producer, multiple-consumer scenario:

1. Assume the CephFS mountpoint is /mnt/cephfs.
2. We ran the equivalent of the following commands.
On the producer client:
while true; do rm -rf /mnt/cephfs/pro.bak; mv /mnt/cephfs/pro /mnt/cephfs/pro.bak; mkdir /mnt/cephfs/pro && touch /mnt/cephfs/pro/{1..2}; sleep 60; done
On each consumer client:
watch -n 10 "stat /mnt/cephfs/pro/* | tee -a ~/log.txt"
3. We expected the consumers to see the refreshed data immediately after the producer finished a production cycle. Instead, the 'stat' calls issued by the consumers kept returning the stale directories and files. The stale entries disappeared only when the lease expired.
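Because the producer replaces /mnt/cephfs/pro with a new directory of the same name, a consumer can spot staleness by watching the inode number that stat() reports for the path. The helper below is a minimal sketch of that check; the function name inode_changed is hypothetical and is not part of the original reproduction.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: compare the inode currently behind `path`
 * with the one seen on the previous call.
 * Returns  1 if the inode differs from *last_ino (and stores the new one),
 *          0 if it is unchanged,
 *         -1 if stat() fails (for example, the path does not exist).
 */
int inode_changed(const char *path, ino_t *last_ino)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }
    if (st.st_ino == *last_ino)
        return 0;
    *last_ino = st.st_ino;
    return 1;
}
```

Calling it in a loop on a consumer, with `last_ino` initialised to 0, should report a change shortly after each producer cycle. On an affected kernel the change would only be reported after the lease expires.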

Updated by yupeng chen about 6 years ago

After investigating, we found that when .d_revalidate() runs on the dentry of the stale directory, cephfs issues a lookup op to the MDS. When the MDS reply arrives, ceph_fill_trace() is invoked to incorporate the fresh data into the local cache. The newly created directory with the same name has a different inode, so the stale dentry is passed to d_invalidate(), which is expected to unhash it from the global dentry hashtable.
However, the dentry is referenced by at least the VFS lookup and the ceph lookup request, so its reference count is at least 2. Because the dentry is a directory with a reference count greater than 1, d_invalidate() returns -EBUSY without calling __d_drop(), and the stale dentry stays in the hashtable.

d_invalidate():

    if (dentry->d_lockref.count > 1 && dentry->d_inode) { // <=========
        if (S_ISDIR(dentry->d_inode->i_mode) || d_mountpoint(dentry)) {
            spin_unlock(&dentry->d_lock);
            return -EBUSY;
        }
    }
    __d_drop(dentry);

ceph_d_revalidate():

            err = ceph_mdsc_do_request(mdsc, NULL, req);
            if (err == 0 || err == -ENOENT) {
                if (dentry == req->r_dentry) {
                    valid = !d_unhashed(dentry); // <========
                } else {
                    d_invalidate(req->r_dentry);
                    err = -EAGAIN;
                }
            }
    ... ...
    if (valid) {
        ceph_dentry_lru_touch(dentry);
    } else {
        ceph_dir_clear_complete(dir);
        d_drop(dentry);
    }
    return valid;

When the lookup request finishes, the condition 'dentry == req->r_dentry' still holds, so 'valid' is set to '!d_unhashed(dentry)', which is true because the dentry is still hashed. ceph_d_revalidate() therefore reports the stale dentry as valid, and the stale directory shows up on the console.
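The interaction described above can be reproduced in a small user-space model. This is only a sketch: struct toy_dentry and the toy_* functions are hypothetical stand-ins for the kernel structures, locking and the d_mountpoint() check are left out, and only the reference-count rule quoted from d_invalidate() is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dentry: only the fields this bug depends on. */
struct toy_dentry {
    int  refcount;  /* models dentry->d_lockref.count */
    bool is_dir;    /* models S_ISDIR(dentry->d_inode->i_mode) */
    bool positive;  /* models dentry->d_inode != NULL */
    bool hashed;    /* true while the dentry is in the hashtable */
};

/* Pre-3.19 rule as quoted above: a referenced directory is left hashed. */
int toy_d_invalidate_old(struct toy_dentry *d)
{
    assert(d != NULL);
    if (d->refcount > 1 && d->positive && d->is_dir)
        return -EBUSY;      /* __d_drop() is never reached */
    d->hashed = false;      /* models __d_drop(dentry) */
    return 0;
}

/* Mirrors `valid = !d_unhashed(dentry)` in ceph_d_revalidate(). */
bool toy_revalidate_result(const struct toy_dentry *d)
{
    assert(d != NULL);
    return d->hashed;
}
```

With a directory dentry that has `refcount = 2`, toy_d_invalidate_old() returns -EBUSY and toy_revalidate_result() still reports the dentry as valid, which matches the stale result the consumers observe.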

The problem exists only in kernel versions < 3.19. Starting with kernel 3.19 (inclusive), the logic of d_invalidate() changed: the incoming dentry is unhashed from the global dentry hashtable regardless of its reference count.

My fix is to call d_drop() right after d_invalidate() in ceph_fill_trace(), where the problem occurs, to ensure that the stale dentry really is unhashed.
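The effect of this fix can be sketched with a user-space model. Everything here is illustrative: struct toy_dentry and the toy_* functions are hypothetical names, not kernel APIs, and the actual patch to ceph_fill_trace() is not shown in this report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a positive dentry; not the kernel structure. */
struct toy_dentry {
    int  refcount;  /* models dentry->d_lockref.count */
    bool is_dir;    /* true for a directory dentry */
    bool hashed;    /* true while the dentry is in the hashtable */
};

/* Pre-3.19 rule: a directory with extra references stays hashed. */
int toy_d_invalidate_old(struct toy_dentry *d)
{
    assert(d != NULL);
    if (d->refcount > 1 && d->is_dir)
        return -EBUSY;
    d->hashed = false;
    return 0;
}

/* Models d_drop(): unhash regardless of the reference count. */
void toy_d_drop(struct toy_dentry *d)
{
    assert(d != NULL);
    d->hashed = false;
}

/* Sketch of the proposed fix: invalidate, then drop unconditionally. */
int toy_invalidate_then_drop(struct toy_dentry *d)
{
    int err = toy_d_invalidate_old(d);

    toy_d_drop(d);  /* guarantees the stale dentry is unhashed */
    return err;
}
```

In this model a directory dentry with `refcount = 2` still gets -EBUSY from the old d_invalidate() rule, but it ends up unhashed, so a later `!d_unhashed(dentry)` check would no longer treat it as valid.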

I am not sure where to submit the bugfix, since the problem affects old Linux kernels only. Can I just push it to the master branch?
