Bug #40746

closed

client: removing dir reports "not empty" issue due to client side filled wrong dir offset

Added by Peng Xie almost 5 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: nautilus,mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Recently, while using nfs-ganesha with cephfs, we hit "directory not empty" errors when removing
an existing directory.

After investigating the interaction between nfs-ganesha and cephfs more deeply, we traced the root
cause to readdir missing some of the dir entries before the rmdir operation.

The problem is this: on the first "ls" of directory "a", the ceph client readdir_cache is empty, so
the client fetches the entries from the mds and fills each dir entry with the correct cookie for the
ganesha mdcache. However, on the second "ls" of directory "a" (issued for the rmdir operation), the
last entry is missing from the ganesha mdcache and is filled from the ceph client side readdir_cache,
where _readdir_cache_cb calculates the cookie incorrectly, which finally causes the misbehavior in
the ganesha mdcache.

For example, in our failing case, directory "a" contains 2 ordinary files: "f6b6" and "f78c".
Here are the problematic logs we got from the ceph client; the detailed explanation is in
the comments on the log lines:

ceph client log:

.........
2019-07-10 14:34:03.500338 7ff66a2d0700 10 client.13224161 readdir_r_cb 100001d7cac.head(faked_ino=0 ref=4 ll_ref=3 cap_refs={} open={} mode=40777 size=0/0 mtime=2019-07-10 12:24:29.172806 mds_cap_wantedFx caps=pAsLsXsFsx(0=pAsLsXsFsx/0/0) COMPLETE parents=0x4b76920 0x49d1900) offset ff8f9a040000003 at_end=0 hash_order=1
2019-07-10 14:34:03.500349 7ff66a2d0700 10 client.13224161 offset ff8f9a040000003 snapid head (complete && ordered) 1 issued pAsLsXsFsx
2019-07-10 14:34:03.500353 7ff66a2d0700 10 client.13224161 _readdir_cache_cb 0x2bb71e0 on 100001d7cac last_name f6b6 offset ff8f9a040000003 <<<---- the previous filled up dir entries "f6b6"
2019-07-10 14:34:03.500356 7ff66a2d0700 10 client.13224161 fill_stat on 100001d74ab snap/devhead mode 0100666 mtime 2019-07-10 12:52:13.901023 ctime 2019-07-10 12:29:48.863330
2019-07-10 14:34:03.500361 7ff66a2d0700 10 client.13224161 fill_dirent 'f78c' > 100001d74ab type 8 w/ next_off 1000000000000000 .
<<<---- this next_off is filled in as the offset of the 'f78c' dir entry and returned to nfs-ganesha
as the cookie of its mdcache dirent; it is wrongly set to dir_result_t::END (1000000000000000) by
the following code logic:

Client::_readdir_cache_cb(...) {
  ......
  uint64_t next_off = dn->offset + 1;
  ++pd;
  if (pd == dir->readdir_cache.end())
    next_off = dir_result_t::END;  // <<<---- the dir entry is "f78c", whose filled offset should not be END

  Inode *in = NULL;
  fill_dirent(&de, dn->name.c_str(), stx.stx_mode, stx.stx_ino, next_off);
  .....
}

Finally, in nfs-ganesha, the expected lookup cookie for 'f78c' was "ffe9bfe10000003", which does not
match "1000000000000000", so the listing is returned to the nfs client without 'f78c', leaving
directory "a" not empty.
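
The ordering problem described above can be reproduced in a small standalone model. This is only a sketch under stated assumptions, not the Ceph client code: `CachedDentry`, `Listing`, `fill_as_reported` and `fill_before_advance` are hypothetical names, and the dentry offsets used in the usage note are illustrative values chosen so that `offset + 1` matches the cookies quoted in this report (the real `dn->offset` values do not appear in the log). `fill_before_advance` shows one plausible correction, assuming an entry's cookie should always be its own offset plus one.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Value of dir_result_t::END as quoted in the report (hex 1000000000000000).
constexpr uint64_t END = 0x1000000000000000ULL;

// Hypothetical stand-in for a cached dentry; not the real Ceph type.
struct CachedDentry {
  std::string name;
  uint64_t offset;
};

// (name, cookie) pairs as they would be handed to the caller.
using Listing = std::vector<std::pair<std::string, uint64_t>>;

// Mirrors the ordering in the excerpt: advance the iterator, overwrite
// next_off with END if the cache is exhausted, and only then fill the entry.
// The last entry therefore receives END as its cookie.
Listing fill_as_reported(const std::vector<CachedDentry>& cache) {
  Listing out;
  for (auto pd = cache.begin(); pd != cache.end();) {
    const CachedDentry& dn = *pd;
    uint64_t next_off = dn.offset + 1;
    ++pd;
    if (pd == cache.end())
      next_off = END;
    out.emplace_back(dn.name, next_off);
  }
  return out;
}

// Assumed correction: fill the entry with its own offset + 1 before looking
// at whether the iterator has reached the end of the cache.
Listing fill_before_advance(const std::vector<CachedDentry>& cache) {
  Listing out;
  for (auto pd = cache.begin(); pd != cache.end(); ++pd)
    out.emplace_back(pd->name, pd->offset + 1);
  return out;
}
```

With a two-entry cache `{"f6b6", 0x0ff8f9a040000002}` and `{"f78c", 0x0ffe9bfe10000002}`, `fill_as_reported` gives 'f78c' the cookie `END`, while `fill_before_advance` gives it `0x0ffe9bfe10000003`, the value nfs-ganesha expected in the failing case.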


Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #41855: nautilus: client: removing dir reports "not empty" issue due to client side filled wrong dir offset, Resolved, Prashant D
Copied to CephFS - Backport #41856: mimic: client: removing dir reports "not empty" issue due to client side filled wrong dir offset, Resolved, Prashant D
Copied to CephFS - Backport #41857: luminous: client: removing dir reports "not empty" issue due to client side filled wrong dir offset, Resolved, Patrick Donnelly