Bug #49912
client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file"
Status: Closed
Description
We have an application that uses the FS as a lock: an empty file named .dw_gem2_cmn_sd_{INPROGRESS/COMPLETE}. The application uses mv to rename .dw_gem2_cmn_sd_INPROGRESS to .dw_gem2_cmn_sd_COMPLETE and back. Sometimes mv complains that .dw_gem2_cmn_sd_INPROGRESS and .dw_gem2_cmn_sd_COMPLETE are the same file. The ls output looks correct, but stat on both INPROGRESS and COMPLETE returns the same inode.
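For context, the lock-flip protocol described above can be sketched roughly as follows (the temp directory is illustrative; only the two lock-file names come from the report):

```shell
# Rough sketch of the application's lock protocol (demo directory is hypothetical)
demo=$(mktemp -d)
cd "$demo"
touch .dw_gem2_cmn_sd_INPROGRESS                         # a cycle starts
mv .dw_gem2_cmn_sd_INPROGRESS .dw_gem2_cmn_sd_COMPLETE   # work finished
mv .dw_gem2_cmn_sd_COMPLETE .dw_gem2_cmn_sd_INPROGRESS   # next cycle starts
ls -i .dw_gem2_cmn_sd_INPROGRESS                         # exactly one lock file should exist
```

At any moment exactly one of the two names should exist; the bug below is that the client cache ends up believing both names exist and point at the same inode.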
bash-4.2# ls -ialh
total 7.0K
5497578360817 drwxr-xr-x 2 dw_adm dw_ops 2.1K Mar 21 05:18 .
4398051190604 drwxrwxr-x 5 dw_adm dw_ops 4.2K Mar  3 10:36 ..
5497607496142 -rw-r--r-- 1 dw_adm dw_ops  159 Mar 21 03:47 .dw_gem2_cmn_actvty_cd_COMPLETE
5497607496143 -rw-r--r-- 1 dw_adm dw_ops  158 Mar 21 03:09 .dw_gem2_cmn_adj_rvnu_i_COMPLETE
5497661531523 -rw-r--r-- 1 dw_adm dw_ops  177 Mar 21 03:32 .dw_gem2_cmn_adj_rvnu_i_etrs_COMPLETE
5497607496144 -rw-r--r-- 1 dw_adm dw_ops  166 Mar 21 05:10 .dw_gem2_cmn_ck_i_COMPLETE
5497607496145 -rw-r--r-- 1 dw_adm dw_ops  156 Mar 21 03:20 .dw_gem2_cmn_lstg_i_COMPLETE
5497607496146 -rw-r--r-- 1 dw_adm dw_ops  154 Mar 21 03:09 .dw_gem2_cmn_rvnu_i_COMPLETE
5497661523504 -rw-r--r-- 1 dw_adm dw_ops  169 Mar 21 03:32 .dw_gem2_cmn_rvnu_i_etrs_COMPLETE
5497653129440 -rw-r--r-- 1 dw_adm dw_ops  177 Dec  7 23:53 .dw_gem2_cmn_sap_sd_INPROGRESS
5497607496147 -rw-r--r-- 1 dw_adm dw_ops  163 Mar 21 05:18 .dw_gem2_cmn_sd_INPROGRESS
5497578827365 -rw-r--r-- 1 dw_adm dw_ops  187 Mar  2 10:08 .gem2_cmn_aum_off_pltfrm_cm_COMPLETE
5497578360826 -rw-r--r-- 1 dw_adm dw_ops  201 Mar 14 10:07 .gem2_cmn_aum_off_pltfrm_cw_COMPLETE
5497607496148 -rw-r--r-- 1 dw_adm dw_ops  191 Mar 21 03:10 .gem2_cmn_off_pltfrm_i_COMPLETE

bash-4.2# stat .dw_gem2_cmn_sd_INPROGRESS
  File: ‘.dw_gem2_cmn_sd_INPROGRESS’
  Size: 163        Blocks: 1          IO Block: 4194304 regular file
Device: 92h/146d   Inode: 5497607496147  Links: 1
Access: (0644/-rw-r--r--)  Uid: (  506/  dw_adm)   Gid: (   21/  dw_ops)
Access: 2020-03-31 20:44:17.108239682 -0700
Modify: 2021-03-21 05:18:47.373138099 -0700
Change: 2021-03-21 05:18:47.373138099 -0700
 Birth: -

bash-4.2# stat .dw_gem2_cmn_sd_COMPLETE
  File: ‘.dw_gem2_cmn_sd_COMPLETE’
  Size: 163        Blocks: 1          IO Block: 4194304 regular file
Device: 92h/146d   Inode: 5497607496147  Links: 1
Access: (0644/-rw-r--r--)  Uid: (  506/  dw_adm)   Gid: (   21/  dw_ops)
Access: 2020-03-31 20:44:17.108239682 -0700
Modify: 2021-03-21 05:18:47.373138099 -0700
Change: 2021-03-21 05:18:47.373138099 -0700
 Birth: -
We debugged the issue; see below. The dentry for inode 0x50002f121d3 should be .dw_gem2_cmn_sd_INPROGRESS (see the dumped inode from the MDS). During ls, the client reads from dir->readdir_cache; during stat, however, it checks the dentry in dir->dentries, where .dw_gem2_cmn_sd_COMPLETE still exists (it should not). Since the inode satisfies CEPH_STAT_CAP_INODE_ALL (see the dump inode result below), the cached result is returned.
Dropping the dentry/inode cache with `echo 2 > /proc/sys/vm/drop_caches` and redoing stat .dw_gem2_cmn_sd_COMPLETE properly returns -2 (ENOENT, the right result).
The relevant check (in the client's _lookup path):

if (dir->caps_issued_mask(CEPH_CAP_FILE_SHARED, true)) {
  if (dn->cap_shared_gen == dir->shared_gen &&
      (!dn->inode || dn->inode->caps_issued_mask(mask, true)))
    goto hit_dn;
Debug log (debug_client 20/debug_ms 1)
2021-03-21T05:29:10.942-0700 7fccff7fe700 15 inode.put on 0x7fcd5ed3c000 0x50006cc83ae.head now 4
2021-03-21T05:29:10.942-0700 7fcd85ffb700  3 client.96500661 ll_lookup 0x50001348ff1.head .dw_gem2_cmn_sd_COMPLETE
2021-03-21T05:29:10.942-0700 7fcd85ffb700 20 client.96500661 may_lookup 0x50001348ff1.head(faked_ino=0 ref=3 ll_ref=498 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-09-24T15:45:53.816441-0700 mtime=2021-03-21T05:18:47.362286-0700 ctime=2021-03-21T05:18:47.362286-0700 caps=pAsLsXsFs(4=pAsLsXsFs) COMPLETE parents=0x4000047674c.head[".lock"] 0x7fcc7ada61f0); UserPerm(uid: 0, gid: 0)
2021-03-21T05:29:10.942-0700 7fcd85ffb700 10 client.96500661 _getattr mask As issued=1
2021-03-21T05:29:10.942-0700 7fcd85ffb700  3 client.96500661 may_lookup 0x7fcc7ada61f0 = 0
2021-03-21T05:29:10.942-0700 7fcd85ffb700 20 client.96500661 _lookup have dn .dw_gem2_cmn_sd_COMPLETE mds.-1 ttl 0.000000 seq 0
2021-03-21T05:29:10.942-0700 7fcd85ffb700 15 inode.get on 0x7fccc2606910 0x50002f121d3.head now 4
2021-03-21T05:29:10.942-0700 7fcd85ffb700 10 client.96500661 _lookup 0x50001348ff1.head(faked_ino=0 ref=3 ll_ref=498 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-09-24T15:45:53.816441-0700 mtime=2021-03-21T05:18:47.362286-0700 ctime=2021-03-21T05:18:47.362286-0700 caps=pAsLsXsFs(4=pAsLsXsFs) COMPLETE parents=0x4000047674c.head[".lock"] 0x7fcc7ada61f0) .dw_gem2_cmn_sd_COMPLETE = 0x50002f121d3.head(faked_ino=0 ref=4 ll_ref=15 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100644 size=163/0 nlink=1 btime=2020-03-31T20:44:17.111315-0700 mtime=2021-03-21T05:18:47.373138-0700 ctime=2021-03-21T05:18:47.373138-0700 caps=pAsLsXsFscr(4=pAsLsXsFscr) objectset[0x50002f121d3 ts 6858/0 objects 1 dirty_or_tx 0] parents=0x50001348ff1.head[".dw_gem2_cmn_sd_COMPLETE"], 0x50001348ff1.head[".dw_gem2_cmn_sd_INPROGRESS"] 0x7fccc2606910)
2021-03-21T05:29:10.942-0700 7fcd85ffb700 10 client.96500661 fill_stat on 0x50002f121d3 snap/devhead mode 0100644 mtime 2021-03-21T05:18:47.373138-0700 ctime 2021-03-21T05:18:47.373138-0700
2021-03-21T05:29:10.942-0700 7fcd85ffb700 20 client.96500661 _ll_get 0x7fccc2606910 0x50002f121d3 -> 16
2021-03-21T05:29:10.942-0700 7fcd85ffb700  3 client.96500661 ll_lookup 0x50001348ff1.head .dw_gem2_cmn_sd_COMPLETE -> 0 (50002f121d3)

Note that the inode's parents list contains both 0x50001348ff1.head[".dw_gem2_cmn_sd_COMPLETE"] and 0x50001348ff1.head[".dw_gem2_cmn_sd_INPROGRESS"].
Dump inode from the MDS:
root@mds:/var/run/ceph# ceph daemon /var/run/ceph/mds.asok dump inode 0x50002f121d3
{
  "path": "/sharefs_prod/etl/watch/dat/hd10/dw_gem2/.lock/.dw_gem2_cmn_sd_INPROGRESS",
  "ino": 5497607496147,
  "rdev": 0,
  "ctime": "2021-03-21T05:18:47.373138-0700",
  "btime": "2020-03-31T20:44:17.111315-0700",
  "mode": 33188,
  "uid": 506,
  "gid": 21,
  "nlink": 1,
  "dir_layout": { "dir_hash": 0, "unused1": 0, "unused2": 0, "unused3": 0 },
  "layout": { "stripe_unit": 4194304, "stripe_count": 1, "object_size": 4194304, "pool_id": 7, "pool_ns": "" },
  "old_pools": [ 2 ],
  "size": 163,
  "truncate_seq": 6858,
  "truncate_size": 0,
  "truncate_from": 0,
  "truncate_pending": 0,
  "mtime": "2021-03-21T05:18:47.373138-0700",
  "atime": "2020-03-31T20:44:17.108239-0700",
  "time_warp_seq": 1,
  "change_attr": 27434,
  "export_pin": -1,
  "export_ephemeral_random_pin": 0,
  "export_ephemeral_distributed_pin": false,
  "client_ranges": [],
  "dirstat": { "version": 0, "mtime": "0.000000", "num_files": 0, "num_subdirs": 0, "change_attr": 0 },
  "rstat": { "version": 0, "rbytes": 163, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "2021-03-21T05:18:47.373138-0700" },
  "accounted_rstat": { "version": 0, "rbytes": 163, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "2021-03-21T05:18:47.373138-0700" },
  "version": 193545,
  "file_data_version": 0,
  "xattr_version": 1,
  "backtrace_version": 193538,
  "stray_prior_path": "",
  "max_size_ever": 173,
  "quota": { "max_bytes": 0, "max_files": 0 },
  "last_scrub_stamp": "0.000000",
  "last_scrub_version": 0,
  "symlink": "",
  "xattrs": [],
  "dirfragtree": { "splits": [] },
  "old_inodes": [],
  "oldest_snap": 18446744073709551614,
  "damage_flags": 0,
  "is_auth": true,
  "auth_state": { "replicas": {} },
  "replica_state": { "authority": [ 4, -2 ], "replica_nonce": 0 },
  "auth_pins": 0,
  "is_frozen": false,
  "is_freezing": false,
  "pins": { "ptrwaiter": 0, "request": 0, "lock": 0, "caps": 1, "truncating": 0, "dirtyrstat": 0, "dirtyparent": 1, "dirty": 1, "waiter": 0, "authpin": 0 },
  "nref": 3,
  "versionlock": { "gather_set": [], "state": "lock", "is_leased": false, "num_rdlocks": 0, "num_wrlocks": 0, "num_xlocks": 0, "xlock_by": {} },
  "authlock": {},
  "linklock": {},
  "dirfragtreelock": {},
  "filelock": {},
  "xattrlock": {},
  "snaplock": {},
  "nestlock": { "gather_set": [], "state": "lock", "is_leased": false, "num_rdlocks": 0, "num_wrlocks": 0, "num_xlocks": 0, "xlock_by": {} },
  "flocklock": {},
  "policylock": {},
  "states": [ "auth", "dirty", "dirtyparent" ],
  "client_caps": [ { "client_id": 96500661, "pending": "pAsLsXsFscr", "issued": "pAsLsXsFscr", "wanted": "-", "last_sent": 151 } ],
  "loner": -1,
  "want_loner": -1,
  "mds_caps_wanted": []
}
Updated by Patrick Donnelly about 3 years ago
- Subject changed from dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" to client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file"
- Status changed from New to Triaged
- Assignee set to Xiubo Li
- Target version set to v17.0.0
- Component(FS) Client added
Updated by Jeff Layton about 3 years ago
I think that after the mv, the directory should no longer be considered ORDERED. We probably can consider it complete in that we know that the dentry should no longer exist at that point.
One question: When you see this occur on a client, was the previous mv done on that client or on a different one? If you can reproduce this all with a single client, then we probably need to fix up the rename codepaths. If the mv occurs on a different client completely, then we may need to look at lease/cap handling.
Updated by Xiubo Li about 3 years ago
@Xiaoxi
Is this reproducible for you? If so, how often? Locally I tried renaming two files in a loop for hours and didn't hit any issue. Are there any other settings you used? Or do you have more logs?
Updated by Xiubo Li about 3 years ago
I suspect there are two tasks doing the rename:
Task1 does the _lookup(_INPROGRESS) and then sleeps waiting for the client_lock when doing the _lookup(_COMPLETE).
Task2 then starts to run and finishes the rename before task1 is woken up; after that, task1's _lookup(_COMPLETE) succeeds, and task1 gets errors like this.
Not sure whether the test case above is similar to yours.
Running the following command in two terminals, I can randomly hit:
# while [ 1 ]; do date; mv file file_tmp; mv file_tmp file; done
mv: cannot move 'file' to 'file_tmp': No such file or directory
Tue Mar 23 17:25:51 CST 2021
Tue Mar 23 17:25:51 CST 2021
mv: overwrite 'file'?
Updated by Xiaoxi Chen about 3 years ago
Jeff Layton wrote:
I think that after the mv, the directory should no longer be considered ORDERED. We probably can consider it complete in that we know that the dentry should no longer exist at that point.
One question: When you see this occur on a client, was the previous mv done on that client or on a different one? If you can reproduce this all with a single client, then we probably need to fix up the rename codepaths. If the mv occurs on a different client completely, then we may need to look at lease/cap handling.
There are several clients mounting the share, but the mv should only run on one node (based on input from the application owner); at least, the node that complains "are the same file" is always the same node.
The application does thousands of mv operations each day and fails ~2-3 times per day, so it is consistently reproducible.
Updated by Xiaoxi Chen about 3 years ago
Xiubo Li wrote:
I suspect there are two tasks doing the rename:
Task1 does the _lookup(_INPROGRESS) and then sleeps waiting for the client_lock when doing the _lookup(_COMPLETE).
Task2 then starts to run and finishes the rename before task1 is woken up; after that, task1's _lookup(_COMPLETE) succeeds, and task1 gets errors like this.
Not sure whether the test case above is similar to yours.
In two terminals to run the follwoing command, randomly I can hit:
[...]
Similar to my guess, though I believe it is two tasks both trying to "mv INPROGRESS COMPLETE".
Task1 looks up INPROGRESS, then sleeps; task2 looks up INPROGRESS, looks up COMPLETE, and finishes the mv; task1 wakes up, looks up COMPLETE, and then complains "are the same file". However, the application team doesn't believe this; and given that only one client holds the caps, it is hard for us to find the other competitor.
Regardless, the dirty dentries in readdir_cache are an issue; they prevent the application from retrying and recovering by itself.
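The suspected interleaving can be replayed sequentially on any local filesystem (a hypothetical sketch: a single shell stands in for the two tasks, and plain file names replace the real lock files). Because rename preserves the inode, task1's recorded inode for INPROGRESS and its post-wakeup lookup of COMPLETE are necessarily equal, which is exactly the equality mv's same-file check trips over:

```shell
# Sequential replay of the suspected race (hypothetical; one shell plays both tasks)
dir=$(mktemp -d); cd "$dir"
touch INPROGRESS
# task1: looks up INPROGRESS, records its inode, then "sleeps" on client_lock
t1_ino=$(stat -c '%i' INPROGRESS)
# task2: runs the whole rename while task1 sleeps
mv INPROGRESS COMPLETE
# task1 wakes: its lookup of COMPLETE now succeeds with the very same inode
t1_dst_ino=$(stat -c '%i' COMPLETE)
if [ "$t1_ino" = "$t1_dst_ino" ]; then
  echo "mv would complain: 'INPROGRESS' and 'COMPLETE' are the same file"
fi
```

This only illustrates the timing argument; it does not reproduce the stale dir->dentries entry, which needs the CephFS client cache state described above.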
Updated by Xiubo Li about 3 years ago
Xiaoxi Chen wrote:
Jeff Layton wrote:
I think that after the mv, the directory should no longer be considered ORDERED. We probably can consider it complete in that we know that the dentry should no longer exist at that point.
One question: When you see this occur on a client, was the previous mv done on that client or on a different one? If you can reproduce this all with a single client, then we probably need to fix up the rename codepaths. If the mv occurs on a different client completely, then we may need to look at lease/cap handling.
There are several clients mounting the share, but the mv should only run on one node (based on input from the application owner); at least, the node that complains "are the same file" is always the same node.
The application does thousands of mv operations each day and fails ~2-3 times per day, so it is consistently reproducible.
I suspect that multiple clients and the lease/cap handling caused this, as Jeff mentioned above.
Updated by Xiubo Li about 3 years ago
Since the `ls` output was correct, the ORDERED flag must have been cleared as expected; otherwise `ls` would have shown '.dw_gem2_cmn_sd_COMPLETE' instead of '.dw_gem2_cmn_sd_INPROGRESS'.
I still couldn't figure out why the old dentry wasn't removed from 'dir->dentries' after `mv`. From the code, removing the old dentry and inserting the new one both happen within a single client_lock scope, with no unlock/lock in between. So far the only possible case I can see is that `request->old_dentry()` is NULL for some reason.
Updated by Xiubo Li about 3 years ago
I have figured out one case that could reproduce it in theory:
1. I checked the `mv` source code: before doing the `rename`, it stats both the '_INPROGRESS' and '_COMPLETE' files, one by one.
2. If two threads are doing the 'mv' and thread_A is under way, it is possible that thread_B's two stats both succeed and return the same info. That's because in the gap between thread_B's two stats, thread_A could have finished the 'mv' successfully. Locally I can reproduce this.
3. Then in thread_B, 'mv' compares the two stat results before doing the rename, and fails with "they are the same file".
4. In the Description, you mentioned that stating the two files shows the same info. Possibly the app's 'mv' was still in flight when you did this? If so, the next stat may fail again regardless of whether you dropped the caches.
5. I have checked the drop-caches-related code; it does nothing related to dir->dentries.
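The comparison in step 3 is a device-and-inode equality check; a hard link produces the same equality mv would see in the race (a minimal local demo; file names are illustrative):

```shell
# Demonstrating the dev:inode equality behind mv's "are the same file" error
d=$(mktemp -d); cd "$d"
touch a
ln a b                                   # two names, one inode
[ "$(stat -c '%d:%i' a)" = "$(stat -c '%d:%i' b)" ] && echo "same file"
```

In the race, the two names compare equal not because of a hard link but because the rename completed between the two stats, leaving both names resolving to the one surviving inode.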
Updated by Xiubo Li almost 3 years ago
- Status changed from Triaged to Fix Under Review
Updated by Rishabh Dave 10 months ago
The PR has been merged. Should this PR be backported?
Updated by Backport Bot 10 months ago
- Copied to Backport #62010: quincy: client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" added
Updated by Backport Bot 10 months ago
- Copied to Backport #62011: reef: client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" added
Updated by Backport Bot 10 months ago
- Copied to Backport #62012: pacific: client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" added