Bug #49912: client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" - CephFS - Ceph

Actions

Copy link

Bug #49912

closed

client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file"

Added by Xiaoxi Chen about 3 years ago. Updated 8 months ago.

Status:

Resolved

Priority:

High

Assignee:

Xiubo Li

Category:

Correctness/Safety

Target version:

Ceph - v17.0.0

% Done:

Source:

Community (dev)

Tags:

backport_processed

Backport:

pacific, quincy, reef

Regression:

Severity:

2 - major

Reviewed:

Affected Versions:

Ceph - v15.2.2

ceph-qa-suite:

Component(FS):

Client, ceph-fuse

Labels (FS):

Pull request ID:

40787

Crash signature (v1):

Crash signature (v2):

Description

we have an applications that use FS as a lock --- an empty file named .dw_gem2_cmn_sd_{INPROGRESS/COMPLETE} , application use mv that move from .dw_gem2_cmn_sd_INPROGRESS to .dw_gem2_cmn_sd_COMPLETE and reverse. Sometimes the mv complains .dw_gem2_cmn_sd_INPROGRESS and .dw_gem2_cmn_sd_COMPLETE are the same file. ls is good however stat both INPROGRESS and COMPLETE returns same inode.

bash-4.2# ls -ialh
total 7.0K
5497578360817 drwxr-xr-x 2 dw_adm dw_ops 2.1K Mar 21 05:18 .
4398051190604 drwxrwxr-x 5 dw_adm dw_ops 4.2K Mar  3 10:36 ..
5497607496142 -rw-r--r-- 1 dw_adm dw_ops  159 Mar 21 03:47 .dw_gem2_cmn_actvty_cd_COMPLETE
5497607496143 -rw-r--r-- 1 dw_adm dw_ops  158 Mar 21 03:09 .dw_gem2_cmn_adj_rvnu_i_COMPLETE
5497661531523 -rw-r--r-- 1 dw_adm dw_ops  177 Mar 21 03:32 .dw_gem2_cmn_adj_rvnu_i_etrs_COMPLETE
5497607496144 -rw-r--r-- 1 dw_adm dw_ops  166 Mar 21 05:10 .dw_gem2_cmn_ck_i_COMPLETE
5497607496145 -rw-r--r-- 1 dw_adm dw_ops  156 Mar 21 03:20 .dw_gem2_cmn_lstg_i_COMPLETE
5497607496146 -rw-r--r-- 1 dw_adm dw_ops  154 Mar 21 03:09 .dw_gem2_cmn_rvnu_i_COMPLETE
5497661523504 -rw-r--r-- 1 dw_adm dw_ops  169 Mar 21 03:32 .dw_gem2_cmn_rvnu_i_etrs_COMPLETE
5497653129440 -rw-r--r-- 1 dw_adm dw_ops  177 Dec  7 23:53 .dw_gem2_cmn_sap_sd_INPROGRESS
5497607496147 -rw-r--r-- 1 dw_adm dw_ops  163 Mar 21 05:18 .dw_gem2_cmn_sd_INPROGRESS
5497578827365 -rw-r--r-- 1 dw_adm dw_ops  187 Mar  2 10:08 .gem2_cmn_aum_off_pltfrm_cm_COMPLETE
5497578360826 -rw-r--r-- 1 dw_adm dw_ops  201 Mar 14 10:07 .gem2_cmn_aum_off_pltfrm_cw_COMPLETE
5497607496148 -rw-r--r-- 1 dw_adm dw_ops  191 Mar 21 03:10 .gem2_cmn_off_pltfrm_i_COMPLETE
bash-4.2# stat .dw_gem2_cmn_sd_INPROGRESS
  File: ‘.dw_gem2_cmn_sd_INPROGRESS’
  Size: 163           Blocks: 1          IO Block: 4194304 regular file
Device: 92h/146d    Inode: 5497607496147  Links: 1
Access: (0644/-rw-r--r--)  Uid: (  506/  dw_adm)   Gid: (   21/  dw_ops)
Access: 2020-03-31 20:44:17.108239682 -0700
Modify: 2021-03-21 05:18:47.373138099 -0700
Change: 2021-03-21 05:18:47.373138099 -0700
 Birth: -
bash-4.2# stat .dw_gem2_cmn_sd_COMPLETE
  File: ‘.dw_gem2_cmn_sd_COMPLETE’
  Size: 163           Blocks: 1          IO Block: 4194304 regular file
Device: 92h/146d    Inode: 5497607496147  Links: 1
Access: (0644/-rw-r--r--)  Uid: (  506/  dw_adm)   Gid: (   21/  dw_ops)
Access: 2020-03-31 20:44:17.108239682 -0700
Modify: 2021-03-21 05:18:47.373138099 -0700
Change: 2021-03-21 05:18:47.373138099 -0700
 Birth: -

We debugged into the issue, see below, the dn of the inode 0x50002f121d3 should be .dw_gem2_cmn_sd_INPROGRESS(see the dump inode from mds), when ls, it reads from dir->readdir_cache, however when doing the stat, it checks the dn from dir->dentries, the .dw_gem2_cmn_sd_COMPLETE still exists in the dn (which it should not), and as the inode satisfy CEPH_STAT_CAP_INODE_ALL(the the dump inode result below), it returns the cache result.

dropping the dentry/inode cache by echo 2> /proc/sys/vm/drop_caches and redo the stat .dw_gem2_cmn_sd_COMPLETE get properly -2(right result)

      if (dir->caps_issued_mask(CEPH_CAP_FILE_SHARED, true)) {
    if (dn->cap_shared_gen == dir->shared_gen &&
        (!dn->inode || dn->inode->caps_issued_mask(mask, true)))
          goto hit_dn;

Debug log (debug_client 20/debug_ms 1)

2021-03-21T05:29:10.942-0700 7fccff7fe700 15 inode.put on 0x7fcd5ed3c000 0x50006cc83ae.head now 4
2021-03-21T05:29:10.942-0700 7fcd85ffb700  3 client.96500661 ll_lookup 0x50001348ff1.head .dw_gem2_cmn_sd_COMPLETE
2021-03-21T05:29:10.942-0700 7fcd85ffb700 20 client.96500661 may_lookup 0x50001348ff1.head(faked_ino=0 ref=3 ll_ref=498 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-09-24T15:45:53.816441-0700 mtime=2021-03-21T05:18:47.362286-0700 ctime=2021-03-21T05:18:47.362286-0700 caps=pAsLsXsFs(4=pAsLsXsFs) COMPLETE parents=0x4000047674c.head[".lock"] 0x7fcc7ada61f0); UserPerm(uid: 0, gid: 0)
2021-03-21T05:29:10.942-0700 7fcd85ffb700 10 client.96500661 _getattr mask As issued=1
2021-03-21T05:29:10.942-0700 7fcd85ffb700  3 client.96500661 may_lookup 0x7fcc7ada61f0 = 0
2021-03-21T05:29:10.942-0700 7fcd85ffb700 20 client.96500661 _lookup have dn .dw_gem2_cmn_sd_COMPLETE mds.-1 ttl 0.000000 seq 0
2021-03-21T05:29:10.942-0700 7fcd85ffb700 15 inode.get on 0x7fccc2606910 0x50002f121d3.head now 4
2021-03-21T05:29:10.942-0700 7fcd85ffb700 10 client.96500661 _lookup 0x50001348ff1.head(faked_ino=0 ref=3 ll_ref=498 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-09-24T15:45:53.816441-0700 mtime=2021-03-21T05:18:47.362286-0700 ctime=2021-03-21T05:18:47.362286-0700 caps=pAsLsXsFs(4=pAsLsXsFs) COMPLETE parents=0x4000047674c.head[".lock"] 0x7fcc7ada61f0) .dw_gem2_cmn_sd_COMPLETE = 0x50002f121d3.head(faked_ino=0 ref=4 ll_ref=15 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100644 size=163/0 nlink=1 btime=2020-03-31T20:44:17.111315-0700 mtime=2021-03-21T05:18:47.373138-0700 ctime=2021-03-21T05:18:47.373138-0700 caps=pAsLsXsFscr(4=pAsLsXsFscr) objectset[0x50002f121d3 ts 6858/0 objects 1 dirty_or_tx 0] parents=0x50001348ff1.head[".dw_gem2_cmn_sd_COMPLETE"], 0x50001348ff1.head[".dw_gem2_cmn_sd_INPROGRESS"] 0x7fccc2606910)
2021-03-21T05:29:10.942-0700 7fcd85ffb700 10 client.96500661 fill_stat on 0x50002f121d3 snap/devhead mode 0100644 mtime 2021-03-21T05:18:47.373138-0700 ctime 2021-03-21T05:18:47.373138-0700
2021-03-21T05:29:10.942-0700 7fcd85ffb700 20 client.96500661 _ll_get 0x7fccc2606910 0x50002f121d3 -> 16
2021-03-21T05:29:10.942-0700 7fcd85ffb700  3 client.96500661 ll_lookup 0x50001348ff1.head .dw_gem2_cmn_sd_COMPLETE -> 0 (50002f121d3)

dump inode from mds.

root@mds:/var/run/ceph# ceph daemon /var/run/ceph/mds.asok dump inode 0x50002f121d3
{
    "path": "/sharefs_prod/etl/watch/dat/hd10/dw_gem2/.lock/.dw_gem2_cmn_sd_INPROGRESS",
    "ino": 5497607496147,
    "rdev": 0,
    "ctime": "2021-03-21T05:18:47.373138-0700",
    "btime": "2020-03-31T20:44:17.111315-0700",
    "mode": 33188,
    "uid": 506,
    "gid": 21,
    "nlink": 1,
    "dir_layout": {
        "dir_hash": 0,
        "unused1": 0,
        "unused2": 0,
        "unused3": 0
    },
    "layout": {
        "stripe_unit": 4194304,
        "stripe_count": 1,
        "object_size": 4194304,
        "pool_id": 7,
        "pool_ns": "" 
    },
    "old_pools": [
        2
    ],
    "size": 163,
    "truncate_seq": 6858,
    "truncate_size": 0,
    "truncate_from": 0,
    "truncate_pending": 0,
    "mtime": "2021-03-21T05:18:47.373138-0700",
    "atime": "2020-03-31T20:44:17.108239-0700",
    "time_warp_seq": 1,
    "change_attr": 27434,
    "export_pin": -1,
    "export_ephemeral_random_pin": 0,
    "export_ephemeral_distributed_pin": false,
    "client_ranges": [],
    "dirstat": {
        "version": 0,
        "mtime": "0.000000",
        "num_files": 0,
        "num_subdirs": 0,
        "change_attr": 0
    },
    "rstat": {
        "version": 0,
        "rbytes": 163,
        "rfiles": 1,
        "rsubdirs": 0,
        "rsnaps": 0,
        "rctime": "2021-03-21T05:18:47.373138-0700" 
    },
    "accounted_rstat": {
        "version": 0,
        "rbytes": 163,
        "rfiles": 1,
        "rsubdirs": 0,
        "rsnaps": 0,
        "rctime": "2021-03-21T05:18:47.373138-0700" 
    },
    "version": 193545,
    "file_data_version": 0,
    "xattr_version": 1,
    "backtrace_version": 193538,
    "stray_prior_path": "",
    "max_size_ever": 173,
    "quota": {
        "max_bytes": 0,
        "max_files": 0
    },
    "last_scrub_stamp": "0.000000",
    "last_scrub_version": 0,
    "symlink": "",
    "xattrs": [],
    "dirfragtree": {
        "splits": []
    },
    "old_inodes": [],
    "oldest_snap": 18446744073709551614,
    "damage_flags": 0,
    "is_auth": true,
    "auth_state": {
        "replicas": {}
    },
    "replica_state": {
        "authority": [
            4,
            -2
        ],
        "replica_nonce": 0
    },
    "auth_pins": 0,
    "is_frozen": false,
    "is_freezing": false,
    "pins": {
        "ptrwaiter": 0,
        "request": 0,
        "lock": 0,
        "caps": 1,
        "truncating": 0,
        "dirtyrstat": 0,
        "dirtyparent": 1,
        "dirty": 1,
        "waiter": 0,
        "authpin": 0
    },
    "nref": 3,
    "versionlock": {
        "gather_set": [],
        "state": "lock",
        "is_leased": false,
        "num_rdlocks": 0,
        "num_wrlocks": 0,
        "num_xlocks": 0,
        "xlock_by": {}
    },
    "authlock": {},
    "linklock": {},
    "dirfragtreelock": {},
    "filelock": {},
    "xattrlock": {},
    "snaplock": {},
    "nestlock": {
        "gather_set": [],
        "state": "lock",
        "is_leased": false,
        "num_rdlocks": 0,
        "num_wrlocks": 0,
        "num_xlocks": 0,
        "xlock_by": {}
    },
    "flocklock": {},
    "policylock": {},
    "states": [
        "auth",
        "dirty",
        "dirtyparent" 
    ],
    "client_caps": [
        {
            "client_id": 96500661,
            "pending": "pAsLsXsFscr",
            "issued": "pAsLsXsFscr",
            "wanted": "-",
            "last_sent": 151
        }
    ],
    "loner": -1,
    "want_loner": -1,
    "mds_caps_wanted": []
}

Related issues 3 (0 open — 3 closed)

Actions

Copy link

Updated by Patrick Donnelly about 3 years ago

Subject changed from dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" to client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file"
Status changed from New to Triaged
Assignee set to Xiubo Li
Target version set to v17.0.0
Component(FS) Client added

Actions

Copy link

Updated by Jeff Layton about 3 years ago

I think that after the mv, the directory should no longer be considered ORDERED. We probably can consider it complete in that we know that the dentry should no longer exist at that point.

One question: When you see this occur on a client, was the previous mv done on that client or on a different one? If you can reproduce this all with a single client, then we probably need to fix up the rename codepaths. If the mv occurs on a different client completely, then we may need to look at lease/cap handling.

Actions

Copy link

Updated by Xiubo Li about 3 years ago

@Xiaoxi

Is this reproduceable for you ? If so, how often ? Locally I was trying in a loop by renaming two file for hours, and didn't hit any issue. Is there any other setting you did ? Or do you have more logs ?

Actions

Copy link

Updated by Xiubo Li about 3 years ago

I am doubting that if there has two tasks are doing the rename:

For task1, if it just do _lookup(_INPROGRESS) and then sleep to wait for the client_lock when doing the _lookup(_COMPLETE).

If then task2 start to run and will finish the rename before the task1 is wokeup, and after that the task1's _lookup(_COMPLETE) will succeed, then the task1 will get the errors like this.

Not sure whether the test case above is similiar with yours.

In two terminals to run the follwoing command, randomly I can hit:

# while [ 1 ]; do date; mv file file_tmp; mv file_tmp file; done

mv: cannot move 'file' to 'file_tmp': No such file or directory
Tue Mar 23 17:25:51 CST 2021
Tue Mar 23 17:25:51 CST 2021
mv: overwrite 'file'?

Actions

Copy link

Updated by Xiaoxi Chen about 3 years ago

Jeff Layton wrote:

I think that after the mv, the directory should no longer be considered ORDERED. We probably can consider it complete in that we know that the dentry should no longer exist at that point.

One question: When you see this occur on a client, was the previous mv done on that client or on a different one? If you can reproduce this all with a single client, then we probably need to fix up the rename codepaths. If the mv occurs on a different client completely, then we may need to look at lease/cap handling.

There are several clients that mounting the shares however the MV (should only run on one node, based on the input of application owner), at least, the node that complains "are the same file" always the same node.

The application does thousands of MV each day, it fails ~ 2-3 times per day, consistently reproduce-able.

Actions

Copy link

Updated by Xiaoxi Chen about 3 years ago

Xiubo Li wrote:

I am doubting that if there has two tasks are doing the rename:

For task1, if it just do _lookup(_INPROGRESS) and then sleep to wait for the client_lock when doing the _lookup(_COMPLETE).

If then task2 start to run and will finish the rename before the task1 is wokeup, and after that the task1's _lookup(_COMPLETE) will succeed, then the task1 will get the errors like this.

Not sure whether the test case above is similiar with yours.

In two terminals to run the follwoing command, randomly I can hit:

[...]

Similar with my guessing though I believe it is two task both trying to "mv INPROGRESS COMPLETE".
Task 1 lookup INPROGRESS ,sleep, task2 lookup INPROGRESS, lookup COMPLETE, finish mv. task 1 woke up lookup COMPLETE then complains "are the same file". However application team doesnt believe and based on the fact that only one client has cap, so it is hard for us to find out the other competitor.

Regardless, the dirty dentries in readdir_cache is an issue, that makes the application cannot retry and recovery it self.

Actions

Copy link

Updated by Xiubo Li about 3 years ago

Xiaoxi Chen wrote:

Jeff Layton wrote:

I think that after the mv, the directory should no longer be considered ORDERED. We probably can consider it complete in that we know that the dentry should no longer exist at that point.

One question: When you see this occur on a client, was the previous mv done on that client or on a different one? If you can reproduce this all with a single client, then we probably need to fix up the rename codepaths. If the mv occurs on a different client completely, then we may need to look at lease/cap handling.

There are several clients that mounting the shares however the MV (should only run on one node, based on the input of application owner), at least, the node that complains "are the same file" always the same node.

The application does thousands of MV each day, it fails ~ 2-3 times per day, consistently reproduce-able.

I doubt that it is the multiple clients and lease/cap issue caused this as Jeff mentioned above.

Actions

Copy link

Updated by Xiubo Li about 3 years ago

Since the `ls` command output was correct, the ORDERED flag should have been cleared as expected, or it should show '.dw_gem2_cmn_sd_COMPLETE' instead of '.dw_gem2_cmn_sd_INPROGRESS'.

I still couldn't figure out why the old dentry wasn't removed from 'dir->dentries' after `mv`, from the code the old dentry removing and new dentry inserting are both under a signal client_lock scope, no unlock/lock happen during that. Till now what I can figure out is that the only possible case is the `request->old_dentry()` is NULL for some reason.

Actions

Copy link

Updated by Xiubo Li about 3 years ago

I have figured one case could reprodce it in theory:

1, I have check the `mv` source code, before doing the `rename`, it will stat both the '_INPROGRESS' and '_COMPLETE' files one by one.

2, If there has two threads are doing the 'mv', and when the thread_A is on the way, and in the thread_B it is possible that the two stats will both be successful and return the same info. That's because during the two stats gap in thread_B, thread_A could have finished the 'mv' successfully. Locally I can reproduce this.

3, Then in thread_B, the 'mv' itself will try to compare the two stats' info before doing the rename, which will fail with they are the same file.

4, In the Description, you mentioned when stating the two files, they are showing the same info. Possiblly the 'mv' still on the way in the app when you were doing these ? So next time when you do the stat again it may fail, no matter you did the caches dropping or not.

5, I have checked the drop caches related code, it won't do anything that's related to the dir->dentries.

Actions

Copy link

#10

Updated by Xiubo Li about 3 years ago

Pull request ID set to 40787

Actions

Copy link

#11

Updated by Xiubo Li almost 3 years ago

Status changed from Triaged to Fix Under Review

Actions

Copy link

#12

Updated by Rishabh Dave 10 months ago

The PR has been merged. Should this PR be backported?

Actions

Copy link

#13

Updated by Xiubo Li 10 months ago

Backport set to pacific, quincy, reef

Rishabh Dave wrote:

The PR has been merged. Should this PR be backported?

Yeah, it should be.

Actions

Copy link

#14

Updated by Xiubo Li 10 months ago

Status changed from Fix Under Review to Pending Backport

Actions

Copy link

#15

Updated by Backport Bot 10 months ago

Copied to Backport #62010: quincy: client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" added

Actions

Copy link

#16

Updated by Backport Bot 10 months ago

Copied to Backport #62011: reef: client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" added

Actions

Copy link

#17

Updated by Backport Bot 10 months ago

Copied to Backport #62012: pacific: client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file" added

Actions

Copy link

#18

Updated by Backport Bot 10 months ago

Tags set to backport_processed

Actions

Copy link

#19

Updated by Xiubo Li 8 months ago

Status changed from Pending Backport to Resolved

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » CephFS

Custom queries

Bug #49912

client: dir->dentries inconsistent, both newname and oldname points to same inode, mv complains "are the same file"

Updated by Patrick Donnelly about 3 years ago

Updated by Jeff Layton about 3 years ago

Updated by Xiubo Li about 3 years ago

Updated by Xiubo Li about 3 years ago

Updated by Xiaoxi Chen about 3 years ago

Updated by Xiaoxi Chen about 3 years ago

Updated by Xiubo Li about 3 years ago

Updated by Xiubo Li about 3 years ago

Updated by Xiubo Li about 3 years ago

Updated by Xiubo Li about 3 years ago

Updated by Xiubo Li almost 3 years ago

Updated by Rishabh Dave 10 months ago

Updated by Xiubo Li 10 months ago

Updated by Xiubo Li 10 months ago

Updated by Backport Bot 10 months ago

Updated by Backport Bot 10 months ago

Updated by Backport Bot 10 months ago

Updated by Backport Bot 10 months ago

Updated by Xiubo Li 8 months ago