Bug #46273
mds: deleting a large number of files in a directory causes the file system to read only
Description
Log as follows:
2020-06-18 15:32:20.706904 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) commit error -2 v 1248554
2020-06-18 15:32:20.706914 7f1fe9f54700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x2000255af08.100* object, errno -2
2020-06-18 15:32:20.706921 7f1fe9f54700 -1 mds.2.334296 unhandled write error (2) No such file or directory, force readonly...
2020-06-18 15:32:20.706929 7f1fe9f54700 1 mds.2.cache force file system read-only
The cause is as follows. When files in a directory are deleted until the directory meets the merge condition, the merge operation merges multiple subfrags into a basefrag, creates a basedir for the basefrag, and then calls steal_dentry to move the dentries under each subdir to the basedir. When the basedir is committed, its dentries satisfy the condition dn->is_dirty() && dn->get_linkage()->is_null(), so the OMAP request type is CEPH_OSD_OP_OMAPRMKEYS. When the OSD processes this request, it requires the object corresponding to the basedir to exist (in fact, it does not exist), so it returns error -2 (ENOENT).
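The failure sequence above can be modelled in a few lines. This is a minimal sketch, not Ceph code: `FakePool`, `Dentry`, `merge` and `commit` are hypothetical names, the pool models only the one OSD behaviour the report relies on (removing omap keys from a missing object returns -ENOENT), and `merge` is a simplified stand-in for steal_dentry. The object names are illustrative only.

```python
import errno


class FakePool:
    """Hypothetical stand-in for a RADOS pool: object name -> omap dict.

    Models only what the report relies on: removing omap keys from an
    object that does not exist fails with -ENOENT instead of creating it.
    """

    def __init__(self):
        self.objects = {}

    def omap_set(self, oid, kv):
        # Writing keys creates the object if needed.
        self.objects.setdefault(oid, {}).update(kv)
        return 0

    def omap_rm_keys(self, oid, keys):
        if oid not in self.objects:
            return -errno.ENOENT
        for k in keys:
            self.objects[oid].pop(k, None)
        return 0


class Dentry:
    def __init__(self, name, dirty=True, null=True):
        self.name = name
        self.dirty = dirty
        self.null = null  # True models dn->get_linkage()->is_null()


def merge(subdirs):
    """Move all dentries from the subfrag dirs into a new basedir that
    exists only in memory (simplified stand-in for steal_dentry)."""
    basedir = []
    for sub in subdirs:
        basedir.extend(sub)
        sub.clear()
    return basedir


def commit(pool, oid, dentries):
    """Dirty null dentries become an OMAPRMKEYS-style key removal."""
    to_remove = [dn.name for dn in dentries if dn.dirty and dn.null]
    if to_remove:
        return pool.omap_rm_keys(oid, to_remove)
    return 0


# The subfrag objects exist on the OSD (object names are illustrative).
pool = FakePool()
pool.omap_set("subfrag-1000", {"file.a": b"inode"})
pool.omap_set("subfrag-1001", {"file.b": b"inode"})

# Both files have been unlinked: their dentries are dirty and null.
sub_1000 = [Dentry("file.a")]
sub_1001 = [Dentry("file.b")]

basedir = merge([sub_1000, sub_1001])
rc = commit(pool, "basefrag-100", basedir)
print(rc)  # -2 (-ENOENT): the basefrag object was never written
```

In this model, committing the same removals against one of the existing subfrag objects returns 0; the error comes only from issuing a pure key removal against an object that was never created.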
Updated by Patrick Donnelly almost 4 years ago
- Project changed from 16 to CephFS
- Subject changed from Deleting a large number of files in a directory causes the file system to read only to mds: deleting a large number of files in a directory causes the file system to read only
- Description updated
- Status changed from New to Fix Under Review
- Assignee set to wei qiaomiao
- Target version set to v16.0.0
- Source set to Community (dev)
- Backport set to octopus,nautilus
- Pull request ID set to 35848
- Component(FS) MDS added
Updated by wei qiaomiao almost 4 years ago
The original dirfrag is closed through close_dirfrag, and then the merged basedirfrag is committed.
More detailed log:
2020-06-18 15:32:20.135905 7f1feef5e700 1 mds.2.bal merging [dir 0x2000255af08.100110111* /test227/test227-3/#test-dir.0-0/mdtest_tree.0/mdtest_tree.1/mdtest_tree.3/ [2,head] auth{0=1} v=1247462 cv=1241630/1241630 state=1610612738|complete f(v1074951 m2020-06-18 15:32:15.683728) n(v11815 rc2020-06-18 15:32:15.683728) hs=0+2908,ss=0+0 dirty=2908 | child=1 frozen=0 subtree=0 importing=0 importbound=0 sticky=0 replicated=1 dirty=1 authpin=0 0x555aad539800]
2020-06-18 15:32:20.135932 7f1feef5e700 1 mds.2.bal all sibs under 100110110* 0x555a568e5500 should merge
2020-06-18 15:32:20.135935 7f1feef5e700 1 mds.2.bal all sibs under 10011010* 0x555a1ce35100 should merge
2020-06-18 15:32:20.135938 7f1feef5e700 1 mds.2.bal all sibs under 1001100* 0x555a4d9ffc00 should merge
2020-06-18 15:32:20.135945 7f1feef5e700 1 mds.2.bal all sibs under 100111* 0x555a1ce32000,0x555a1ce32700,0x555a9c00aa00,0x555a57ab6a00,0x555a568e5c00,0x555a568e6300,0x555a568e6a00,0x555a1ce32e00 should merge
2020-06-18 15:32:20.135951 7f1feef5e700 1 mds.2.bal all sibs under 10010* 0x555a9a267500 should merge
2020-06-18 15:32:20.135953 7f1feef5e700 1 mds.2.bal all sibs under 1000* 0x555a8c4f1100 should merge
2020-06-18 15:32:20.136005 7f1feef5e700 1 mds.2.bal not all sibs under 101* in cache (have 0x555a568e7100,0x555ab01c2000,0x555a30cbb800,0x555a30cc0000,0x555a30cc0700,0x555a30cb5500)
2020-06-18 15:32:20.388226 7f1ff1763700 1 mds.2.cache.dir(0x2000255af08.100*) merge 0x555a8c4f1100,0x555a9a267500,0x555a4d9ffc00,0x555a1ce35100,0x555a568e5500,0x555aad539800,0x555a1ce32000,0x555a1ce32700,0x555a9c00aa00,0x555a57ab6a00,0x555a568e5c00,0x555a568e6300,0x555a568e6a00,0x555a1ce32e00
2020-06-18 15:32:20.416342 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) _commit want 1248554 on [dir 0x2000255af08.100* /test227/test227-3/#test-dir.0-0/mdtest_tree.0/mdtest_tree.1/mdtest_tree.3/ [2,head] auth{0=2,1=2} v=1248554 cv=0/0 ap=2+0+0 state=1610629138|complete|frozendir|fragmenting f(v1074951)/f(v1074951 m2020-06-18 15:32:17.361970) n(v11815)/n(v11815 rc2020-06-18 15:32:17.361970) hs=0+27460,ss=0+0 dirty=27460 | child=1 frozen=1 replicated=1 dirty=1 authpin=1 0x555a30cb5c00]
2020-06-18 15:32:20.416354 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) marking committing
2020-06-18 15:32:20.416357 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) _omap_commit
2020-06-18 15:32:20.416364 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) rm file.mdtest.0.34113490 [dentry #0x1/test227/test227-3/#test-dir.0-0/mdtest_tree.0/mdtest_tree.1/mdtest_tree.3/file.mdtest.0.34113490 [2,head] auth NULL (dversion lock) v=1240786 inode=0 state=1610612802|bottomlru | request=0 lock=0 fragmenting=1 inodepin=0 replicated=0 dirty=1 authpin=0 clientlease=0 0x555a5d1de780]
2020-06-18 15:32:20.416377 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) rm file.mdtest.0.33947990 [dentry #0x1/test227/test227-3/#test-dir.0-0/mdtest_tree.0/mdtest_tree.1/mdtest_tree.3/file.mdtest.0.33947990 [2,head] auth NULL (dversion lock) v=1240166 inode=0 state=1610612802|bottomlru | fragmenting=1 replicated=0 dirty=1 0x555a5d1dfa40]
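The mds.2.bal lines above appear to walk upward from 100110111*, checking at each level that every dirfrag under the sibling frag should merge, and stopping when the dirfrags under 101* are not all in cache. The 14 dirfrags in the merge line should therefore tile basefrag 100* exactly. The sketch below checks that, treating a frag printed as `10010*` as the bit prefix `10010`. `covers_exactly` is a hypothetical helper, and the eight dirfrags "under 100111*" are not named in the log, so they are assumed here to be its eight 3-bit children.

```python
from fractions import Fraction


def covers_exactly(base, frags):
    """Return True if the frags (bit-string prefixes, e.g. '10010' for the
    frag printed as 10010*) tile the base frag with no gaps or overlaps."""
    if len(set(frags)) != len(frags):
        return False  # duplicate frag
    if not all(f.startswith(base) for f in frags):
        return False  # frag lies outside the base frag
    for a in frags:
        for b in frags:
            if a != b and b.startswith(a):
                return False  # one frag contains another
    # A frag with k more bits than base covers 1/2**k of base's hash range.
    total = sum(Fraction(1, 2 ** (len(f) - len(base))) for f in frags)
    return total == 1


# Frags named in the mds.2.bal lines of the log above.
named = ["1000", "10010", "1001100", "10011010", "100110110", "100110111"]
# Assumption: the 8 dirfrags "under 100111*" are its 3-bit children.
under_100111 = ["100111" + format(i, "03b") for i in range(8)]
frags = named + under_100111

print(len(frags), covers_exactly("100", frags))  # 14 True
```

Dropping any one of the 14 frags leaves a gap in the hash range, so the check fails. This is consistent with the balancer merging all of them into the single 100* dirfrag whose commit then fails.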
Updated by Patrick Donnelly almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46520: octopus: mds: deleting a large number of files in a directory causes the file system to read only added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46521: nautilus: mds: deleting a large number of files in a directory causes the file system to read only added
Updated by Patrick Donnelly over 3 years ago
- Related to Bug #47201: mds: CDir::_omap_commit(int): Assertion `committed_version == 0' failed. added
Updated by Nathan Cutler over 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".