Bug #46273

mds: deleting a large number of files in a directory causes the file system to read only

Added by wei qiaomiao 13 days ago. Updated 11 days ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, MDSMonitor
Labels (FS):
Pull request ID:
Crash signature:

Description

The MDS log shows:

2020-06-18 15:32:20.706904 7f1fe9f54700  1 mds.2.cache.dir(0x2000255af08.100*) commit error -2 v 1248554
2020-06-18 15:32:20.706914 7f1fe9f54700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x2000255af08.100* object, errno -2
2020-06-18 15:32:20.706921 7f1fe9f54700 -1 mds.2.334296 unhandled write error (2) No such file or directory, force readonly...
2020-06-18 15:32:20.706929 7f1fe9f54700  1 mds.2.cache force file system read-only

The cause is as follows. When files are deleted from a directory until its fragments meet the merge condition, the merge operation combines multiple subfrags into a basefrag, creates basedir for the basefrag, and then calls steal_dentry to move the dentries under each subdir to basedir. When basedir is committed, its dentries satisfy the condition dn->is_dirty() && dn->get_linkage()->is_null(), so the OMAP request type is CEPH_OSD_OP_OMAPRMKEYS. When the OSD processes this request, it expects the object corresponding to basedir to exist (in fact, it does not), and returns error -2 (ENOENT, "No such file or directory").
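The failure path described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the MDS or OSD code: ToyObjectStore, Dentry, commit_dir and the object name "basedir_obj" are hypothetical names, and the store simply assumes that removing OMAP keys from a missing object fails with -ENOENT while setting keys creates the object.

```python
import errno


class ToyObjectStore:
    """In-memory stand-in for the metadata pool (hypothetical, not the RADOS API)."""

    def __init__(self):
        self.objects = {}  # object name -> dict of omap key/value pairs

    def omap_set_keys(self, name, items):
        # Assumption: setting keys creates the object if it is missing.
        self.objects.setdefault(name, {}).update(items)
        return 0

    def omap_rm_keys(self, name, keys):
        # Assumption taken from the report: removal on a missing object fails.
        if name not in self.objects:
            return -errno.ENOENT
        for key in keys:
            self.objects[name].pop(key, None)
        return 0


class Dentry:
    """Minimal dentry: 'null' models dn->get_linkage()->is_null()."""

    def __init__(self, name, dirty, null):
        self.name = name
        self.dirty = dirty
        self.null = null


def commit_dir(store, obj_name, dentries):
    """Write dirty dentries: null ones become key removals, the rest key sets."""
    to_set = {}
    to_rm = []
    for dn in dentries:
        if not dn.dirty:
            continue
        if dn.null:
            to_rm.append(dn.name)
        else:
            to_set[dn.name] = b"inode-data"
    if to_set:
        r = store.omap_set_keys(obj_name, to_set)
        if r < 0:
            return r
    if to_rm:
        return store.omap_rm_keys(obj_name, to_rm)
    return 0


if __name__ == "__main__":
    # A freshly merged basedir whose dentries were all deleted: only removals
    # are sent, and the backing object has never been written.
    stolen = [Dentry("file.mdtest.0.34113490", dirty=True, null=True)]
    print(commit_dir(ToyObjectStore(), "basedir_obj", stolen))
```

In this model the commit of a never-written basedir that holds only dirty null dentries returns -ENOENT, which corresponds to the "commit error -2" line in the log. If the object already exists, the same commit succeeds.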

History

#1 Updated by Patrick Donnelly 13 days ago

  • Project changed from Calamari to fs
  • Subject changed from Deleting a large number of files in a directory causes the file system to read only to mds: deleting a large number of files in a directory causes the file system to read only
  • Description updated (diff)
  • Status changed from New to Fix Under Review
  • Assignee set to wei qiaomiao
  • Target version set to v16.0.0
  • Source set to Community (dev)
  • Backport set to octopus,nautilus
  • Pull request ID set to 35848
  • Component(FS) MDS added

#2 Updated by Zheng Yan 11 days ago

  • Component(FS) MDSMonitor added

#3 Updated by wei qiaomiao 11 days ago

The original dirfrag is closed through close_dirfrag, and the merged basedirfrag is then committed.

More detailed log:
2020-06-18 15:32:20.135905 7f1feef5e700 1 mds.2.bal merging [dir 0x2000255af08.100110111* /test227/test227-3/#test-dir.0-0/mdtest_tree.0/mdtest_tree.1/mdtest_tree.3/ [2,head] auth{0=1} v=1247462 cv=1241630/1241630 state=1610612738|complete f(v1074951 m2020-06-18 15:32:15.683728) n(v11815 rc2020-06-18 15:32:15.683728) hs=0+2908,ss=0+0 dirty=2908 | child=1 frozen=0 subtree=0 importing=0 importbound=0 sticky=0 replicated=1 dirty=1 authpin=0 0x555aad539800]
2020-06-18 15:32:20.135932 7f1feef5e700 1 mds.2.bal all sibs under 100110110* 0x555a568e5500 should merge
2020-06-18 15:32:20.135935 7f1feef5e700 1 mds.2.bal all sibs under 10011010* 0x555a1ce35100 should merge
2020-06-18 15:32:20.135938 7f1feef5e700 1 mds.2.bal all sibs under 1001100* 0x555a4d9ffc00 should merge
2020-06-18 15:32:20.135945 7f1feef5e700 1 mds.2.bal all sibs under 100111* 0x555a1ce32000,0x555a1ce32700,0x555a9c00aa00,0x555a57ab6a00,0x555a568e5c00,0x555a568e6300,0x555a568e6a00,0x555a1ce32e00 should merge
2020-06-18 15:32:20.135951 7f1feef5e700 1 mds.2.bal all sibs under 10010* 0x555a9a267500 should merge
2020-06-18 15:32:20.135953 7f1feef5e700 1 mds.2.bal all sibs under 1000* 0x555a8c4f1100 should merge
2020-06-18 15:32:20.136005 7f1feef5e700 1 mds.2.bal not all sibs under 101* in cache (have 0x555a568e7100,0x555ab01c2000,0x555a30cbb800,0x555a30cc0000,0x555a30cc0700,0x555a30cb5500)

2020-06-18 15:32:20.388226 7f1ff1763700 1 mds.2.cache.dir(0x2000255af08.100*) merge 0x555a8c4f1100,0x555a9a267500,0x555a4d9ffc00,0x555a1ce35100,0x555a568e5500,0x555aad539800,0x555a1ce32000,0x555a1ce32700,0x555a9c00aa00,0x555a57ab6a00,0x555a568e5c00,0x555a568e6300,0x555a568e6a00,0x555a1ce32e00

2020-06-18 15:32:20.416342 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) _commit want 1248554 on [dir 0x2000255af08.100* /test227/test227-3/#test-dir.0-0/mdtest_tree.0/mdtest_tree.1/mdtest_tree.3/ [2,head] auth{0=2,1=2} v=1248554 cv=0/0 ap=2+0+0 state=1610629138|complete|frozendir|fragmenting f(v1074951)/f(v1074951 m2020-06-18 15:32:17.361970) n(v11815)/n(v11815 rc2020-06-18 15:32:17.361970) hs=0+27460,ss=0+0 dirty=27460 | child=1 frozen=1 replicated=1 dirty=1 authpin=1 0x555a30cb5c00]
2020-06-18 15:32:20.416354 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) marking committing
2020-06-18 15:32:20.416357 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) _omap_commit
2020-06-18 15:32:20.416364 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) rm file.mdtest.0.34113490 [dentry #0x1/test227/test227-3/#test-dir.0-0/mdtest_tree.0/mdtest_tree.1/mdtest_tree.3/file.mdtest.0.34113490 [2,head] auth NULL (dversion lock) v=1240786 inode=0 state=1610612802|bottomlru | request=0 lock=0 fragmenting=1 inodepin=0 replicated=0 dirty=1 authpin=0 clientlease=0 0x555a5d1de780]
2020-06-18 15:32:20.416377 7f1fe9f54700 1 mds.2.cache.dir(0x2000255af08.100*) rm file.mdtest.0.33947990 [dentry #0x1/test227/test227-3/#test-dir.0-0/mdtest_tree.0/mdtest_tree.1/mdtest_tree.3/file.mdtest.0.33947990 [2,head] auth NULL (dversion lock) v=1240166 inode=0 state=1610612802|bottomlru | fragmenting=1 replicated=0 dirty=1 0x555a5d1dfa40]
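The mds.2.bal lines in the log above appear to walk upward from 100110111*: at each level the sibling prefix is checked, and the walk stops at 101*, whose fragments are not all in cache, leaving 100* as the merge target. The following is a minimal sketch of that walk, not the balancer's actual code. The function names are hypothetical, fragments are modelled as plain bit strings, the example fragment set is invented for illustration, and the size-based "should merge" test is left out.

```python
def covered(prefix, leaves, max_bits):
    """True if the cached leaf fragments fully tile the space under prefix."""
    if prefix in leaves:
        return True
    if len(prefix) >= max_bits:
        return False
    return (covered(prefix + "0", leaves, max_bits)
            and covered(prefix + "1", leaves, max_bits))


def sibling(frag):
    """Flip the last bit: the other child of the same parent prefix."""
    return frag[:-1] + ("1" if frag[-1] == "0" else "0")


def merge_target(frag, leaves):
    """Climb while the sibling subtree is completely present in the cache."""
    max_bits = max(len(f) for f in leaves)
    while frag:  # the empty string stands for the root fragment
        if not covered(sibling(frag), leaves, max_bits):
            break
        frag = frag[:-1]  # move up to the parent prefix
    return frag


if __name__ == "__main__":
    # Hypothetical cache contents: everything under 0* is present,
    # but 11* is missing, so the walk must stop below the root.
    cached = {"000", "001", "01", "10"}
    print(merge_target("000", cached) + "*")
```

With this invented fragment set, the walk starting at 000 climbs to 00, then to 0, and stops there because the sibling 1* is only partly cached. This is the same shape as the log, where the climb ends once 101* is found incomplete.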
