Bug #8255

closed

mds: directory with missing object cannot be removed

Added by Dmitry Smirnov almost 10 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: fsck/damage handling
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The MDS writes the following line to its log over 14,000 times per minute:

2014-04-30 15:05:50.996261 7fe8b4237700  0 mds.0.cache open_remote_dentry_finish bad remote dentry [dentry #1/home/user/.config/epiphany/session_state.xml~ [2,head] auth REMOTE(reg) (dversion lock) pv=0 v=1036 inode=0 0x7fe8ec0c8190]

Also the following error was logged once:

2014-04-30 14:42:36.148296 mds.0 [ERR] unmatched rstat rbytes on single dirfrag 1000010bd69, inode has n(v19 rc2014-04-30 14:42:36.134246 b200307 35=29+6), dirfrag has n(v19 rc2014-04-30 14:42:36.134246 b197383 33=28+5)
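The two `n(...)` summaries in this error can be compared field by field. The following is a minimal C++ sketch. It assumes the usual nest-stat print layout, where `b<rbytes>` is followed by `<entries>=<files>+<subdirs>`. That reading is an assumption and is not stated in this ticket. `NestStat` and `parse_nest` are hypothetical names, and the parser only accepts summaries in which all of these fields are present. Under that reading, the inode's recursive stats exceed the dirfrag's by 2924 bytes and two entries: one file and one subdirectory.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical holder for the recursive stats in an "n(...)" summary.
// Field meanings are an assumption about the log format.
struct NestStat {
  long long rbytes;    // the "b<N>" field: recursive bytes
  long long rsize;     // "<entries>" in "<entries>=<files>+<subdirs>"
  long long rfiles;    // "<files>"
  long long rsubdirs;  // "<subdirs>"
};

// Assumes the layout " b<rbytes> <rsize>=<rfiles>+<rsubdirs>)";
// summaries that omit any of these fields are rejected.
NestStat parse_nest(const std::string& s) {
  static const std::regex re(R"( b(\d+) (\d+)=(\d+)\+(\d+)\))");
  std::smatch m;
  if (!std::regex_search(s, m, re)) {
    throw std::invalid_argument("unrecognized nest summary: " + s);
  }
  assert(m.size() == 5);  // whole match plus four capture groups
  return NestStat{std::stoll(m[1].str()), std::stoll(m[2].str()),
                  std::stoll(m[3].str()), std::stoll(m[4].str())};
}
```

For the message above, `parse_nest` applied to the inode summary gives `rbytes == 200307` and `rsize == 35`. Applied to the dirfrag summary, it gives `rbytes == 197383` and `rsize == 33`.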

I can't remove /home/user/.config/epiphany:

# sudo rm -rv /mnt/ceph/home/user/.config/epiphany
rm: cannot remove `/mnt/ceph/home/user/.config/epiphany': Directory not empty

Please advise.

Actions #1

Updated by Zheng Yan almost 10 years ago

  • Status changed from New to Need More Info

More log output is needed to diagnose this. Please:

truncate the MDS log;
execute "rm -rv /mnt/ceph/home/user/.config/epiphany";
then post the updated MDS log.

Actions #2

Updated by Zheng Yan almost 10 years ago

Also, I'm curious when the fs was created (with which version).

Actions #3

Updated by Dmitry Smirnov almost 10 years ago

The FS was created on 0.72.2, then upgraded to 0.78 and 0.79, followed by 0.80~rc1.
Somehow the journal was corrupted during cluster recovery, and the MDS was crashing on journal replay.
Unfortunately I lost the crash dump because of the log flood mentioned above.

I had to resort to "--reset-journal" to get access to the files.
Some files are corrupted (that's not a problem), but now I'm getting errors like

2014-05-01 03:37:22.059908 mds.0 [ERR] dir 10000421bec object missing on disk; some files may be lost
2014-05-01 03:53:58.440638 mds.0 [ERR] dir 10000421646 object missing on disk; some files may be lost

on MDS start.

I moved the "epiphany" directory out of the way and rebooted the client(s) that were accessing it.
I still can't remove it, but now there is nothing in the MDS log whatsoever. Could it be that a directory entry is stuck but the MDS does not report it because of the error above?

How can I wipe out the affected directories (together with their directory fragments)?

Actions #4

Updated by Zheng Yan almost 10 years ago

Get the inode number of the 'epiphany' directory, then modify Server::_dir_is_nonempty_unlocked() and Server::_dir_is_nonempty() in src/mds/Server.cc: add the line
"if (in->ino() == <inode number>) return false;" to the beginning of both functions.
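The suggested workaround can be illustrated with a self-contained C++ mock. `FakeDirInode`, `kDamagedDirIno`, `is_forced_empty` and the two mock functions are hypothetical stand-ins. They are not Ceph's `CInode` or the real `Server` methods. The sketch only shows both emptiness checks short-circuiting to "empty" for one hard-coded inode number, so that an rmdir on that directory would presumably no longer be refused with "Directory not empty". The example inode value comes from one of the log lines in this ticket and is not necessarily the epiphany directory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an MDS directory inode; NOT Ceph's CInode.
struct FakeDirInode {
  std::uint64_t number;         // inode number
  std::size_t claimed_entries;  // entries the (possibly damaged) metadata reports
  std::uint64_t ino() const { return number; }
};

// Placeholder: substitute the real inode number of the directory to delete.
constexpr std::uint64_t kDamagedDirIno = 0x10000421bec;

// The hard-coded test from the workaround, shared by both checks.
bool is_forced_empty(const FakeDirInode& in) {
  return in.ino() == kDamagedDirIno;
}

// Mock named after Server::_dir_is_nonempty_unlocked().
bool dir_is_nonempty_unlocked(const FakeDirInode& in) {
  if (is_forced_empty(in)) return false;  // pretend the damaged dir is empty
  return in.claimed_entries > 0;
}

// Mock named after Server::_dir_is_nonempty().
bool dir_is_nonempty(const FakeDirInode& in) {
  if (is_forced_empty(in)) return false;  // same early return as above
  return in.claimed_entries > 0;
}
```

Sharing one helper keeps the two checks from drifting apart. As with the original suggestion, this is a one-off local hack, and the early return should be removed once the directory is gone.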

Actions #5

Updated by Dmitry Smirnov almost 10 years ago

Thanks, I might try that or make a new file system from scratch.

There is more than one issue in this ticket, but as a TODO I think we should rate-limit logging to prevent floods of identical messages. Perhaps the logger could remember the last message and just increment a counter when the same message repeats, then print something like "previous message repeated NNN times." every 10 or 30 seconds.
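The rate-limiting idea can be sketched as a small repeat-suppressing log sink. `RepeatSuppressingLog` is a hypothetical C++ class, not Ceph's logging subsystem. The caller passes the current time in seconds so that the behaviour is deterministic. Identical consecutive messages are only counted. A "previous message repeated N times" line is written in three cases: when a different message arrives, when `flush()` is called, or when a repeat arrives at least `flush_interval` seconds after the last summary.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical repeat-suppressing log sink; NOT Ceph's logging API.
class RepeatSuppressingLog {
 public:
  explicit RepeatSuppressingLog(double flush_interval)
      : flush_interval_(flush_interval) {
    assert(flush_interval_ > 0);
  }

  // Record `msg` at time `now` (seconds, supplied by the caller).
  void log(const std::string& msg, double now) {
    if (last_ && *last_ == msg) {
      // Same text as the previous message: count it instead of writing it.
      ++repeats_;
      if (now - last_flush_ >= flush_interval_) flush(now);
      return;
    }
    // A different message: report any pending repeats, then write it.
    flush(now);
    out_.push_back(msg);
    last_ = msg;
  }

  // Emit the pending "repeated N times" summary, if any.
  void flush(double now) {
    if (repeats_ > 0) {
      out_.push_back("previous message repeated " +
                     std::to_string(repeats_) + " times");
      repeats_ = 0;
    }
    last_flush_ = now;
  }

  const std::vector<std::string>& output() const { return out_; }

 private:
  double flush_interval_;
  std::optional<std::string> last_;  // text of the last message written
  std::uint64_t repeats_ = 0;        // suppressed copies since the last summary
  double last_flush_ = 0;            // time of the last summary or flush
  std::vector<std::string> out_;     // stands in for the real log file
};
```

Take a 10-second interval as an example. Logging "A" at t = 0, 1, 5, 10 and 11 and then "B" at t = 12 writes "A", then a summary of 3 repeats at t = 10, then a summary of 1 repeat, then "B". The sketch has no timer of its own. A flood that suddenly stops therefore leaves its count pending until the next different message or an explicit `flush()`.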

Actions #6

Updated by Sage Weil almost 10 years ago

  • Subject changed from 0.80~rc1: MDS log pollution, unable to remove directory (unmatched rstat rbytes on single dirfrag) to mds: directory with missing object cannot be removed
  • Status changed from Need More Info to 12
  • Priority changed from High to Normal
  • Source changed from other to Community (user)

I think the remaining step is to eventually add the ability to remove the last trace of the damaged directory.

Actions #7

Updated by Zheng Yan over 9 years ago

  • Status changed from 12 to Fix Under Review

Actions #8

Updated by John Spray almost 8 years ago

  • Status changed from Fix Under Review to New

Actions #9

Updated by Greg Farnum almost 8 years ago

  • Category changed from 47 to fsck/damage handling
  • Component(FS) MDS added

John, much of this is handled now with the metadata damaged flags. What's left?

Actions #10

Updated by John Spray almost 8 years ago

  • Status changed from New to Resolved

As of Jewel, this kind of issue should be handled cleanly: the MDS will raise a 'damaged' health alert, with the specifics available via "damage ls".
