Bug #8255
mds: directory with missing object cannot be removed
Description
The MDS writes the following line to its log over 14000 times per minute:
2014-04-30 15:05:50.996261 7fe8b4237700 0 mds.0.cache open_remote_dentry_finish bad remote dentry [dentry #1/home/user/.config/epiphany/session_state.xml~ [2,head] auth REMOTE(reg) (dversion lock) pv=0 v=1036 inode=0 0x7fe8ec0c8190]
Also the following error was logged once:
2014-04-30 14:42:36.148296 mds.0 [ERR] unmatched rstat rbytes on single dirfrag 1000010bd69, inode has n(v19 rc2014-04-30 14:42:36.134246 b200307 35=29+6), dirfrag has n(v19 rc2014-04-30 14:42:36.134246 b197383 33=28+5)
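To make the mismatch in the error above easier to read, here is a minimal Python sketch that extracts the two `n(...)` blocks and subtracts them. It assumes the fields mean version, rctime, recursive bytes (`b`), and total entries = files + subdirectories; that reading is an interpretation of the log format, and the names `NEST_RE`, `parse_nest` and `rstat_delta` are hypothetical helpers, not part of Ceph.

```python
import re

# Assumed layout of a nest-info block: n(v<version> rc<date> <time> b<bytes> <total>=<files>+<subdirs>)
NEST_RE = re.compile(r"n\(v(\d+) rc(\S+ \S+) b(\d+) (\d+)=(\d+)\+(\d+)\)")

# The error line from this ticket, used as sample input.
SAMPLE = (
    "2014-04-30 14:42:36.148296 mds.0 [ERR] unmatched rstat rbytes on single "
    "dirfrag 1000010bd69, inode has n(v19 rc2014-04-30 14:42:36.134246 "
    "b200307 35=29+6), dirfrag has n(v19 rc2014-04-30 14:42:36.134246 "
    "b197383 33=28+5)"
)


def parse_nest(line):
    """Return one dict per n(...) block found in the line, in order."""
    blocks = []
    for version, rctime, rbytes, total, files, subdirs in NEST_RE.findall(line):
        blocks.append({
            "version": int(version),
            "rctime": rctime,
            "rbytes": int(rbytes),
            "total": int(total),
            "files": int(files),
            "subdirs": int(subdirs),
        })
    return blocks


def rstat_delta(line):
    """Subtract the dirfrag's counters from the inode's counters."""
    inode, dirfrag = parse_nest(line)  # expects exactly two blocks
    keys = ("rbytes", "total", "files", "subdirs")
    return {key: inode[key] - dirfrag[key] for key in keys}
```

Under that reading, `rstat_delta(SAMPLE)` shows the inode accounting for 2924 more bytes, one more file and one more subdirectory than the dirfrag does.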
I can't remove /home/user/.config/epiphany:
# sudo rm -rv /mnt/ceph/home/user/.config/epiphany
rm: cannot remove `/mnt/ceph/home/user/.config/epiphany': Directory not empty
Please advise.
Updated by Zheng Yan almost 10 years ago
- Status changed from New to Need More Info
More log output is needed to diagnose this. Please:
- truncate the MDS log
- execute "rm -rv /mnt/ceph/home/user/.config/epiphany"
- attach the updated MDS log
Updated by Zheng Yan almost 10 years ago
Also, I'm curious when the FS was created (with which version).
Updated by Dmitry Smirnov almost 10 years ago
The FS was created on 0.72.2, then upgraded to 0.78, 0.79, followed by 0.80~rc1.
Somehow the journal was corrupted during cluster recovery, and the MDS was crashing on journal replay.
Unfortunately I lost the crash dump because of the log flood mentioned above.
I had to resort to "--reset-journal" to get access to files.
Some files are corrupted (that's not a problem) but now I'm getting errors like
2014-05-01 03:37:22.059908 mds.0 [ERR] dir 10000421bec object missing on disk; some files may be lost
2014-05-01 03:53:58.440638 mds.0 [ERR] dir 10000421646 object missing on disk; some files may be lost
on MDS start.
I moved the "epiphany" directory out of the way and rebooted the client(s) that were accessing it.
I still can't remove it, but now there is nothing in the MDS log whatsoever. Could it be that one directory entry is stuck but the MDS does not show it because of the above error?
How can I wipe out the affected directories (together with their directory fragments)?
Updated by Zheng Yan almost 10 years ago
Get the inode number of the 'epiphany' directory, then modify Server::_dir_is_nonempty_unlocked() and Server::_dir_is_nonempty() in src/mds/Server.cc: add the line
"if (in->ino() == <inode number>) return false;" to the beginning of both functions, so that the MDS treats that directory as empty.
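For the first step, a minimal Python sketch of looking up a directory's inode number from a client mount. It assumes the `st_ino` value reported through the CephFS mount is the same number the MDS uses internally, and that the hexadecimal form matches the identifiers seen in MDS log lines such as `10000421bec`. The helper name `inode_of` is hypothetical.

```python
import os


def inode_of(path):
    """Return the inode number of `path` as (decimal int, hex string)."""
    ino = os.stat(path).st_ino
    # MDS log lines appear to print inode numbers in hex without a prefix.
    return ino, format(ino, "x")
```

For example, `inode_of("/mnt/ceph/home/user/.config/epiphany")` would return both forms for the directory from this ticket. If the hex form is pasted into the C++ comparison, it needs a `0x` prefix there; the decimal form can be used as-is.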
Updated by Dmitry Smirnov almost 10 years ago
Thanks, I might try that or make a new file system from scratch.
There is more than one issue mentioned in this ticket, but as a TODO I think we should rate-limit logging to prevent floods of similar messages. Perhaps the logger could remember the last message and just increment a counter when the same message is repeated. It could then print something like "previous message repeated NNN times." every 10 or 30 seconds.
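The suppression idea above can be sketched in a few lines of Python. This is only an illustration of the proposal, not how Ceph's logger works: the class name `RepeatSuppressor`, its parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


class RepeatSuppressor:
    """Collapse runs of identical log messages into periodic summaries."""

    def __init__(self, emit: Callable[[str], None], interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._emit = emit            # sink that actually writes a line
        self._interval = interval    # seconds between repeat summaries
        self._clock = clock          # injectable so tests can fake time
        self._last: Optional[str] = None
        self._repeats = 0            # suppressed copies not yet reported
        self._last_output = 0.0      # when we last wrote message or summary

    def _flush(self) -> None:
        """Report suppressed repeats, if there are any."""
        if self._repeats:
            self._emit(f"previous message repeated {self._repeats} times.")
            self._repeats = 0

    def log(self, message: str) -> None:
        now = self._clock()
        if message != self._last:
            # A different message: report pending repeats, then print it.
            self._flush()
            self._emit(message)
            self._last = message
            self._last_output = now
            return
        # Same message as the previous one: count it instead of printing.
        self._repeats += 1
        if now - self._last_output >= self._interval:
            self._flush()
            self._last_output = now

    def close(self) -> None:
        """Report any repeats still pending at shutdown."""
        self._flush()
```

A caller would construct it with a sink, for example `RepeatSuppressor(print, interval=30.0)`, and route every log line through `log()`. Note that this sketch emits a summary only when another message arrives; a real logger would likely also need a timer so that a burst followed by silence is still reported.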
Updated by Sage Weil almost 10 years ago
- Subject changed from 0.80~rc1: MDS log pollution, unable to remove directory (unmatched rstat rbytes on single dirfrag) to mds: directory with missing object cannot be removed
- Status changed from Need More Info to 12
- Priority changed from High to Normal
- Source changed from other to Community (user)
I think the remaining step is to eventually incorporate the ability to remove the last trace of the damaged directory.
Updated by Zheng Yan over 9 years ago
- Status changed from 12 to Fix Under Review
Updated by John Spray almost 8 years ago
- Status changed from Fix Under Review to New
Updated by Greg Farnum almost 8 years ago
- Category changed from 47 to fsck/damage handling
- Component(FS) MDS added
John, much of this is handled now with the metadata damaged flags. What's left?
Updated by John Spray almost 8 years ago
- Status changed from New to Resolved
This kind of issue should be handled cleanly as of Jewel (the MDS will raise a 'damaged' health alert, with specifics in "damage ls").