Bug #8255: mds: directory with missing object cannot be removed - CephFS - Ceph

Bug #8255

closed

mds: directory with missing object cannot be removed

Added by Dmitry Smirnov almost 10 years ago. Updated almost 8 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Category:

fsck/damage handling

Target version:

% Done:

Source:

Community (user)

Tags:

Backport:

Regression:

Severity:

2 - major

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

MDS

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

The MDS writes the following line to its log over 14000 times per minute:

2014-04-30 15:05:50.996261 7fe8b4237700  0 mds.0.cache open_remote_dentry_finish bad remote dentry [dentry #1/home/user/.config/epiphany/session_state.xml~ [2,head] auth REMOTE(reg) (dversion lock) pv=0 v=1036 inode=0 0x7fe8ec0c8190]

Also the following error was logged once:

2014-04-30 14:42:36.148296 mds.0 [ERR] unmatched rstat rbytes on single dirfrag 1000010bd69, inode has n(v19 rc2014-04-30 14:42:36.134246 b200307 35=29+6), dirfrag has n(v19 rc2014-04-30 14:42:36.134246 b197383 33=28+5)

I can't remove /home/user/.config/epiphany:

# sudo rm -rv /mnt/ceph/home/user/.config/epiphany
rm: cannot remove `/mnt/ceph/home/user/.config/epiphany': Directory not empty

Please advise.

Updated by Zheng Yan almost 10 years ago

Status changed from New to Need More Info

More logs are needed to diagnose this:

truncate the mds log
execute "rm -rv /mnt/ceph/home/user/.config/epiphany"
upload the mds log

Updated by Zheng Yan almost 10 years ago

Besides, I'm curious when the fs was created (with which version).

Updated by Dmitry Smirnov almost 10 years ago

The FS was created on 0.72.2, then upgraded to 0.78 and 0.79, followed by 0.80~rc1.
Somehow the journal was corrupted during cluster recovery, and the MDS was crashing on journal replay.
Unfortunately I lost the crash dump because of the log flood mentioned above.

I had to resort to "--reset-journal" to get access to the files.
Some files are corrupted (that's not a problem), but now I'm getting errors like

2014-05-01 03:37:22.059908 mds.0 [ERR] dir 10000421bec object missing on disk; some files may be lost
2014-05-01 03:53:58.440638 mds.0 [ERR] dir 10000421646 object missing on disk; some files may be lost

on MDS start.

I moved the "epiphany" directory out of the way and rebooted the client(s) that were accessing it.
Now I still can't remove it, but there is nothing in the MDS log whatsoever. Could it be that one directory entry is stuck but the MDS does not show it because of the error above?

I wonder how I can wipe out the affected directories (together with their directory fragments)?

Updated by Zheng Yan almost 10 years ago

Get the inode number of the 'epiphany' directory, then modify Server::_dir_is_nonempty_unlocked() and Server::_dir_is_nonempty() in src/mds/Server.cc: add the line
"if (in->ino() == <inode number>) return false;" to the beginning of these two functions.
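
For readers who want to see the shape of that workaround, here is a minimal, self-contained sketch. It is not the Ceph source: `MockInode`, `dir_is_nonempty` and `kDamagedIno` are hypothetical stand-ins, and the value `0x1234` is a placeholder that has nothing to do with this cluster. The only real API used is POSIX stat(2), which is one way to read a directory's inode number from a mounted file system.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <sys/stat.h>

// Look up the inode number of a path with POSIX stat(2).
std::uint64_t inode_of(const std::string& path) {
  struct stat st {};
  if (::stat(path.c_str(), &st) != 0) {
    throw std::runtime_error("stat failed for " + path);
  }
  return static_cast<std::uint64_t>(st.st_ino);
}

// Mock stand-in for the MDS inode type (assumption); only ino() matters here.
struct MockInode {
  std::uint64_t ino_value;
  bool has_entries;
  std::uint64_t ino() const { return ino_value; }
};

// Hypothetical placeholder: replace with the inode number of the damaged
// directory. This value is NOT taken from the ticket.
constexpr std::uint64_t kDamagedIno = 0x1234;

// Mock emptiness check (assumption) with the suggested early return on top.
bool dir_is_nonempty(const MockInode* in) {
  assert(in != nullptr);
  // Workaround: always report the damaged directory as empty so that
  // rmdir is allowed to proceed.
  if (in->ino() == kDamagedIno) return false;
  return in->has_entries;
}
```

With this guard in place, a directory whose inode number matches `kDamagedIno` is reported as empty even if entries are still recorded for it; every other directory is unaffected. A change like this belongs in a one-off build used only for the cleanup.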

Updated by Dmitry Smirnov almost 10 years ago

Thanks, I might try that or make a new file system from scratch.

There is more than one issue mentioned in this ticket, but as a TODO I think we should rate-limit logging to prevent floods of similar messages. Perhaps the logger can remember the last message and just increment a counter if the same message is repeated. Then it can print something like "previous message repeated NNN times." every 10 or 30 seconds.
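
The rate-limiting idea could look roughly like the sketch below. It is an illustration only, not Ceph's logging code: `RepeatSuppressingLogger` and its sink callback are hypothetical names, and the timestamp is passed in explicitly so the behaviour does not depend on the wall clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper, not part of Ceph: collapses runs of identical
// log messages into periodic "repeated N times" summaries.
class RepeatSuppressingLogger {
public:
  using Clock = std::chrono::steady_clock;
  using Sink = std::function<void(const std::string&)>;

  RepeatSuppressingLogger(Sink sink, std::chrono::seconds interval)
      : sink_(std::move(sink)), interval_(interval) {
    assert(sink_ && "a sink is required");
    assert(interval_.count() > 0);
  }

  // 'now' is a parameter so that callers (and tests) control the clock.
  void log(const std::string& msg, Clock::time_point now = Clock::now()) {
    if (have_last_ && msg == last_) {
      ++repeats_;  // swallow the duplicate, only count it
      if (now - last_summary_ >= interval_) {
        flush(now);
      }
      return;
    }
    flush(now);  // report repeats of the previous message first
    sink_(msg);
    last_ = msg;
    have_last_ = true;
  }

  // Emit a summary line if any repeats are pending, and restart the interval.
  void flush(Clock::time_point now = Clock::now()) {
    if (repeats_ > 0) {
      sink_("previous message repeated " + std::to_string(repeats_) +
            " times.");
      repeats_ = 0;
    }
    last_summary_ = now;
  }

private:
  Sink sink_;
  std::chrono::seconds interval_;
  std::string last_;
  bool have_last_ = false;
  std::uint64_t repeats_ = 0;
  Clock::time_point last_summary_{};
};
```

With a 10 second interval, a message repeated thousands of times per minute would produce the original line once, followed by one summary line per interval, and a final summary when a different message arrives or `flush()` is called.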

Updated by Sage Weil almost 10 years ago

Subject changed from 0.80~rc1: MDS log pollution, unable to remove directory (unmatched rstat rbytes on single dirfrag) to mds: directory with missing object cannot be removed
Status changed from Need More Info to 12
Priority changed from High to Normal
Source changed from other to Community (user)