Bug #58597

open

The MDS crashes when deleting a specific file

Added by Tobias Reinhard about 1 year ago. Updated about 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
fsck/damage handling
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -88> 2023-01-29T04:23:09.960+0100 7f429ccd8700 10 mds.0.cache  oldest old_inode is [28051,28051], done.
   -87> 2023-01-29T04:23:09.960+0100 7f429ccd8700 20 mds.0.journal EMetaBlob::add_dir_context(0x55b33ecb9a80) reached unambig auth subtree, don't need  at [dir 0x100 ~mds0/ [2,head] auth pv=175065187 v=175065185 cv=0/0 dir_auth=0
   -86> 2023-01-29T04:23:09.960+0100 7f429ccd8700 20 mds.0.journal EMetaBlob::add_dir_context final:
   -85> 2023-01-29T04:23:09.960+0100 7f429ccd8700 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x100/stray0 [2,head] auth (dversion lock) pv=175065186 v=175065158 ino=0x600 state=1610612736 | inodepin=1 dirty=1 0x55
   -84> 2023-01-29T04:23:09.960+0100 7f429ccd8700 10 mds.0.cache journal_cow_dentry follows 28921 < first on [inode 0x600 [...28922,head] ~mds0/stray0/ auth v175065158 pv175065186 ap=1 f(v38 m2023-01-29T04:14:47.829119+0100) n(v8
   -83> 2023-01-29T04:23:09.960+0100 7f429ccd8700 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x1/docker/nextcloud-13-nzh/db~/mysql/innodb_index_stats.ibd [10000000031,head] auth (dn xlock x=1 by 0x55b33fa88c00) (d
   -82> 2023-01-29T04:23:09.960+0100 7f429ccd8700 10 mds.0.cache journal_cow_dentry follows 28921 < first on [dentry #0x1/docker/nextcloud-13-nzh/db~/mysql/innodb_index_stats.ibd [10000000031,head] auth (dn xlock x=1 by 0x55b33fa
   -81> 2023-01-29T04:23:09.960+0100 7f429ccd8700 -1 ./src/mds/Server.cc: In function 'void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7f429ccd8700 time 2023-01-29T04:23:09.962505+0100
./src/mds/Server.cc: 7806: FAILED ceph_assert(in->first <= straydn->first)

 ceph version 16.2.9 (a569859f5e07da0c4c39da81d5fb5675cd95da49) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7f42a21b8fde]
 2: /usr/lib/ceph/libceph-common.so.2(+0x251169) [0x7f42a21b9169]
 3: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x11ff) [0x55b33c58072f]
 4: (MDSContext::complete(int)+0x5b) [0x55b33c80786b]
 5: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x55b33c4d7218]
 6: (Locker::eval(CInode*, int, bool)+0x3de) [0x55b33c6d14de]
 7: (Locker::handle_client_caps(boost::intrusive_ptr<MClientCaps const> const&)+0x21ef) [0x55b33c6dcddf]
 8: (Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0x224) [0x55b33c6ded34]
 9: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x5c0) [0x55b33c4f63f0]
 10: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x58) [0x55b33c4f69e8]
 11: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x1bf) [0x55b33c4d0b6f]
 12: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f42a23e8cb8]
 13: (DispatchQueue::entry()+0x5ef) [0x7f42a23e63bf]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f42a24a54bd]
 15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f42a1f13ea7]
 16: clone()

This is perfectly reproducible, unfortunately on a production system.

Actions #1

Updated by Venky Shankar about 1 year ago

Tobias Reinhard wrote:

[...]

This is perfectly reproducible, unfortunately on a production system.

This is a metadata corruption for an inode. Could you share the reproducer?

Actions #2

Updated by Tobias Reinhard about 1 year ago

Venky Shankar wrote:

Tobias Reinhard wrote:

[...]

This is perfectly reproducible, unfortunately on a production system.

This is a metadata corruption for an inode. Could you share the reproducer?

The crash happens whenever I delete the file - every time.

I don't know how or when the corruption itself happened (so this part is not reproducible).

Can you tell me a way to fix this?

Is there a way to search the rest of the filesystem for this type of corruption?

Actions #3

Updated by Venky Shankar about 1 year ago

Hi Tobias,

The crash happens whenever I delete the file - every time.

I don't know how or when the corruption itself happened (so this part is not reproducible).

We've seen this corruption when PostgreSQL is used on CephFS. Seems like you are using MySQL on CephFS?

Can you tell me a way to fix this?

Is there a way to search the rest of the filesystem for this type of corruption?

We do! Look at - https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py

This script scans the inodes in the CephFS metadata pool and reports corruption. Optionally, you can instruct the script to remove the corrupted metadata and later recover by running the recovery procedure. I must admit that the tool usage is not documented (we will have that done), so you'll have to follow the usage detailed in the comments in the tool source itself. Sorry!
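
For reference, a rough invocation sketch - not authoritative: apart from --remove and --repair-nosnap, which are mentioned in this thread, the exact argument syntax (including how the metadata pool is passed) should be taken from the comments in the script itself, and <fs_name>/<metadata_pool> are placeholders:

    # Take the file system offline first so the metadata pool is quiescent
    ceph fs fail <fs_name>

    # Read-only scan: report dentries whose "first" field looks corrupted
    # (the field involved in the failed assert above)
    python3 first-damage.py <metadata_pool>

    # Destructive: remove the corrupted dentries found by the scan
    python3 first-damage.py --remove <metadata_pool>

Whether removal (as opposed to a --repair-nosnap style fix) is appropriate for a given corruption is discussed further down in this thread.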

For the corruption fix itself, we have started running PostgreSQL jobs on CephFS as part of our upstream testing. Sadly, we haven't reproduced this yet. If we do manage to reproduce it, we would have all the required debug logs to figure out when and where the metadata gets corrupted.

Actions #4

Updated by Tobias Reinhard about 1 year ago

Venky Shankar wrote:

Hi Tobias,

The crash happens whenever I delete the file - every time.

I don't know how or when the corruption itself happened (so this part is not reproducible).

We've seen this corruption when PostgreSQL is used on CephFS. Seems like you are using MySQL on CephFS?

I've been running MySQL/MariaDB on CephFS for years without any problems. I also started using PostgreSQL on it several months ago - also without problems.

I updated from Ceph 14 to 15 to 16 just before this problem came up.

I also enabled two active MDS daemons for a short time.

Can you tell me a way to fix this?

Is there a way to search the rest of the filesystem for this type of corruption?

We do! Look at - https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py

This script scans the inodes in the CephFS metadata pool and reports corruption. Optionally, you can instruct the script to remove the corrupted metadata and later recover by running the recovery procedure. I must admit that the tool usage is not documented (we will have that done), so you'll have to follow the usage detailed in the comments in the tool source itself. Sorry!

Great, thank you. I had not found that tool.

What do you mean by "recovery procedure"?

Since this is a production system, taking CephFS offline (to run this tool) is difficult.

I am thinking about solving this by creating a new CephFS, moving the files from the current CephFS to the new one, and then dropping the corrupted one at the end.

Actions #5

Updated by Venky Shankar about 1 year ago

Tobias Reinhard wrote:

Venky Shankar wrote:

Hi Tobias,

The crash happens whenever I delete the file - every time.

I don't know how or when the corruption itself happened (so this part is not reproducible).

We've seen this corruption when PostgreSQL is used on CephFS. Seems like you are using MySQL on CephFS?

I've been running MySQL/MariaDB on CephFS for years without any problems. I also started using PostgreSQL on it several months ago - also without problems.

From what we know, there is a specific I/O pattern that causes this silent corruption, which then crashes the MDS only when the file is unlinked. So the corrupted (MDS-internal) metadata lives for a long time before an unlink catches it (due to specific asserts in the code that expect a certain field to be sane).

I updated from Ceph 14 to 15 to 16 just before this problem came up.

I also enabled two active MDS daemons for a short time.

We have seen the corruption with a single active MDS too.

Can you tell me a way to fix this?

Is there a way to search the rest of the filesystem for this type of corruption?

We do! Look at - https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py

This script scans the inodes in the CephFS metadata pool and reports corruption. Optionally, you can instruct the script to remove the corrupted metadata and later recover by running the recovery procedure. I must admit that the tool usage is not documented (we will have that done), so you'll have to follow the usage detailed in the comments in the tool source itself. Sorry!

Great, thank you. I had not found that tool.

What do you mean by "recovery procedure"?

The tool has an option to "fix" the corrupted metadata; however, that only works if it recognizes a certain pattern in the corruption. Given the corruption you are running into (from the logs), this is not possible (the `--repair-nosnap` tool option -- again, undocumented). So you'd need to run the tool with `--remove`, which removes the corrupted dentry, thereby losing the inode linkage (hierarchy). To fix that up, you'd need to follow

https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects

which scans the data pool to regenerate metadata objects - this requires the file system to be offline.
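
At a high level, the data-pool scan portion of that procedure looks like the sketch below. This is heavily simplified; the linked page is the authority and also covers the journal/session recovery steps and important caveats. <fs_name> and <data_pool> are placeholders:

    # The file system must be failed/offline before running these
    ceph fs fail <fs_name>

    # Rebuild metadata objects in the metadata pool by scanning the data pool
    # (scan_extents and scan_inodes walk every object and can take a long time)
    cephfs-data-scan init
    cephfs-data-scan scan_extents <data_pool>
    cephfs-data-scan scan_inodes <data_pool>
    cephfs-data-scan scan_links

Only after the full documented procedure completes should the file system be brought back online.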

Actions #6

Updated by Venky Shankar about 1 year ago

Hi Tobias,

Any update on using the tool? Were you able to get the file system back online?

Actions #7

Updated by Tobias Reinhard about 1 year ago

Venky Shankar wrote:

Hi Tobias,

Any update on using the tool? Were you able to get the file system back online?

Hi Venky,

The system is working because I do not touch the broken files.

Unfortunately, I have not had time to do anything on this topic. Since this is a production system, it could take another month to get some downtime.

Actions #8

Updated by Venky Shankar about 1 year ago

Tobias Reinhard wrote:

Venky Shankar wrote:

Hi Tobias,

Any update on using the tool? Were you able to get the file system back online?

Hi Venky,

The system is working because I do not touch the broken files.

Unfortunately, I have not had time to do anything on this topic. Since this is a production system, it could take another month to get some downtime.

Sure. In that case, would you be fine with closing this issue? You can always create another (or reopen this one) if you need assistance with running the tool.
