Bug #38452
mds: assert crash loop while unlinking file
Description
Here is the stack trace; it was triggered by a PostgreSQL server trying to unlink a file in its log archive.
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
 1: (()+0x4058c0) [0x5588549cd8c0]
 2: (()+0x12890) [0x7fdf788d3890]
 3: (gsignal()+0xc7) [0x7fdf779c6e97]
 4: (abort()+0x141) [0x7fdf779c8801]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7fdf78fbd530]
 6: (()+0x26d5a7) [0x7fdf78fbd5a7]
 7: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x15f8) [0x5588547482e8]
 8: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x961) [0x558854748cd1]
 9: (Server::handle_client_request(MClientRequest*)+0x49b) [0x55885476305b]
 10: (Server::dispatch(Message*)+0x2db) [0x558854766d1b]
 11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5588546da1e4]
 12: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x5588546e7a1b]
 13: (MDSRank::retry_dispatch(Message*)+0x12) [0x5588546e8012]
 14: (MDSInternalContextBase::complete(int)+0x67) [0x55885494df87]
 15: (MDSRank::_advance_queues()+0xf1) [0x5588546e69c1]
 16: (MDSRank::ProgressThread::entry()+0x43) [0x5588546e7043]
 17: (()+0x76db) [0x7fdf788c86db]
 18: (clone()+0x3f) [0x7fdf77aa988f]
Attached is a debug-ms=10 log; however, I can't reproduce the crash anymore, as I worked around it by killing the ceph-fuse client on our Postgres server and restarting the MDS.
History
#1 Updated by Patrick Donnelly almost 5 years ago
- Subject changed from ceph-mds assert crash loop while unlinking file to mds: assert crash loop while unlinking file
- Description updated (diff)
- Priority changed from Normal to High
- Target version set to v15.0.0
- Start date deleted (02/22/2019)
- Backport set to nautilus,mimic,luminous
- ceph-qa-suite deleted (fs)
#2 Updated by Jérôme Poulin almost 5 years ago
I can reproduce the problem by deleting a specific file if you need more information. This is a development environment so I can reproduce on demand.
#3 Updated by Jérôme Poulin almost 5 years ago
Here is the last assert I got; do you think they're caused by the same source?
0> 2019-03-03 21:25:31.306 7f2145c48700 -1 /build/ceph-13.2.4/src/mds/Server.cc: In function 'void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7f2145c48700 time 2019-03-03 21:25:31.306591
/build/ceph-13.2.4/src/mds/Server.cc: 6567: FAILED assert(in->first <= straydn->first)
 ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f214de5d3e2]
 2: (()+0x26d5a7) [0x7f214de5d5a7]
 3: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x15f8) [0x559839a2c2e8]
 4: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x961) [0x559839a2ccd1]
 5: (Server::handle_client_request(MClientRequest*)+0x49b) [0x559839a4705b]
 6: (Server::dispatch(Message*)+0x2db) [0x559839a4ad1b]
 7: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5598399be1e4]
 8: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x5598399cba1b]
 9: (MDSRank::retry_dispatch(Message*)+0x12) [0x5598399cc012]
 10: (MDSInternalContextBase::complete(int)+0x67) [0x559839c31f87]
 11: (MDSRank::_advance_queues()+0xf1) [0x5598399ca9c1]
 12: (MDSRank::_dispatch(Message*, bool)+0x2d0) [0x5598399cb450]
 13: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x5598399cbff5]
 14: (MDSDaemon::ms_dispatch(Message*)+0xd3) [0x5598399b5e03]
 15: (DispatchQueue::entry()+0xb92) [0x7f214ded7722]
 16: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f214df762cd]
 17: (()+0x76db) [0x7f214d7686db]
 18: (clone()+0x3f) [0x7f214c94988f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#4 Updated by Zheng Yan almost 5 years ago
please add 'debug_mds = 20' to ceph.conf, then trigger the assertion again
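For reference, a minimal sketch of that ceph.conf change (placing it under the [mds] section is an assumption about your config layout):

[mds]
    # raise MDS debug logging until the assertion is captured, then revert
    debug_mds = 20

The same value can also be set at runtime through the admin socket with 'ceph daemon mds.<id> config set debug_mds 20', where mds.<id> is a placeholder for your MDS name.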
#5 Updated by Jérôme Poulin almost 5 years ago
- File ceph-mds.log.xz added
Here is the whole log attached (8.4MB)
#6 Updated by Zheng Yan almost 5 years ago
looks like you set debug_ms to 20, not debug_mds
#7 Updated by Jérôme Poulin almost 5 years ago
Sorry, my mistake; the new file with debug-mds=20 is now 520 MB.
Since the resulting compressed file is 10 MB, do you have a preferred place where I can upload it?
Also, do you have a GPG key I can encrypt the file with since it seems to contain a lot of data (or is it only metadata)?
I can also remove the replay/rejoin/reconnect phase if you prefer to have it attached to this ticket.
#8 Updated by Zheng Yan almost 5 years ago
removing replay/rejoin/reconnect should be good
#10 Updated by Zheng Yan almost 5 years ago
[inode 0x100000367e5 [head,head] /pg_xlog_archives/9.6/smobile/000000200000002C000000BB.00000028.backup.zst auth v145794 ap=2+0 s=193 n(v0 rc2019-03-03 21:27:34.226503) (iversion lock w=1 last_client=25558652) caps={25432667=pAsXsFs/-@2,25558652=pAsXsFscr/-@1} | request=1 lock=2 caps=1 dirty=0 waiter=0 authpin=1 0x55933ec87100]
Looks like a field of the inode is corrupted; no idea what happened. Please grep 'inode 0x100000367e5' from the full log and upload the result.
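For example, a minimal shell sketch (the log path and output file name are assumptions):

# collect every line mentioning the corrupted inode from the full debug log
grep 'inode 0x100000367e5' /var/log/ceph/ceph-mds.log > ceph-mds-0x100000367e5.log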
#11 Updated by Jérôme Poulin almost 5 years ago
- File ceph-mds-0x100000367e5.log View added
#12 Updated by Zheng Yan almost 5 years ago
there is no clue in the log. If you want to fix the issue, you can:
- ceph daemon mds.xx flush journal
- stop mds
- rados -p cephfs_metadata_pool rmomapkey 1000000088b.01800000 000000200000002C000000BB.00000028.backup.zst_head
- restart mds
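Spelled out as concrete commands, a sketch of that procedure could look like the following (the MDS id 'xx', the systemd unit name, and the metadata pool name are assumptions to adapt to your cluster):

# flush the MDS journal so dentry state is persisted to the metadata pool
ceph daemon mds.xx flush journal
# stop the MDS (unit name is an assumption; stop the daemon however you normally do)
systemctl stop ceph-mds@xx
# remove the corrupted dentry's omap key from the directory fragment object
rados -p cephfs_metadata_pool rmomapkey 1000000088b.01800000 000000200000002C000000BB.00000028.backup.zst_head
# start the MDS again
systemctl start ceph-mds@xx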
#13 Updated by Patrick Donnelly almost 5 years ago
- Target version deleted (v15.0.0)
#14 Updated by Jérôme Poulin almost 5 years ago
I had to repeat the procedure a couple of times before everything was cleaned up, but now it is back to normal.
I included a hexdump of a key for your reference.
┌─│15:37:52│0│$ ~
└──╼ rados --cluster dev -p cephfsmeta listomapkeys 1000000088b.01800000 | grep 000000200000002C000000BD.zst
000000200000002C000000BD.zst_head
┌─│15:38:03│0│jerome.poulin@JeromeCauca:~
└──╼ rados --cluster dev -p cephfsmeta rmomapkey 1000000088b.01800000 000000200000002C000000BD.zst_head
┌─│15:38:20│0│$ ~
└──╼ rados --cluster dev -p cephfsmeta listomapkeys 1000000088b.01800000 | grep 000000200000002C000000C2.zst
000000200000002C000000C2.zst_head
┌─│15:38:49│0│$ ~
└──╼ rados --cluster dev -p cephfsmeta rmomapkey 1000000088b.01800000 000000200000002C000000C2.zst_head
┌─│15:38:55│0│$ ~
└──╼ rados --cluster dev -p cephfsmeta listomapkeys 1000000088b.01800000 | grep 000000200000002C000000BE.zst
000000200000002C000000BE.zst_head
┌─│15:39:30│0│$ ~
└──╼ rados --cluster dev -p cephfsmeta getomapval 1000000088b.01800000 000000200000002C000000BE.zst_head
value (462 bytes) :
00000000 fe ff ff ff ff ff ff ff 49 0f 06 a3 01 00 00 24 |........I......$|
00000010 68 03 00 00 01 00 00 00 00 00 00 16 8d 7c 5c f6 |h............|\.|
00000020 62 bc 0d 00 80 00 00 00 00 00 00 00 00 00 00 01 |b...............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 02 02 18 00 |................|
00000040 00 00 00 00 40 00 01 00 00 00 00 00 40 00 06 00 |....@.......@...|
00000050 00 00 00 00 00 00 00 00 00 00 c4 b1 2c 00 00 00 |............,...|
00000060 00 00 01 00 00 00 ff ff ff ff ff ff ff ff 00 00 |................|
00000070 00 00 00 00 00 00 00 00 00 00 ca 93 a8 5b 00 00 |.............[..|
00000080 00 00 ca 93 a8 5b 00 00 00 00 00 00 00 00 00 00 |.....[..........|
00000090 00 00 03 02 28 00 00 00 00 00 00 00 00 00 00 00 |....(...........|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000c0 03 02 38 00 00 00 00 00 00 00 00 00 00 00 00 00 |..8.............|
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000000f0 00 00 00 00 00 00 16 8d 7c 5c f6 62 bc 0d 03 02 |........|\.b....|
00000100 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |8...............|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000130 00 00 00 00 16 8d 7c 5c f6 62 bc 0d 85 39 02 00 |......|\.b...9..|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000150 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000160 c4 b1 2c 00 00 00 00 00 ff ff ff ff ff ff ff ff |..,.............|
00000170 00 00 00 00 01 01 10 00 00 00 00 00 00 00 00 00 |................|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001a0 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ff ff |................|
000001b0 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001c0 00 00 fe ff ff ff ff ff ff ff 00 00 00 00       |..............|
000001ce
┌─│15:39:41│0│$ ~
└──╼ rados --cluster dev -p cephfsmeta rmomapkey 1000000088b.01800000 000000200000002C000000BE.zst_head
┌─│15:39:49│0│$ ~
└──╼ rados --cluster dev -p cephfsmeta rmomapkey 1000000088b.01800000 000000200000002C000000BA.zst_head
#15 Updated by Patrick Donnelly about 4 years ago
- Related to Bug #42829: tools/cephfs: linkages injected by cephfs-data-scan have first == head added
#16 Updated by Patrick Donnelly about 4 years ago
- Status changed from New to Resolved
- Assignee set to Zheng Yan
- Backport deleted (nautilus,mimic,luminous)
#17 Updated by Patrick Donnelly over 2 years ago
- Duplicated by Bug #41147: mds: crash loop - Server.cc 6835: FAILED ceph_assert(in->first <= straydn->first) added
#18 Updated by Patrick Donnelly about 1 year ago
- Category set to Correctness/Safety
- Status changed from Resolved to Need More Info
- Assignee deleted (Zheng Yan)
- Priority changed from High to Urgent
- Target version set to v18.0.0
#19 Updated by Patrick Donnelly about 1 year ago
- Duplicated by Bug #54643: crash: void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*): assert(in->first <= straydn->first) added
#20 Updated by Patrick Donnelly about 1 year ago
- Duplicated by Bug #53179: Crash when unlink in corrupted cephfs added
#21 Updated by Felix Stolte 12 months ago
- File mds_crash.tar.gz added
Last week our mds crashed only once with the following error:
2022-12-08T13:05:48.919+0100 7f440afec700 -1 /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 2022-12-08T13:05:48.921223+0100
/build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
At 13:05 a snapshot was created, and at 13:18 we observed the _unlink_local error again, leading to the crash loop; we killed the MDS session:
2022-12-08T13:18:57.756+0100 7fec7d8fd700 -1 /build/ceph-16.2.10/src/mds/Server.cc: In function 'void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7fec7d8fd700 time 2022-12-08T13:18:57.757771+0100
/build/ceph-16.2.10/src/mds/Server.cc: 7806: FAILED ceph_assert(in->first <= straydn->first)
User confirmed the file in question was copied to the cephfs shortly before snapshot creation.
While debugging the issue we realized we had an MTU mismatch: all OSDs and clients used an MTU of 9000 while the MDS was using an MTU of 1500. The client in question is using a CephFS kernel mount with kernel 5.4.0-135. I attached the error log of the first crash with scatter_lock, assuming it was the cause of the unlink_local problem.
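For anyone checking for the same mismatch, a minimal sketch (the interface name and peer host are placeholders):

# show the configured MTU of a given interface
ip link show dev eth0
# verify a 9000-byte path end to end: 8972 bytes of ICMP payload + 28 bytes of headers, fragmentation prohibited
ping -M do -s 8972 other-ceph-host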
#22 Updated by Patrick Donnelly 11 months ago
- Related to Bug #58482: mds: catch damage to CDentry's first member before persisting added
#23 Updated by Milind Changire 5 months ago
- Duplicated by Bug #60669: crash: void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*): assert(in->first <= straydn->first) added
#24 Updated by Dhairya Parmar 5 months ago
- Related to Bug #54546: mds: crash due to corrupt inode and omap entry added
#25 Updated by Patrick Donnelly 3 months ago
- Target version deleted (v18.0.0)