Bug #54546
mds: crash due to corrupt inode and omap entry
% Done: 100%
Description
A corrupted on-disk inode causes the MDS to crash with an assert. The backtrace looks something like:
     1: (()+0x4058c0) [0x5588549cd8c0]
     2: (()+0x12890) [0x7fdf788d3890]
     3: (gsignal()+0xc7) [0x7fdf779c6e97]
     4: (abort()+0x141) [0x7fdf779c8801]
     5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7fdf78fbd530]
     6: (()+0x26d5a7) [0x7fdf78fbd5a7]
     7: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x15f8) [0x5588547482e8]
     8: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x961) [0x558854748cd1]
     9: (Server::handle_client_request(MClientRequest*)+0x49b) [0x55885476305b]
     10: (Server::dispatch(Message*)+0x2db) [0x558854766d1b]
     11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5588546da1e4]
     12: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x5588546e7a1b]
     13: (MDSRank::retry_dispatch(Message*)+0x12) [0x5588546e8012]
     14: (MDSInternalContextBase::complete(int)+0x67) [0x55885494df87]
     15: (MDSRank::_advance_queues()+0xf1) [0x5588546e69c1]
     16: (MDSRank::ProgressThread::entry()+0x43) [0x5588546e7043]
     17: (()+0x76db) [0x7fdf788c86db]
     18: (clone()+0x3f) [0x7fdf77aa988f]
This is hard to reproduce and was seen on a couple of occasions with PostgreSQL doing I/O on CephFS. AFAICT, this happened even with the latest pacific/master. The failed assert is in Server::_unlink_local():
    if (straydn) {
      ceph_assert(in->first <= straydn->first);
      in->first = straydn->first;
    }
`in->first` is not a sane value. From one setup the inode was something like (note the nonsensically large `first` value):
    [inode 0x1000014e91a [...1000010a4a0,head]
Surprisingly, `1000010a4a0` is the inode of another (ancestor) directory:
    debug 2022-02-23 11:37:33.107 7f59a55ca700 15 mds.0.cache chose lock states on [inode 0x1000010a4a0 [...d2,head] /path/to/ancestor auth v7874399 snaprealm=0x55b5cc463900 f(v0 m2021-06-25 07:24:25.703543 1=0+1) n(v252768 rc2022-02-23 11:36:29.112471 b226649760 2906=2877+29) (inest lock dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55b5c9c3b100]
This is the same as tracker https://tracker.ceph.com/issues/38452, where the omap dump in note-14 confirms the same corruption:
    00000000  fe ff ff ff ff ff ff ff 49 0f 06 a3 01 00 00 24  |........I......$|
    00000010  68 03 00 00 01 00 00 00 00 00 00 16 8d 7c 5c f6  |h............|\.|
    00000020  62 bc 0d 00 80 00 00 00 00 00 00 00 00 00 00 01  |b...............|
Although in this case the value is 0xfffffffffffffffe.
Updated by Venky Shankar about 2 years ago
- Related to Feature #55414: mds:asok interface to cleanup permanently damaged inodes added
Updated by Venky Shankar almost 2 years ago
Saw this in another cluster. The corruption is seen in the EMetaBlob journal event: the inode+dentry fetched from the journal (fullbit) has a corrupted `dnfirst` field. This narrows the scope of the problem to code paths that journal operations.
Updated by Venky Shankar almost 2 years ago
- Assignee changed from Venky Shankar to Patrick Donnelly
Patrick, assigning this to you since you are making progress on this.
Updated by Patrick Donnelly over 1 year ago
- Related to Feature #56140: cephfs: tooling to identify inode (metadata) corruption added
Updated by Dhairya Parmar 10 months ago
- Related to Bug #38452: mds: assert crash loop while unlinking file added