Bug #54546
mds: crash due to corrupt inode and omap entry
% Done: 100%
Description
A corrupted on-disk inode causes the MDS to crash with an assert. The backtrace looks something like:
     1: (()+0x4058c0) [0x5588549cd8c0]
     2: (()+0x12890) [0x7fdf788d3890]
     3: (gsignal()+0xc7) [0x7fdf779c6e97]
     4: (abort()+0x141) [0x7fdf779c8801]
     5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7fdf78fbd530]
     6: (()+0x26d5a7) [0x7fdf78fbd5a7]
     7: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x15f8) [0x5588547482e8]
     8: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x961) [0x558854748cd1]
     9: (Server::handle_client_request(MClientRequest*)+0x49b) [0x55885476305b]
     10: (Server::dispatch(Message*)+0x2db) [0x558854766d1b]
     11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5588546da1e4]
     12: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x5588546e7a1b]
     13: (MDSRank::retry_dispatch(Message*)+0x12) [0x5588546e8012]
     14: (MDSInternalContextBase::complete(int)+0x67) [0x55885494df87]
     15: (MDSRank::_advance_queues()+0xf1) [0x5588546e69c1]
     16: (MDSRank::ProgressThread::entry()+0x43) [0x5588546e7043]
     17: (()+0x76db) [0x7fdf788c86db]
     18: (clone()+0x3f) [0x7fdf77aa988f]
This is hard to reproduce and was seen on a couple of occasions with PostgreSQL doing I/O on CephFS. AFAICT, this happened even with the latest pacific/master. The failed assert is in Server::_unlink_local():
    if (straydn) {
      ceph_assert(in->first <= straydn->first);
      in->first = straydn->first;
    }
`in->first` is not a sane value. From one setup the inode was something like (note the nonsensically large `first` value):
    [inode 0x1000014e91a [...1000010a4a0,head]
Surprisingly, `1000010a4a0` is the inode of another (ancestor) directory:
    debug 2022-02-23 11:37:33.107 7f59a55ca700 15 mds.0.cache chose lock states on [inode 0x1000010a4a0 [...d2,head] /path/to/ancestor auth v7874399 snaprealm=0x55b5cc463900 f(v0 m2021-06-25 07:24:25.703543 1=0+1) n(v252768 rc2022-02-23 11:36:29.112471 b226649760 2906=2877+29) (inest lock dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55b5c9c3b100]
This is the same as tracker https://tracker.ceph.com/issues/38452, where the omap dump in note-14 confirms the same corruption:
    00000000  fe ff ff ff ff ff ff ff 49 0f 06 a3 01 00 00 24  |........I......$|
    00000010  68 03 00 00 01 00 00 00 00 00 00 16 8d 7c 5c f6  |h............|\.|
    00000020  62 bc 0d 00 80 00 00 00 00 00 00 00 00 00 00 01  |b...............|
Although in this case the value is 0xfffffffffffffffe.
Updated by Venky Shankar about 2 years ago
- Related to Feature #55414: mds:asok interface to cleanup permanently damaged inodes added
Updated by Venky Shankar almost 2 years ago
Saw this in another cluster. The corruption is seen in the EMetaBlob journal event: the inode+dentry fetched from the journal (fullbit) has a corrupted `dnfirst` field. This narrows the scope of the problem to code paths that journal operations.
Updated by Venky Shankar almost 2 years ago
- Assignee changed from Venky Shankar to Patrick Donnelly
Patrick, assigning this to you since you are making progress on this.
Updated by Patrick Donnelly over 1 year ago
- Related to Feature #56140: cephfs: tooling to identify inode (metadata) corruption added
Updated by Dhairya Parmar 10 months ago
- Related to Bug #38452: mds: assert crash loop while unlinking file added