Bug #54384

mds: crash due to seemingly unrecoverable metadata error

Added by Xiubo Li 9 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
% Done: 0%
Tags: backport_processed
Backport: quincy,pacific
Regression: No
Severity: 3 - minor
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS): crash
Pull request ID: 45143

Description

From: https://www.spinics.net/lists/ceph-users/msg71028.html

Reported by Wolfgang Mair

Hi

I have a weird problem with my Ceph cluster.

Basic info:

 - 3-node cluster
 - CephFS runs on three pools:
    - cephfs_meta (replicated, metadata pool)
    - ec_basic (erasure coded data pool)
    - ec_sensitive (erasure coded data pool with higher redundancy)

My MDS keeps crashing with a bad backtrace error:
2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10002000f5d
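For context: a backtrace is the ancestry record CephFS keeps for each inode. It is stored as the "parent" xattr on the inode's first RADOS object (for directories, in the metadata pool), and the MDS uses it to find an inode by number alone. The Python sketch below is a simplified, hypothetical model loosely patterned on Ceph's inode_backtrace_t. The class and function names are illustrative, not Ceph's API. It shows what a backtrace encodes and two ways one can be unusable: it is empty, or it does not end at the root directory. It is not the MDS's actual validation logic. Apart from 0x10002000f5d, the inode numbers and names in the usage lines are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ROOT_INO = 0x1  # inode number of the CephFS root directory


@dataclass
class Backpointer:
    """One link in a backtrace: the entry is named `dname` inside directory `dirino`."""
    dirino: int
    dname: str


@dataclass
class Backtrace:
    """Simplified model of a CephFS inode backtrace (nearest parent first)."""
    ino: int
    ancestors: list[Backpointer] = field(default_factory=list)


def backtrace_to_path(bt: Backtrace) -> str:
    """Rebuild the absolute path of `bt.ino`, rejecting unusable backtraces."""
    if not bt.ancestors:
        raise ValueError(f"empty backtrace for inode {bt.ino:#x}")
    if bt.ancestors[-1].dirino != ROOT_INO:
        raise ValueError(f"backtrace for inode {bt.ino:#x} does not reach the root")
    # ancestors[0] holds the inode's own name; reverse to read root -> inode.
    names = [bp.dname for bp in reversed(bt.ancestors)]
    return "/" + "/".join(names)


bt = Backtrace(
    0x10002000f5d,
    [Backpointer(0x10000000002, "c"), Backpointer(0x10000000001, "b"), Backpointer(ROOT_INO, "a")],
)
print(backtrace_to_path(bt))  # /a/b/c
```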

To the best of my understanding, such metadata errors should be fixable by following the disaster recovery procedure described here: https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/

The weird part, however, is that the error remains unchanged. It does not change even directly after resetting, i.e. before the metadata objects are recreated.

Is there something else that I need to reset?
I have already tried to delete the corrupt inode via rmomapkey, so that rados -p cephfs_meta listomapkeys 10002000f5d.00000000 now returns nothing.
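The inode number in the MDS log (0x10002000f5d) and the object passed to listomapkeys (10002000f5d.00000000) are related by a naming convention. Each directory fragment is stored in the metadata pool as an object named after the directory's inode number in hex, followed by the fragment id. The following is a minimal Python sketch that assumes this convention and an unfragmented directory (fragment 0). ino_from_log and dirfrag_object_name are hypothetical helper names.

```python
def ino_from_log(token: str) -> int:
    """Parse an inode number as printed in MDS log lines, e.g. '0x10002000f5d'."""
    return int(token, 16)  # with base 16, int() accepts an optional 0x prefix


def dirfrag_object_name(ino: int, frag: int = 0) -> str:
    """RADOS object name for one fragment of directory `ino` (assumed convention).

    Inode number in hex without the 0x prefix, a dot, then the fragment id
    as 8 zero-padded hex digits. An unfragmented directory has one fragment, id 0.
    """
    return f"{ino:x}.{frag:08x}"


name = dirfrag_object_name(ino_from_log("0x10002000f5d"))
print(name)  # 10002000f5d.00000000
print(f"rados -p cephfs_meta listomapkeys {name}")
```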

Any suggestions on how to proceed? Any hints are appreciated!

MDS Log:

--------------------------
Feb 21 16:11:07 herta systemd[1]: Started Ceph metadata server daemon.
Feb 21 16:11:07 herta ceph-mds[128287]: starting mds.herta at
Feb 21 16:11:09 herta ceph-mds[128287]: 2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10002000f5d
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: In function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7fd2cd290700 time 2022-02-21T16:11:10.629363+0100
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: 785: FAILED ceph_assert(is_dir())
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7fd2d876e046]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: clone()
Feb 21 16:11:10 herta ceph-mds[128287]: *** Caught signal (Aborted) **
Feb 21 16:11:10 herta ceph-mds[128287]:  in thread 7fd2cd290700 thread_name:MR_Finisher
Feb 21 16:11:10 herta ceph-mds[128287]: 2022-02-21T16:11:10.625+0100 7fd2cd290700 -1 ./src/mds/CInode.cc: In function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7fd2cd290700 time 2022-02-21T16:11:10.629363+0100
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: 785: FAILED ceph_assert(is_dir())
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7fd2d876e046]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: clone()
Feb 21 16:11:10 herta ceph-mds[128287]:  ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]:  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7fd2d84d5140]
Feb 21 16:11:10 herta ceph-mds[128287]:  2: gsignal()
Feb 21 16:11:10 herta ceph-mds[128287]:  3: abort()
Feb 21 16:11:10 herta ceph-mds[128287]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16e) [0x7fd2d876e090]
Feb 21 16:11:10 herta ceph-mds[128287]:  5: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]:  6: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]:  7: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]:  8: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  9: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]:  10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]:  11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]:  12: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]:  13: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]:  14: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]:  15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]:  16: clone()
Feb 21 16:11:10 herta ceph-mds[128287]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
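Reading the backtrace from the highest-numbered frame down to frame 1 shows the sequence that leads to the abort. First, a backtrace fetch completes (MDCache::_open_ino_backtrace_fetched). MDCache::open_ino_finish then runs the waiting contexts. Finally, a context from OpenFileTable::_prefetch_dirfrags calls CInode::get_or_open_dirfrag on an inode that is not a directory, so ceph_assert(is_dir()) fails and the MDS aborts. The toy Python model below illustrates that failure mode, along with one defensive alternative that skips and reports the entry instead of aborting. Inode and prefetch_dirfrags are hypothetical names, not Ceph's C++ API. The sketch does not reproduce the change proposed in pull request 45143.

```python
from __future__ import annotations


class Inode:
    """Hypothetical stand-in for an in-memory MDS inode (not Ceph's CInode)."""

    def __init__(self, ino: int, is_dir: bool) -> None:
        self.ino = ino
        self.is_dir = is_dir
        self.dirfrags: dict[int, str] = {}

    def get_or_open_dirfrag(self, frag: int = 0) -> str:
        # Models the invariant behind "FAILED ceph_assert(is_dir())":
        # only directories have dirfrags. Raised explicitly so it still
        # fires under `python -O`, which strips `assert` statements.
        if not self.is_dir:
            raise AssertionError(f"inode {self.ino:#x} is not a directory")
        return self.dirfrags.setdefault(frag, f"{self.ino:x}.{frag:08x}")


def prefetch_dirfrags(candidates: list[Inode], skip_non_dirs: bool) -> list[str]:
    """Open the first dirfrag of each inode that a (hypothetical) open-file
    table recorded as a directory."""
    opened = []
    for inode in candidates:
        if skip_non_dirs and not inode.is_dir:
            # Defensive variant: report the stale entry and keep going.
            print(f"skipping inode {inode.ino:#x}: not a directory")
            continue
        opened.append(inode.get_or_open_dirfrag())
    return opened


table = [Inode(0x10000000000, True), Inode(0x10002000f5d, False)]
try:
    prefetch_dirfrags(table, skip_non_dirs=False)
except AssertionError as err:
    print("would abort:", err)  # would abort: inode 0x10002000f5d is not a directory
print(prefetch_dirfrags(table, skip_non_dirs=True))  # ['10000000000.00000000']
```

Skipping keeps the daemon up, but the inconsistent entry still needs to be repaired separately. Which trade-off suits an MDS is a design decision this sketch does not settle.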

Related issues

Copied to CephFS - Backport #56461: quincy: mds: crash due to seemingly unrecoverable metadata error Resolved
Copied to CephFS - Backport #56462: pacific: mds: crash due to seemingly unrecoverable metadata error Resolved

History

#1 Updated by Xiubo Li 9 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 45143

#2 Updated by Venky Shankar 5 months ago

  • Status changed from Fix Under Review to Pending Backport

#3 Updated by Backport Bot 5 months ago

  • Copied to Backport #56461: quincy: mds: crash due to seemingly unrecoverable metadata error added

#4 Updated by Backport Bot 5 months ago

  • Copied to Backport #56462: pacific: mds: crash due to seemingly unrecoverable metadata error added

#5 Updated by Backport Bot 4 months ago

  • Tags set to backport_processed

#6 Updated by Xiubo Li 3 months ago

  • Status changed from Pending Backport to Resolved
