Bug #54384
mds: crash due to seemingly unrecoverable metadata error
% Done: 0%
Tags: backport_processed
Backport: quincy,pacific
Regression: No
Severity: 3 - minor
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS): crash
Pull request ID: 45143
Description
From: https://www.spinics.net/lists/ceph-users/msg71028.html
Reported by Wolfgang Mair
Hi,

I have a weird problem with my Ceph cluster. Basic info:
- 3-node cluster
- CephFS runs on three pools:
  - cephfs_meta (replicated)
  - ec_basic (erasure coded)
  - ec_sensitive (erasure coded with higher redundancy)

My MDS keeps crashing with a bad backtrace error:

2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10002000f5d

So far so good. To the best of my understanding, these metadata errors should be fixed by following the disaster recovery procedure described here:
https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/

However, the weird part is that the error remains unchanged. Even directly after resetting, i.e. before recreating the metadata objects, the error is the same. Is there something else that I need to reset?

I have already tried to delete the corrupt inode via rmomapkey, so that

rados -p cephfs_meta listomapkeys 10002000f5d.00000000

now returns an empty list.

Any suggestions on how to proceed? Any hints are appreciated!

MDS log:
Feb 21 16:11:07 herta systemd[1]: Started Ceph metadata server daemon.
Feb 21 16:11:07 herta ceph-mds[128287]: starting mds.herta at
Feb 21 16:11:09 herta ceph-mds[128287]: 2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10002000f5d
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: In function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7fd2cd290700 time 2022-02-21T16:11:10.629363+0100
Feb 21 16:11:10 herta ceph-mds[128287]: ./src/mds/CInode.cc: 785: FAILED ceph_assert(is_dir())
Feb 21 16:11:10 herta ceph-mds[128287]: ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x7fd2d876e046]
Feb 21 16:11:10 herta ceph-mds[128287]: 2: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]: 3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]: 4: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]: 5: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]: 6: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]: 7: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]: 8: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]: 9: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]: 10: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]: 11: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]: 12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]: 13: clone()
Feb 21 16:11:10 herta ceph-mds[128287]: *** Caught signal (Aborted) **
Feb 21 16:11:10 herta ceph-mds[128287]: in thread 7fd2cd290700 thread_name:MR_Finisher
Feb 21 16:11:10 herta ceph-mds[128287]: ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
Feb 21 16:11:10 herta ceph-mds[128287]: 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7fd2d84d5140]
Feb 21 16:11:10 herta ceph-mds[128287]: 2: gsignal()
Feb 21 16:11:10 herta ceph-mds[128287]: 3: abort()
Feb 21 16:11:10 herta ceph-mds[128287]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16e) [0x7fd2d876e090]
Feb 21 16:11:10 herta ceph-mds[128287]: 5: /usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7fd2d876e1d1]
Feb 21 16:11:10 herta ceph-mds[128287]: 6: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x557ec94be365]
Feb 21 16:11:10 herta ceph-mds[128287]: 7: (OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x557ec956645d]
Feb 21 16:11:10 herta ceph-mds[128287]: 8: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]: 9: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x98) [0x557ec920dd58]
Feb 21 16:11:10 herta ceph-mds[128287]: 10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x557ec935bfc8]
Feb 21 16:11:10 herta ceph-mds[128287]: 11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v15_2_0::list&, int)+0x277) [0x557ec9363717]
Feb 21 16:11:10 herta ceph-mds[128287]: 12: (MDSContext::complete(int)+0x50) [0x557ec9537980]
Feb 21 16:11:10 herta ceph-mds[128287]: 13: (MDSIOContextBase::complete(int)+0x524) [0x557ec95380f4]
Feb 21 16:11:10 herta ceph-mds[128287]: 14: (Finisher::finisher_thread_entry()+0x18d) [0x7fd2d880bc0d]
Feb 21 16:11:10 herta ceph-mds[128287]: 15: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7fd2d84c9ea7]
Feb 21 16:11:10 herta ceph-mds[128287]: 16: clone()
Feb 21 16:11:10 herta ceph-mds[128287]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
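The object queried with listomapkeys in the report, 10002000f5d.00000000, is derived from the inode number in the bad backtrace error (0x10002000f5d). The sketch below shows that mapping, under stated assumptions. It assumes the CephFS convention that a directory fragment object in the metadata pool is named after the inode number in hex, followed by a dot and the fragment id as eight hex digits. An unfragmented directory has the single fragment 00000000. The helper names are hypothetical and are not part of any Ceph API.

```python
def parse_ino(log_value: str) -> int:
    """Parse an inode number as printed in the MDS log, e.g. '0x10002000f5d'.

    int(..., 16) accepts the optional '0x' prefix."""
    return int(log_value, 16)


def dirfrag_object_name(ino: int, frag: int = 0) -> str:
    """Hypothetical helper: RADOS object name of a CephFS directory fragment.

    Assumes the "<ino in hex>.<frag as 8 hex digits>" naming convention;
    frag=0 is the root (unfragmented) fragment."""
    return f"{ino:x}.{frag:08x}"


if __name__ == "__main__":
    ino = parse_ino("0x10002000f5d")
    name = dirfrag_object_name(ino)
    print(name)  # 10002000f5d.00000000
    # The rados invocation used in the report, rebuilt from the inode number.
    print(" ".join(["rados", "-p", "cephfs_meta", "listomapkeys", name]))
```

This only shows which metadata-pool object corresponds to the inode named in the error. It does not say anything about why the MDS still treats that inode as a directory during dirfrag prefetch.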
History
#1 Updated by Xiubo Li about 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 45143
#2 Updated by Venky Shankar over 1 year ago
- Status changed from Fix Under Review to Pending Backport
#3 Updated by Backport Bot over 1 year ago
- Copied to Backport #56461: quincy: mds: crash due to seemingly unrecoverable metadata error added
#4 Updated by Backport Bot over 1 year ago
- Copied to Backport #56462: pacific: mds: crash due to seemingly unrecoverable metadata error added
#5 Updated by Backport Bot over 1 year ago
- Tags set to backport_processed
#6 Updated by Xiubo Li over 1 year ago
- Status changed from Pending Backport to Resolved