Bug #47012

mds: MDCache.cc: 6418: FAILED ceph_assert(r == 0 || r == -2)

Added by Hughen X 2 months ago. Updated about 2 months ago.

Status: Need More Info
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature:

Description

Ceph version: v14.2.10

My mds.0 service (standby; active MDS count: 4) crashes repeatedly in a loop. Each crash produces the following stack trace:

How can I get mds.0 running again?

debug    -11> 2020-08-18 14:40:04.013 7f9293628700 10 mds.0.cache find_stale_fragment_freeze
debug    -10> 2020-08-18 14:40:04.013 7f9293628700 10 mds.0.snap check_osd_map need_to_purge={}
debug     -9> 2020-08-18 14:40:04.013 7f928ee1f700 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
debug     -8> 2020-08-18 14:40:04.013 7f928ee1f700 10 mds.0.cache.dir(0x100) _fetched header 274 bytes 10 keys for [dir 0x100 ~mds0/ [2,head] auth pv=99475292 v=99475241 cv=99475241/0 dir_auth=0 ap=23+30 state=1610612832|committing|fetching f(v0 10=0+10) n(v292209 rc2106-02-07 06:28:11.000000 b2884710 259=227+32) hs=10+0,ss=0+0 dirty=10 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=1 authpin=1 0x55734e132500]
debug     -7> 2020-08-18 14:40:04.013 7f928ee1f700 10 mds.0.cache.dir(0x100) _fetched version 99475190
debug     -6> 2020-08-18 14:40:04.013 7f928ee1f700 10  mds.0.cache.snaprealm(0x100 seq 1 0x55734b71ef00) have_past_parents_open [1,head]
debug     -5> 2020-08-18 14:40:04.013 7f928ee1f700 10  mds.0.cache.snaprealm(0x100 seq 1 0x55734b71ef00) have_past_parents_open [1,head]
debug     -4> 2020-08-18 14:40:04.013 7f928ee1f700 10 mds.0.cache.dir(0x100) auth_unpin by 0x55734e132500 on [dir 0x100 ~mds0/ [2,head] auth pv=99475292 v=99475241 cv=99475241/0 dir_auth=0 ap=22+30 state=1610612769|complete|committing f(v0 10=0+10) n(v292209 rc2106-02-07 06:28:11.000000 b2884710 259=227+32) hs=10+0,ss=0+0 dirty=10 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=1 authpin=1 0x55734e132500] count now 22
debug     -3> 2020-08-18 14:40:04.013 7f928ee1f700 10 MDSIOContextBase::complete: 23C_IO_MDC_TruncateFinish
debug     -2> 2020-08-18 14:40:04.013 7f928ee1f700 10 MDSContext::complete: 23C_IO_MDC_TruncateFinish
debug     -1> 2020-08-18 14:40:04.017 7f928ee1f700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10/rpm/el7/BUILD/ceph-14.2.10/src/mds/MDCache.cc: In function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 7f928ee1f700 time 2020-08-18 14:40:04.017373
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10/rpm/el7/BUILD/ceph-14.2.10/src/mds/MDCache.cc: 6418: FAILED ceph_assert(r == 0 || r == -2)

 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f929eb412d5]
 2: (()+0x25449d) [0x7f929eb4149d]
 3: (()+0x2b308d) [0x557347e5908d]
 4: (MDSContext::complete(int)+0x74) [0x557347f74944]
 5: (MDSIOContextBase::complete(int)+0x16f) [0x557347f74b9f]
 6: (Finisher::finisher_thread_entry()+0x16f) [0x7f929ebcce5f]
 7: (()+0x7ea5) [0x7f929c9ffea5]
 8: (clone()+0x6d) [0x7f929b6ad8dd]

debug      0> 2020-08-18 14:40:04.021 7f928ee1f700 -1 *** Caught signal (Aborted) **
 in thread 7f928ee1f700 thread_name:fn_anonymous

 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (()+0xf630) [0x7f929ca07630]
 2: (gsignal()+0x37) [0x7f929b5e5387]
 3: (abort()+0x148) [0x7f929b5e6a78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f929eb41324]
 5: (()+0x25449d) [0x7f929eb4149d]
 6: (()+0x2b308d) [0x557347e5908d]
 7: (MDSContext::complete(int)+0x74) [0x557347f74944]
 8: (MDSIOContextBase::complete(int)+0x16f) [0x557347f74b9f]
 9: (Finisher::finisher_thread_entry()+0x16f) [0x7f929ebcce5f]
 10: (()+0x7ea5) [0x7f929c9ffea5]
 11: (clone()+0x6d) [0x7f929b6ad8dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Attachment: mds.0-debug_ms-1.log - the mds.0 log with debug_ms = 1 (589 KB), Hughen X, 08/18/2020 04:51 PM

History

#1 Updated by Zheng Yan 2 months ago

Please try to reproduce it again with debug_ms = 1.

#2 Updated by Hughen X 2 months ago

I set debug_ms = 1 on mds.0. The log is attached.

#3 Updated by Patrick Donnelly 2 months ago

  • Subject changed from /MDCache.cc: 6418: FAILED ceph_assert(r == 0 || r == -2) to mds: MDCache.cc: 6418: FAILED ceph_assert(r == 0 || r == -2)
  • Status changed from New to Need More Info
  • Tags deleted (mds crash)

#4 Updated by Patrick Donnelly about 2 months ago

  • Assignee set to Zheng Yan
