Bug #47012
mds: MDCache.cc: 6418: FAILED ceph_assert(r == 0 || r == -2)
Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
My mds.0 daemon (standby; active MDS count: 4) crashes repeatedly, and the stack trace is the same each time (shown below).
Ceph version: v14.2.10
How can I recover my mds.0?
debug -11> 2020-08-18 14:40:04.013 7f9293628700 10 mds.0.cache find_stale_fragment_freeze
debug -10> 2020-08-18 14:40:04.013 7f9293628700 10 mds.0.snap check_osd_map need_to_purge={}
debug -9> 2020-08-18 14:40:04.013 7f928ee1f700 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
debug -8> 2020-08-18 14:40:04.013 7f928ee1f700 10 mds.0.cache.dir(0x100) _fetched header 274 bytes 10 keys for [dir 0x100 ~mds0/ [2,head] auth pv=99475292 v=99475241 cv=99475241/0 dir_auth=0 ap=23+30 state=1610612832|committing|fetching f(v0 10=0+10) n(v292209 rc2106-02-07 06:28:11.000000 b2884710 259=227+32) hs=10+0,ss=0+0 dirty=10 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=1 authpin=1 0x55734e132500]
debug -7> 2020-08-18 14:40:04.013 7f928ee1f700 10 mds.0.cache.dir(0x100) _fetched version 99475190
debug -6> 2020-08-18 14:40:04.013 7f928ee1f700 10 mds.0.cache.snaprealm(0x100 seq 1 0x55734b71ef00) have_past_parents_open [1,head]
debug -5> 2020-08-18 14:40:04.013 7f928ee1f700 10 mds.0.cache.snaprealm(0x100 seq 1 0x55734b71ef00) have_past_parents_open [1,head]
debug -4> 2020-08-18 14:40:04.013 7f928ee1f700 10 mds.0.cache.dir(0x100) auth_unpin by 0x55734e132500 on [dir 0x100 ~mds0/ [2,head] auth pv=99475292 v=99475241 cv=99475241/0 dir_auth=0 ap=22+30 state=1610612769|complete|committing f(v0 10=0+10) n(v292209 rc2106-02-07 06:28:11.000000 b2884710 259=227+32) hs=10+0,ss=0+0 dirty=10 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=1 authpin=1 0x55734e132500] count now 22
debug -3> 2020-08-18 14:40:04.013 7f928ee1f700 10 MDSIOContextBase::complete: 23C_IO_MDC_TruncateFinish
debug -2> 2020-08-18 14:40:04.013 7f928ee1f700 10 MDSContext::complete: 23C_IO_MDC_TruncateFinish
debug -1> 2020-08-18 14:40:04.017 7f928ee1f700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10/rpm/el7/BUILD/ceph-14.2.10/src/mds/MDCache.cc: In function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 7f928ee1f700 time 2020-08-18 14:40:04.017373
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10/rpm/el7/BUILD/ceph-14.2.10/src/mds/MDCache.cc: 6418: FAILED ceph_assert(r == 0 || r == -2)
 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f929eb412d5]
 2: (()+0x25449d) [0x7f929eb4149d]
 3: (()+0x2b308d) [0x557347e5908d]
 4: (MDSContext::complete(int)+0x74) [0x557347f74944]
 5: (MDSIOContextBase::complete(int)+0x16f) [0x557347f74b9f]
 6: (Finisher::finisher_thread_entry()+0x16f) [0x7f929ebcce5f]
 7: (()+0x7ea5) [0x7f929c9ffea5]
 8: (clone()+0x6d) [0x7f929b6ad8dd]
debug 0> 2020-08-18 14:40:04.021 7f928ee1f700 -1 *** Caught signal (Aborted) **
 in thread 7f928ee1f700 thread_name:fn_anonymous
 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (()+0xf630) [0x7f929ca07630]
 2: (gsignal()+0x37) [0x7f929b5e5387]
 3: (abort()+0x148) [0x7f929b5e6a78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f929eb41324]
 5: (()+0x25449d) [0x7f929eb4149d]
 6: (()+0x2b308d) [0x557347e5908d]
 7: (MDSContext::complete(int)+0x74) [0x557347f74944]
 8: (MDSIOContextBase::complete(int)+0x16f) [0x557347f74b9f]
 9: (Finisher::finisher_thread_entry()+0x16f) [0x7f929ebcce5f]
 10: (()+0x7ea5) [0x7f929c9ffea5]
 11: (clone()+0x6d) [0x7f929b6ad8dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
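Some context for reading the failed assertion. Ceph completion contexts receive 0 on success or a negative errno on failure. On Linux, ENOENT is 2, so `r == -2` means -ENOENT ("No such file or directory"). The check in C_IO_MDC_TruncateFinish::finish therefore tolerates only two outcomes from the I/O it completes (a truncate, judging by the context's name): success, or a missing object. Any other error code, such as -EIO or -ETIMEDOUT, aborts the MDS, which matches the trace above.

The sketch below models just that condition. It is not Ceph code, and the helper names `truncate_result_tolerated` and `describe_result` are hypothetical.

```python
import errno
import os


# Hypothetical helper (not part of Ceph): mirrors the condition in
# ceph_assert(r == 0 || r == -2) from C_IO_MDC_TruncateFinish::finish().
# Ceph passes 0 or a negative errno to completions; on Linux ENOENT == 2.
def truncate_result_tolerated(r: int) -> bool:
    return r == 0 or r == -errno.ENOENT


# Hypothetical helper: render an errno-style return code (0 or -errno) as text.
def describe_result(r: int) -> str:
    return "success" if r >= 0 else os.strerror(-r)


if __name__ == "__main__":
    # Sample codes: only 0 and -ENOENT pass; the others would abort the MDS.
    for r in (0, -errno.ENOENT, -errno.EIO, -errno.ETIMEDOUT):
        verdict = "accepted" if truncate_result_tolerated(r) else "would trip the assert"
        print(f"{r} ({describe_result(r)}): {verdict}")
```

Running it prints one line per sample code. Only 0 and -2 are reported as accepted. To identify the error the crashing MDS actually received, you would need to find the failing OSD reply in a more verbose log.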
Files
Updated by Zheng Yan over 3 years ago
Please try to reproduce it again with debug_ms = 1.
Updated by Hughen X over 3 years ago
- File mds.0-debug_ms-1.log added
The mds.0 log captured with debug_ms = 1 is attached.
Updated by Patrick Donnelly over 3 years ago
- Subject changed from /MDCache.cc: 6418: FAILED ceph_assert(r == 0 || r == -2) to mds: MDCache.cc: 6418: FAILED ceph_assert(r == 0 || r == -2)
- Status changed from New to Need More Info
- Tags deleted (mds crash)