Bug #37721

closed

mds crashes frequently when using snapshots in CephFS on mimic

Added by Soenke Schippmann over 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: High
% Done: 0%
Source: Community (user)
Backport: mimic
Regression: No
Severity: 2 - major
Component(FS): MDS
Labels (FS): crash, snapshots

Description

After we started using snapshots in CephFS, the active MDS daemon has been crashing frequently (every couple of hours). The file system was created with Mimic (13.2.1), so snapshots were already enabled by default.

Here is an example backtrace:

(gdb) bt
#0  raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00005643386d53ae in reraise_fatal (signum=6) at ./src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=6) at ./src/global/signal_handler.cc:138
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#5  0x00007f10c61c5801 in __GI_abort () at abort.c:79
#6  0x00007f10c77b4080 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib/ceph/libceph-common.so.0
#7  0x00007f10c77b40f7 in ceph::__ceph_assert_fail(ceph::assert_data const&) () from /usr/lib/ceph/libceph-common.so.0
#8  0x000056433855c4a1 in Locker::snapflush_nudge (this=0x56433afa0640, in=0x5643d0467800) at ./src/mds/Locker.cc:2577
#9  0x000056433855c5f6 in Locker::caps_tick (this=0x56433afa0640) at ./src/mds/Locker.cc:3641
#10 0x0000564338560932 in Locker::tick (this=<optimized out>) at ./src/mds/Locker.cc:127
#11 0x00005643383fe754 in MDSRankDispatcher::tick (this=0x56433b24f000) at ./src/mds/MDSRank.cc:325
#12 0x00005643383ee96c in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>)
    at ./obj-x86_64-linux-gnu/boost/include/boost/function/function_template.hpp:768
#13 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at ./src/include/Context.h:522
#14 0x00005643383ec719 in Context::complete (this=0x56433b1432c0, r=<optimized out>) at ./src/include/Context.h:77
#15 0x00007f10c77b08ab in SafeTimer::timer_thread() () from /usr/lib/ceph/libceph-common.so.0
#16 0x00007f10c77b1e6d in SafeTimerThread::entry() () from /usr/lib/ceph/libceph-common.so.0
#17 0x00007f10c70c06db in start_thread (arg=0x7f10bc3b7700) at pthread_create.c:463
#18 0x00007f10c62a688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

A core dump, the MDS log, and ceph.conf have been uploaded with ceph-post-file:

  • ceph-mds.log: 7bc127e2-27e9-4a24-b126-b7bd930a82bd
  • core.safe_timer.2474: 7af3b92f-0970-4885-ad86-f5c860616554
  • ceph.conf: 089f90bf-83e5-4915-bc47-1cafeba0c3ff

Thanks for investigating this issue.


Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #37818: mimic: mds crashes frequently when using snapshots in CephFS on mimic (Resolved, Prashant D)