Bug #11985: MDS asserts in objecter when transitioning from replay to DNE - CephFS - Ceph

Bug #11985

closed

MDS asserts in objecter when transitioning from replay to DNE

Added by John Spray almost 9 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee: John Spray
Source: other
Severity: 3 - minor
Component(FS): MDS

Description

Seen once:
http://pulpito.ceph.com/teuthology-2015-06-08_23:04:01-fs-master---basic-multi/926330/

ceph-mds.a-s.log
2015-06-12 03:33:12.905838 c603700 10 mds.beacon.a-s handle_mds_beacon down:dne seq 834 rtt 0.025118
2015-06-12 03:33:12.907941 10f0d700  1 mds.0.2 suicide.  wanted down:dne, now up:replay
2015-06-12 03:33:12.910650 10f0d700  5 mds.0.log shutdown
...
2015-06-12 03:33:21.177329 12c0f700 -1 osdc/Objecter.cc: In function 'ceph_tid_t Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&, int*)' thread 12c0f700 time 2015-06-12 03:33:21.097438
osdc/Objecter.cc: 2003: FAILED assert(initialized.read())

 ceph version 9.0.0-1340-g228ee47 (228ee47cfdef5371e29fa823ad2660e83535c7e1)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x96e37b]
 2: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&, int*)+0x276) [0x829946]
 3: (Objecter::op_submit(Objecter::Op*, int*)+0x9e) [0x829a5e]
 4: (Objecter::sg_read_trunc(std::vector<ObjectExtent, std::allocator<ObjectExtent> >&, snapid_t, ceph::buffer::list*, int, unsigned long, unsigned int, Context*)+0x8db) [0x811f6b]
 5: (Journaler::_issue_read(unsigned long)+0x24a) [0x80ea4a]
 6: (Journaler::_prefetch()+0x255) [0x80f3f5]
 7: (Journaler::try_read_entry(ceph::buffer::list&)+0x100) [0x80f840]
 8: (MDLog::_replay_thread()+0x1fa) [0x7f1e6a]
 9: (MDLog::ReplayThread::entry()+0xd) [0x5c73ed]
 10: (()+0x8182) [0x5089182]
 11: (clone()+0x6d) [0x695747d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Presumably this is a shutdown-ordering problem: the objecter is de-initialized while the MDLog replay thread is still running and issuing journal reads through it.

Updated by John Spray almost 9 years ago

Status changed from New to In Progress
Assignee set to John Spray

Updated by John Spray over 8 years ago

Status changed from In Progress to Resolved

commit 39cf07118583166287ef0faa1811ae8efc9bef85
Author: John Spray <john.spray@redhat.com>
Date:   Thu Jun 18 11:07:46 2015 +0100

    mds: fix MDLog shutdown process

    We must join threads before completing ::shutdown,
    because otherwise these threads might try to use
    torn-down resources like the objecter.

    The replay/recovery threads may be blocking on
    journaler calls like wait_for_readable, so we
    must signal them using Journaler::shutdown.  In
    order for that to be safe, we must also protect
    the assignment of ::journaler from the threads
    using the mds_lock.

    Fixes: #11985
    Signed-off-by: John Spray <john.spray@redhat.com>
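
The ordering described in the commit message can be illustrated with a minimal, standalone C++ sketch. This is not the Ceph code: `FakeObjecter`, `FakeJournaler`, `FakeMDLog` and `run_shutdown_demo` are hypothetical stand-ins, and the real `Objecter` asserts where this sketch only counts a late submit. It assumes a replay thread that issues one read and then blocks in the journaler until it is told to stop.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the objecter: only usable while initialized.
struct FakeObjecter {
  std::atomic<bool> initialized{true};
  std::atomic<int> submitted{0};
  std::atomic<int> late_submits{0};  // the real code would assert instead
  void op_submit() {
    if (!initialized.load()) { ++late_submits; return; }
    ++submitted;
  }
  void shutdown() { initialized = false; }
};

// Hypothetical stand-in for the journaler: readers block until shutdown.
class FakeJournaler {
  std::mutex m;
  std::condition_variable cv;
  bool stopping = false;
  FakeObjecter& objecter;
public:
  explicit FakeJournaler(FakeObjecter& o) : objecter(o) {}
  // Issues one read, then blocks; this sketch never produces journal data,
  // so the call only returns (false) once shutdown() has been requested.
  bool wait_for_readable() {
    objecter.op_submit();
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return stopping; });
    return false;
  }
  void shutdown() {
    { std::lock_guard<std::mutex> l(m); stopping = true; }
    cv.notify_all();  // wake any thread blocked in wait_for_readable()
  }
};

class FakeMDLog {
  std::mutex mds_lock;                       // stands in for the MDS lock
  std::unique_ptr<FakeJournaler> journaler;  // assignment guarded by mds_lock
  std::thread replay_thread;
  FakeObjecter& objecter;
  void replay() {
    FakeJournaler* j = nullptr;
    { std::lock_guard<std::mutex> l(mds_lock); j = journaler.get(); }
    while (j && j->wait_for_readable()) {}
  }
public:
  explicit FakeMDLog(FakeObjecter& o) : objecter(o) {}
  void start() {
    { std::lock_guard<std::mutex> l(mds_lock);
      journaler.reset(new FakeJournaler(objecter)); }
    replay_thread = std::thread(&FakeMDLog::replay, this);
  }
  // Signal the blocked thread first, then join it before returning, so the
  // caller can safely tear down the objecter afterwards.
  void shutdown() {
    { std::lock_guard<std::mutex> l(mds_lock);
      if (journaler) journaler->shutdown(); }
    if (replay_thread.joinable()) replay_thread.join();
  }
};

// Returns the number of submits that arrived after objecter teardown.
int run_shutdown_demo() {
  FakeObjecter objecter;
  FakeMDLog mdlog(objecter);
  mdlog.start();
  mdlog.shutdown();     // replay thread is joined here
  objecter.shutdown();  // only now is it safe to de-initialize
  return objecter.late_submits.load();
}
```

Calling `run_shutdown_demo()` from a `main` should report zero late submits, because the join completes before the objecter is de-initialized. Swapping the last two calls so that the objecter is shut down first reintroduces the window in which the replay thread can submit to a torn-down objecter.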

Updated by Greg Farnum almost 8 years ago

Component(FS) MDS added
