Bug #11985
MDS asserts in objecter when transitioning from replay to DNE (closed)
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Seen once:
http://pulpito.ceph.com/teuthology-2015-06-08_23:04:01-fs-master---basic-multi/926330/
ceph-mds.a-s.log
2015-06-12 03:33:12.905838 c603700 10 mds.beacon.a-s handle_mds_beacon down:dne seq 834 rtt 0.025118
2015-06-12 03:33:12.907941 10f0d700 1 mds.0.2 suicide. wanted down:dne, now up:replay
2015-06-12 03:33:12.910650 10f0d700 5 mds.0.log shutdown
...
2015-06-12 03:33:21.177329 12c0f700 -1 osdc/Objecter.cc: In function 'ceph_tid_t Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&, int*)' thread 12c0f700 time 2015-06-12 03:33:21.097438
osdc/Objecter.cc: 2003: FAILED assert(initialized.read())
ceph version 9.0.0-1340-g228ee47 (228ee47cfdef5371e29fa823ad2660e83535c7e1)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x96e37b]
2: (Objecter::_op_submit_with_budget(Objecter::Op*, RWLock::Context&, int*)+0x276) [0x829946]
3: (Objecter::op_submit(Objecter::Op*, int*)+0x9e) [0x829a5e]
4: (Objecter::sg_read_trunc(std::vector<ObjectExtent, std::allocator<ObjectExtent> >&, snapid_t, ceph::buffer::list*, int, unsigned long, unsigned int, Context*)+0x8db) [0x811f6b]
5: (Journaler::_issue_read(unsigned long)+0x24a) [0x80ea4a]
6: (Journaler::_prefetch()+0x255) [0x80f3f5]
7: (Journaler::try_read_entry(ceph::buffer::list&)+0x100) [0x80f840]
8: (MDLog::_replay_thread()+0x1fa) [0x7f1e6a]
9: (MDLog::ReplayThread::entry()+0xd) [0x5c73ed]
10: (()+0x8182) [0x5089182]
11: (clone()+0x6d) [0x695747d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Presumably this is a shutdown-ordering problem: the objecter is being de-initialized while the MDLog replay thread is still issuing journal reads through it.
Updated by John Spray almost 9 years ago
- Status changed from New to In Progress
- Assignee set to John Spray
Updated by John Spray over 8 years ago
- Status changed from In Progress to Resolved
commit 39cf07118583166287ef0faa1811ae8efc9bef85
Author: John Spray <john.spray@redhat.com>
Date:   Thu Jun 18 11:07:46 2015 +0100

    mds: fix MDLog shutdown process

    We must join threads before completing ::shutdown, because otherwise these threads might try to use torn-down resources like the objecter.

    The replay/recovery threads may be blocking on journaler calls like wait_for_readable, so we must signal them using Journaler::shutdown.

    In order for that to be safe, we must also protect the assignment of ::journaler from the threads using the mds_lock.

    Fixes: #11985
    Signed-off-by: John Spray <john.spray@redhat.com>
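For readers unfamiliar with the pattern the commit message describes, here is a minimal standalone sketch of "signal the worker, join it, and only then tear down what it uses". It is not the Ceph code: ToyObjecter, ToyLog, and shutdown_in_safe_order are hypothetical names invented for illustration, a plain std::mutex loosely stands in for mds_lock, and the protection of the ::journaler assignment is not modelled at all.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the objecter. Submitting after shutdown() is the
// situation that trips assert(initialized.read()) in the report above; here
// it is reported as a false return value instead of an abort.
class ToyObjecter {
public:
  void init() { initialized_ = true; }
  void shutdown() { initialized_ = false; }
  bool op_submit() { return initialized_.load(); }
private:
  std::atomic<bool> initialized_{false};
};

// Hypothetical stand-in for a log with a replay thread that keeps issuing
// reads through the objecter until it is told to stop.
class ToyLog {
public:
  explicit ToyLog(ToyObjecter& o) : objecter_(o) {}
  ~ToyLog() { shutdown(); }

  void start_replay() { replay_thread_ = std::thread([this] { replay(); }); }

  // Signal the thread, then join it. When this returns, the replay thread can
  // no longer touch the objecter. Safe to call more than once.
  void shutdown() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
    }
    cond_.notify_all();  // wake the thread if it is blocked waiting
    if (replay_thread_.joinable()) replay_thread_.join();
  }

  bool saw_failed_submit() const { return failed_.load(); }

private:
  void replay() {
    std::unique_lock<std::mutex> l(lock_);
    while (!stopping_) {
      l.unlock();
      if (!objecter_.op_submit()) failed_ = true;  // would be the assert
      l.lock();
      // Block until signalled or a short timeout, like waiting for data.
      cond_.wait_for(l, std::chrono::milliseconds(1),
                     [this] { return stopping_; });
    }
  }

  ToyObjecter& objecter_;
  std::mutex lock_;
  std::condition_variable cond_;
  bool stopping_ = false;  // guarded by lock_
  std::atomic<bool> failed_{false};
  std::thread replay_thread_;
};

// Returns true if no submit ever hit a torn-down objecter.
bool shutdown_in_safe_order() {
  ToyObjecter objecter;
  objecter.init();
  ToyLog log(objecter);
  log.start_replay();
  std::this_thread::sleep_for(std::chrono::milliseconds(5));
  log.shutdown();       // 1. signal and join the replay thread
  objecter.shutdown();  // 2. only now tear down what the thread used
  return !log.saw_failed_submit();
}
```

Calling shutdown_in_safe_order() should return true, because the join guarantees that every submit happens before the toy objecter is shut down. Swapping the two shutdown calls would reintroduce a race of the kind seen in the backtrace, where the replay thread may submit after the objecter has been de-initialized.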