Bug #4660
mds: segfault in queue_backtrace_update
Description
2013-04-04T23:35:33.146 INFO:teuthology.task.ceph.mds.a.err:*** Caught signal (Segmentation fault) **
2013-04-04T23:35:33.146 INFO:teuthology.task.ceph.mds.a.err: in thread 7f528f7fe700
2013-04-04T23:35:33.165 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.60-402-g3c0debf (3c0debf99d51a8ec1cbd76d96c436674d56dfc6e)
2013-04-04T23:35:33.165 INFO:teuthology.task.ceph.mds.a.err: 1: ceph-mds() [0x85e26a]
2013-04-04T23:35:33.165 INFO:teuthology.task.ceph.mds.a.err: 2: (()+0xfcb0) [0x7f52972cccb0]
2013-04-04T23:35:33.165 INFO:teuthology.task.ceph.mds.a.err: 3: (BacktraceInfo::BacktraceInfo(long, CInode*, LogSegment*, long)+0xce) [0x4f0f2e]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 4: (LogSegment::queue_backtrace_update(CInode*, long, long)+0x4a) [0x4f100a]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 5: (C_MDS_openc_finish::finish(int)+0x2ad) [0x5705fd]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 6: (Context::complete(int)+0xa) [0x4bf04a]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x4e30d5]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x19f) [0x6e1f8f]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe55) [0x6fdd65]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 10: (MDS::handle_core_message(Message*)+0xae8) [0x4dea08]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 11: (MDS::_dispatch(Message*)+0x2f) [0x4debcf]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 12: (MDS::ms_dispatch(Message*)+0x1d3) [0x4e0653]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 13: (DispatchQueue::entry()+0x341) [0x82c7d1]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7a91bd]
2013-04-04T23:35:33.168 INFO:teuthology.task.ceph.mds.a.err: 15: (()+0x7e9a) [0x7f52972c4e9a]
2013-04-04T23:35:33.168 INFO:teuthology.task.ceph.mds.a.err: 16: (clone()+0x6d) [0x7f5295ce7cbd]
2013-04-04T23:35:33.168 INFO:teuthology.task.ceph.mds.a.err:2013-04-04 23:35:16.583951 7f528f7fe700 -1 *** Caught signal (Segmentation fault) **
2013-04-04T23:35:33.168 INFO:teuthology.task.ceph.mds.a.err: in thread 7f528f7fe700
The job was:
ubuntu@teuthology:/a/teuthology-2013-04-04_19:47:57-kernel-next-testing-basic/9311$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 85b6aabe740024f9f6aaa54afc3195940e5fa12c
nuke-on-error: true
overrides:
  ceph:
    conf:
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 3c0debf99d51a8ec1cbd76d96c436674d56dfc6e
  s3tests:
    branch: next
  workunit:
    sha1: 3c0debf99d51a8ec1cbd76d96c436674d56dfc6e
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - kernel_untar_build.sh
Associated revisions
mds: Keep LogSegment ref for openc backtrace
The MDRequest is destroyed once the client reply is sent, but
we need the reference to the LogSegment for updating the backtrace, so
store a temporary ref to the LogSegment for later.
Fixes #4660.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
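The ordering problem the commit describes can be illustrated with a minimal C++ sketch. It assumes simplified, hypothetical types (`LogSegment`, `MDRequest`, `RequestTable`, `OpencFinish`) that stand in for the MDS objects in the trace. These are not Ceph's real classes and this is not the actual patch. The finisher copies the `LogSegment` pointer out of the request while the request is still alive, sends the reply (which destroys the request), and only then queues the backtrace update through the saved pointer.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins; NOT Ceph's actual MDS classes.
struct LogSegment {
  std::vector<long> pending_backtraces;  // inodes whose backtrace must be written
  void queue_backtrace_update(long ino) { pending_backtraces.push_back(ino); }
};

struct MDRequest {
  LogSegment *ls;  // segment that journaled this request's events
  explicit MDRequest(LogSegment *seg) : ls(seg) {}
};

// Hypothetical owner of in-flight requests: sending the client reply
// also destroys the request, as the commit message describes.
struct RequestTable {
  std::unique_ptr<MDRequest> live;
  void reply_request() { live.reset(); }  // the request is freed here
};

// Simplified analogue of the openc finisher, run after the journal flush.
struct OpencFinish {
  RequestTable &table;
  long ino;
  OpencFinish(RequestTable &t, long i) : table(t), ino(i) {}

  void finish() {
    // Fix: keep a temporary ref to the LogSegment while the request is alive.
    LogSegment *ls = table.live->ls;
    table.reply_request();            // request destroyed; table.live is now null
    ls->queue_backtrace_update(ino);  // safe: ls does not point into the freed request
    // The buggy order read the request's ls field here, after the request was freed.
  }
};
```

A caller would create a `LogSegment`, place a request that refers to it in `RequestTable::live`, and call `OpencFinish::finish()`. The backtrace is queued even though the request no longer exists. Copying the pointer is enough in this sketch because the segment outlives the request. The sketch does not model segment lifetime.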
History
#1 Updated by Greg Farnum almost 11 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
No wonder this wasn't showing up in my bug queue!
#2 Updated by Sam Lang almost 11 years ago
- Assignee set to Sam Lang
#3 Updated by Sam Lang almost 11 years ago
- Status changed from New to In Progress
#4 Updated by Sam Lang almost 11 years ago
- Status changed from In Progress to Fix Under Review
Pushed a fix to wip-4660. The mdr was getting deleted before we queued the backtrace for update, so mdr->ls was invalid. Needs review.
#5 Updated by Sam Lang almost 11 years ago
Alex hit the same segfault on the next branch yesterday; it looks like commit 3cdc61ec doesn't fix this bug. The trace from Alex's run:
INFO:teuthology.task.ceph.mds.a.err:*** Caught signal (Segmentation fault)
INFO:teuthology.task.ceph.mds.a.err: in thread 7f9e34926700
INFO:teuthology.task.ceph.mds.a.err: ceph version 0.60-444-gc17b172 (c17b17229acd5090e19e46e8ebe4dfcf9a85db89)
INFO:teuthology.task.ceph.mds.a.err: 1: ceph-mds() [0x85ebca]
INFO:teuthology.task.ceph.mds.a.err: 2: (()+0xfcb0) [0x7f9e38804cb0]
INFO:teuthology.task.ceph.mds.a.err: 3: (BacktraceInfo::BacktraceInfo(long, CInode*, LogSegment*, long)+0xce) [0x4f0f6e]
INFO:teuthology.task.ceph.mds.a.err: 4: (LogSegment::queue_backtrace_update(CInode*, long, long)+0x4a) [0x4f104a]
INFO:teuthology.task.ceph.mds.a.err: 5: (C_MDS_openc_finish::finish(int)+0x2ad) [0x57063d]
INFO:teuthology.task.ceph.mds.a.err: 6: (Context::complete(int)+0xa) [0x4bef6a]
INFO:teuthology.task.ceph.mds.a.err: 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x4e3115]
INFO:teuthology.task.ceph.mds.a.err: 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x19f) [0x6e28ef]
INFO:teuthology.task.ceph.mds.a.err: 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe55) [0x6fe6c5]
INFO:teuthology.task.ceph.mds.a.err: 10: (MDS::handle_core_message(Message*)+0xae8) [0x4dea48]
INFO:teuthology.task.ceph.mds.a.err: 11: (MDS::_dispatch(Message*)+0x2f) [0x4dec0f]
INFO:teuthology.task.ceph.mds.a.err: 12: (MDS::ms_dispatch(Message*)+0x1d3) [0x4e0693]
INFO:teuthology.task.ceph.mds.a.err: 13: (DispatchQueue::entry()+0x341) [0x82d131]
INFO:teuthology.task.ceph.mds.a.err: 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7a9b1d]
INFO:teuthology.task.ceph.mds.a.err: 15: (()+0x7e9a) [0x7f9e387fce9a]
INFO:teuthology.task.ceph.mds.a.err: 16: (clone()+0x6d) [0x7f9e3721fcbd]
INFO:teuthology.task.ceph.mds.a.err:2013-04-08 14:39:12.810910 7f9e34926700 -1 *** Caught signal (Segmentation fault) **
#6 Updated by Sam Lang almost 11 years ago
- Status changed from Fix Under Review to In Progress
#7 Updated by Sam Lang almost 11 years ago
- Status changed from In Progress to Resolved
The commit that hit the segv above appears to be based on master, whereas the fix went into next. I wasn't able to reproduce the segv in any of the runs I made (using next), so I'm going to assume the fix resolves it and mark this bug resolved for now.
#8 Updated by Greg Farnum almost 11 years ago
- Status changed from Resolved to In Progress
ubuntu@teuthology:/a/teuthology-2013-04-13_01:00:48-fs-next-testing-basic/12134
#9 Updated by Sam Lang almost 11 years ago
- Status changed from In Progress to Resolved
That isn't the same bug. Opening #4726 for that issue.
#10 Updated by Greg Farnum almost 11 years ago
blink
Of course it's not; sorry about that.