Bug #4660

mds: segfault in queue_backtrace_update

Added by Sage Weil almost 11 years ago. Updated almost 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-04-04T23:35:33.146 INFO:teuthology.task.ceph.mds.a.err:*** Caught signal (Segmentation fault) **
2013-04-04T23:35:33.146 INFO:teuthology.task.ceph.mds.a.err: in thread 7f528f7fe700
2013-04-04T23:35:33.165 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.60-402-g3c0debf (3c0debf99d51a8ec1cbd76d96c436674d56dfc6e)
2013-04-04T23:35:33.165 INFO:teuthology.task.ceph.mds.a.err: 1: ceph-mds() [0x85e26a]
2013-04-04T23:35:33.165 INFO:teuthology.task.ceph.mds.a.err: 2: (()+0xfcb0) [0x7f52972cccb0]
2013-04-04T23:35:33.165 INFO:teuthology.task.ceph.mds.a.err: 3: (BacktraceInfo::BacktraceInfo(long, CInode*, LogSegment*, long)+0xce) [0x4f0f2e]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 4: (LogSegment::queue_backtrace_update(CInode*, long, long)+0x4a) [0x4f100a]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 5: (C_MDS_openc_finish::finish(int)+0x2ad) [0x5705fd]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 6: (Context::complete(int)+0xa) [0x4bf04a]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x4e30d5]
2013-04-04T23:35:33.166 INFO:teuthology.task.ceph.mds.a.err: 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x19f) [0x6e1f8f]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe55) [0x6fdd65]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 10: (MDS::handle_core_message(Message*)+0xae8) [0x4dea08]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 11: (MDS::_dispatch(Message*)+0x2f) [0x4debcf]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 12: (MDS::ms_dispatch(Message*)+0x1d3) [0x4e0653]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 13: (DispatchQueue::entry()+0x341) [0x82c7d1]
2013-04-04T23:35:33.167 INFO:teuthology.task.ceph.mds.a.err: 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7a91bd]
2013-04-04T23:35:33.168 INFO:teuthology.task.ceph.mds.a.err: 15: (()+0x7e9a) [0x7f52972c4e9a]
2013-04-04T23:35:33.168 INFO:teuthology.task.ceph.mds.a.err: 16: (clone()+0x6d) [0x7f5295ce7cbd]
2013-04-04T23:35:33.168 INFO:teuthology.task.ceph.mds.a.err:2013-04-04 23:35:16.583951 7f528f7fe700 -1 *** Caught signal (Segmentation fault) **
2013-04-04T23:35:33.168 INFO:teuthology.task.ceph.mds.a.err: in thread 7f528f7fe700

The job was:
ubuntu@teuthology:/a/teuthology-2013-04-04_19:47:57-kernel-next-testing-basic/9311$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 85b6aabe740024f9f6aaa54afc3195940e5fa12c
nuke-on-error: true
overrides:
  ceph:
    conf:
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 3c0debf99d51a8ec1cbd76d96c436674d56dfc6e
  s3tests:
    branch: next
  workunit:
    sha1: 3c0debf99d51a8ec1cbd76d96c436674d56dfc6e
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - kernel_untar_build.sh

Associated revisions

Revision 3cdc61ec (diff)
Added by Sam Lang almost 11 years ago

mds: Keep LogSegment ref for openc backtrace

The MDRequest is destroyed once the client reply is sent, but
we need the reference to the LogSegment for updating the backtrace, so
store a temporary ref to the LogSegment for later.

Fixes #4660.
Signed-off-by: Sam Lang <>

History

#1 Updated by Greg Farnum almost 11 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

No wonder this wasn't showing up in my bug queue!

#2 Updated by Sam Lang almost 11 years ago

  • Assignee set to Sam Lang

#3 Updated by Sam Lang almost 11 years ago

  • Status changed from New to In Progress

#4 Updated by Sam Lang almost 11 years ago

  • Status changed from In Progress to Fix Under Review

Pushed a fix to wip-4660. The mdr was getting deleted before we queued the backtrace for update, so mdr->ls was invalid. Needs review.
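The lifetime bug described here can be sketched in a few lines. This is a hypothetical, simplified model, not Ceph's actual code: the names Request, Segment, and finish_create stand in for MDRequest, LogSegment, and C_MDS_openc_finish. The buggy shape dereferenced mdr->ls after the request was torn down (the reply to the client had already been sent); the fix is to take our own reference to the segment before the request can go away.

```cpp
#include <cassert>
#include <memory>

// Illustrative stand-in for LogSegment.
struct Segment {
    int updates_queued = 0;
    void queue_backtrace_update() { ++updates_queued; }
};

// Illustrative stand-in for MDRequest: it holds a ref to the log
// segment the operation was journaled into.
struct Request {
    std::shared_ptr<Segment> ls;
};

// Fixed shape: copy the segment ref out of the request *before* the
// request is destroyed, then queue the backtrace update through the
// retained ref. The buggy shape read mdr->ls after mdr was freed.
std::shared_ptr<Segment> finish_create(std::unique_ptr<Request> mdr) {
    std::shared_ptr<Segment> ls = mdr->ls;  // keep our own ref
    mdr.reset();                            // request destroyed (reply sent)
    ls->queue_backtrace_update();           // safe: we still hold a ref
    return ls;
}
```

With raw pointers (as in the original code) the same ordering reads freed memory, which is exactly the segfault in BacktraceInfo::BacktraceInfo above; holding a reference across the request's destruction is the whole fix.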

#5 Updated by Sam Lang almost 11 years ago

Alex hit the same segfault with the next branch yesterday, looks like the commit 3cdc61ec doesn't fix this bug. The trace from Alex's run:

INFO:teuthology.task.ceph.mds.a.err:*** Caught signal (Segmentation fault) **
INFO:teuthology.task.ceph.mds.a.err: in thread 7f9e34926700
INFO:teuthology.task.ceph.mds.a.err: ceph version 0.60-444-gc17b172 (c17b17229acd5090e19e46e8ebe4dfcf9a85db89)
INFO:teuthology.task.ceph.mds.a.err: 1: ceph-mds() [0x85ebca]
INFO:teuthology.task.ceph.mds.a.err: 2: (()+0xfcb0) [0x7f9e38804cb0]
INFO:teuthology.task.ceph.mds.a.err: 3: (BacktraceInfo::BacktraceInfo(long, CInode*, LogSegment*, long)+0xce) [0x4f0f6e]
INFO:teuthology.task.ceph.mds.a.err: 4: (LogSegment::queue_backtrace_update(CInode*, long, long)+0x4a) [0x4f104a]
INFO:teuthology.task.ceph.mds.a.err: 5: (C_MDS_openc_finish::finish(int)+0x2ad) [0x57063d]
INFO:teuthology.task.ceph.mds.a.err: 6: (Context::complete(int)+0xa) [0x4bef6a]
INFO:teuthology.task.ceph.mds.a.err: 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x4e3115]
INFO:teuthology.task.ceph.mds.a.err: 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x19f) [0x6e28ef]
INFO:teuthology.task.ceph.mds.a.err: 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe55) [0x6fe6c5]
INFO:teuthology.task.ceph.mds.a.err: 10: (MDS::handle_core_message(Message*)+0xae8) [0x4dea48]
INFO:teuthology.task.ceph.mds.a.err: 11: (MDS::_dispatch(Message*)+0x2f) [0x4dec0f]
INFO:teuthology.task.ceph.mds.a.err: 12: (MDS::ms_dispatch(Message*)+0x1d3) [0x4e0693]
INFO:teuthology.task.ceph.mds.a.err: 13: (DispatchQueue::entry()+0x341) [0x82d131]
INFO:teuthology.task.ceph.mds.a.err: 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7a9b1d]
INFO:teuthology.task.ceph.mds.a.err: 15: (()+0x7e9a) [0x7f9e387fce9a]
INFO:teuthology.task.ceph.mds.a.err: 16: (clone()+0x6d) [0x7f9e3721fcbd]
INFO:teuthology.task.ceph.mds.a.err:2013-04-08 14:39:12.810910 7f9e34926700 -1 *** Caught signal (Segmentation fault) **

#6 Updated by Sam Lang almost 11 years ago

  • Status changed from Fix Under Review to In Progress

#7 Updated by Sam Lang almost 11 years ago

  • Status changed from In Progress to Resolved

The commit that hit this segv above looks like it was off of master, whereas the fix went into next. I wasn't able to reproduce the segv in any of the runs I made (using next), so I'm going to assume the fix resolves the segv, and mark this bug resolved for now.

#8 Updated by Greg Farnum almost 11 years ago

  • Status changed from Resolved to In Progress

ubuntu@teuthology:/a/teuthology-2013-04-13_01:00:48-fs-next-testing-basic/12134

#9 Updated by Sam Lang almost 11 years ago

  • Status changed from In Progress to Resolved

That isn't the same bug. Opening #4726 for that issue.

#10 Updated by Greg Farnum almost 11 years ago

blink

Of course it's not; sorry about that.
