Bug #2017

osd: segfault in snap trimmer

Added by Alex Elder about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

While testing some reasonably solid changes to the rbd code, I ran across an OSD crash.
It looks like it happ
The YAML file for the test is below.

I will save some binaries, etc. on flak. Will update shortly to indicate where.

Here is the stack trace at the end of osd.0.log
---------------------
2012-02-02 14:15:23.751408 7f5206624700 log [INF] : 2.5 scrub ok
*** Caught signal (Segmentation fault) **
    in thread 7f5204e21700
    ceph version 0.41-68-g2a26295 (commit:2a262956947a3fa2f3d621b8af392bdecb653464)
    1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x5979a4]
    2: (()+0xfb40) [0x7f52154bdb40]
    3: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x1e) [0x4dfcae]
    4: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x3a7) [0x4e0e47]
    5: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xcb) [0x52818b]
    6: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x51f8cb]
    7: (ReplicatedPG::snap_trimmer()+0x51e) [0x4bc3de]
    8: (ThreadPool::worker()+0xa28) [0x5abdc8]
    9: (ThreadPool::WorkThread::entry()+0xd) [0x57913d]
    10: (()+0x7971) [0x7f52154b5971]
    11: (clone()+0x6d) [0x7f5213b4092d]
    ---------------------
nuke-on-error:
targets:
  : ...
  : ...
  : ...
roles:
- [mon.a, mon.c, osd.0]
- [mon.b, mds.a, osd.1]
- [client.0]
kernel:
  osd:
    branch: wip-rbd-new-fixes
  client:
    branch: wip-rbd-new-fixes
tasks:
- ceph:
# - kclient:
- rbd:
    all:
- workunit:
    client.0:
    - rbd/copy.sh
    - rbd/import_export.sh
    # The next two produce an error
    # - rbd/kernel.sh
    # - rbd/test_librbd_python.sh
    - rbd/test_librbd.sh

History

#1 Updated by Josh Durgin about 12 years ago

The segfault was from trying to dereference repop->ctx->op, which was NULL.
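
For illustration only, here is a minimal sketch of the kind of guard that avoids this class of crash. It is not the fix that was pushed to master. The types Op, OpContext and RepGather below are hypothetical, heavily simplified stand-ins for the real ReplicatedPG structures, and client_op_id is an invented helper. The sketch assumes that a repop created internally (for example by the snap trimmer) may carry no client op.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; the real ReplicatedPG types are far richer.
struct Op {
  int id;
};

struct OpContext {
  // Assumption: left empty when no client request triggered the repop.
  std::shared_ptr<Op> op;
};

struct RepGather {
  std::unique_ptr<OpContext> ctx;
};

// Invented helper: returns the id of the originating client op,
// or -1 when the repop has no context or no client op attached.
int client_op_id(const RepGather& repop) {
  if (!repop.ctx || !repop.ctx->op) {
    return -1;  // nothing to dereference, so bail out instead of crashing
  }
  return repop.ctx->op->id;
}
```

With this guard, a repop whose ctx->op is null yields -1, where an unguarded repop->ctx->op->id would dereference a null pointer.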

#2 Updated by Alex Elder about 12 years ago

I bundled up the /tmp/cephtest directory in its entirety. It is here:
flak.ops.newdream.net:~elder/tracker_2017/sepia24-cephtest.tgz

#3 Updated by Sage Weil about 12 years ago

  • Status changed from New to Resolved

pushed fix for this (and another similar bug) to master.

#4 Updated by Alex Elder about 12 years ago

Since Sage has fixed this, I've deleted the archive of /tmp/cephtest I had saved.
