Bug #16503
Status: Closed
OSDs assert during snap trim: osd/ReplicatedPG.cc: 2655: FAILED assert(0)
Description
The cluster was previously running Ceph 0.94.3 and was hitting an issue where OSDs asserted due to snapset corruption during scrubbing (https://bugzilla.redhat.com/show_bug.cgi?id=1273127). It was updated to Ceph 0.94.7 in the belief that the snapset corruption was caused by creating and/or deleting RBD snapshots while PGs were splitting. This workload creates and deletes thousands of RBD snapshots per day, and the PGs had been split very recently when the snapset corruption first started.
After the upgrade to 0.94.7, scrubbing was able to run and marked the affected PGs inconsistent instead of crashing the OSDs. Using https://github.com/ceph/ceph/pull/7702, the inconsistencies were then tracked down and resolved, so all of the PGs are now consistent and scrubbable. The remaining issue is that OSDs now crash on a failed assert during snap trimming.
OSD assert for osd.234:
2016-06-27 08:08:16.909337 7f19777c0700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f19777c0700 time 2016-06-27 08:08:16.903355
osd/ReplicatedPG.cc: 2655: FAILED assert(0)
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
2: (ReplicatedPG::trim_object(hobject_t const&)+0x1e4) [0x85bb64]
3: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x427) [0x85e287]
4: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb4) [0x8bf1f4]
5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x5f) [0x8ab92f]
6: (ReplicatedPG::snap_trimmer()+0x52c) [0x82f7fc]
7: (OSD::SnapTrimWQ::_process(PG*)+0x1a) [0x6c43aa]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xba2a0e]
9: (ThreadPool::WorkThread::entry()+0x10) [0xba3ab0]
10: (()+0x8182) [0x7f199ee58182]
11: (clone()+0x6d) [0x7f199d3c347d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
Logs are located at: https://api.access.redhat.com/rs/cases/01658829/attachments/aa33247b-c123-4085-a276-f9b81c3e83a7
Version-Release number of selected component (if applicable):
Ceph 0.94.7