Bug #16503
OSD's assert during snap trim osd/ReplicatedPG.cc: 2655: FAILED assert(0)
Description
The cluster was previously running Ceph 0.94.3 and was hitting an issue with OSDs asserting due to snapset corruption during scrubbing (https://bugzilla.redhat.com/show_bug.cgi?id=1273127). It was updated to Ceph 0.94.7 because the snapset corruption was believed to be caused by creating and/or deleting rbd snapshots during pg splitting. This use model creates and deletes thousands of rbd snapshots per day, and pgs had been split very recently when the snapset corruption first started happening.
The 0.94.7 upgrade allowed scrubbing to proceed and marked the pgs inconsistent instead of asserting (https://github.com/ceph/ceph/pull/7702). The inconsistencies were then tracked down and resolved, so all of the pgs are now consistent and scrubbable. The remaining issue is that OSDs now hit a failed assert during snap trimming.
OSD Assert for OSD.234:
2016-06-27 08:08:16.909337 7f19777c0700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f19777c0700 time 2016-06-27 08:08:16.903355
osd/ReplicatedPG.cc: 2655: FAILED assert(0)
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
2: (ReplicatedPG::trim_object(hobject_t const&)+0x1e4) [0x85bb64]
3: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x427) [0x85e287]
4: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb4) [0x8bf1f4]
5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x5f) [0x8ab92f]
6: (ReplicatedPG::snap_trimmer()+0x52c) [0x82f7fc]
7: (OSD::SnapTrimWQ::_process(PG*)+0x1a) [0x6c43aa]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xba2a0e]
9: (ThreadPool::WorkThread::entry()+0x10) [0xba3ab0]
10: (()+0x8182) [0x7f199ee58182]
11: (clone()+0x6d) [0x7f199d3c347d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
Logs are located: https://api.access.redhat.com/rs/cases/01658829/attachments/aa33247b-c123-4085-a276-f9b81c3e83a7
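For anyone triaging similar crashes across many OSD logs, the following is a minimal sketch (not part of Ceph) that pulls the failed assert site and the backtrace symbols out of log text shaped like the excerpt above. The function name `summarize_assert`, the regular expressions, and the `SAMPLE` string are illustrative assumptions based only on the lines shown in this report; other Ceph versions may format these lines differently.

```python
import re

# Assumed format, taken from this report: "osd/ReplicatedPG.cc: 2655: FAILED assert(0)"
ASSERT_RE = re.compile(r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)$")
# Assumed format: "2: (ReplicatedPG::trim_object(hobject_t const&)+0x1e4) [0x85bb64]"
FRAME_RE = re.compile(r"^\s*\d+: \((?P<sym>.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def summarize_assert(log_text):
    """Return (file, line, expr, frames) for the first failed assert, or None."""
    lines = log_text.splitlines()
    for i, raw in enumerate(lines):
        m = ASSERT_RE.match(raw.strip())
        if not m:
            continue
        frames = []
        for follow in lines[i + 1:]:
            f = FRAME_RE.match(follow)
            if f:
                frames.append(f.group("sym"))
            elif frames:
                break  # the first non-frame line after the backtrace ends it
        return m.group("file"), int(m.group("line")), m.group("expr"), frames
    return None


# Hypothetical sample built from a few lines of the trace in this report.
SAMPLE = """osd/ReplicatedPG.cc: 2655: FAILED assert(0)
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
2: (ReplicatedPG::trim_object(hobject_t const&)+0x1e4) [0x85bb64]
3: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x427) [0x85e287]
NOTE: a copy of the executable is needed to interpret this.
"""

if __name__ == "__main__":
    print(summarize_assert(SAMPLE))
```

Grouping the returned file, line, and second frame across OSDs is one way to check whether all crashes share the `trim_object` assert reported here.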
Version-Release number of selected component (if applicable):
Ceph 0.94.7
Related issues
History
#1 Updated by Michael Hackett over 7 years ago
Similar stack trace seen in http://tracker.ceph.com/issues/6101 on dumpling with http://tracker.ceph.com/attachments/download/1630/ceph-trim-missing-snapshot.patch.
#2 Updated by Ian Colle over 7 years ago
- Assignee set to David Zafman
#3 Updated by David Zafman over 7 years ago
There is an object rbd_data.b77eb164a531e5.0000000000004fdf in pg 0.1ef1 which has a large snaptrimq, and the object info attr is missing.
#4 Updated by Vikhyat Umrao over 7 years ago
- Source changed from other to Support
#6 Updated by Kefu Chai about 7 years ago
- Duplicated by Bug #19320: Pg inconsistent make ceph osd down added
#7 Updated by Christian Theune over 6 years ago
I'd like to revisit this. Why is this not a bug? (I'm on Hammer 0.94.7.)
We experienced this previously and have just hit it again on a customer system, where a filesystem inconsistency leads to crashing OSDs, yet this is marked as a non-bug. I checked the current code on master, and the behaviour has changed there (it also indicates that a repair would be needed, which Hammer likely wouldn't support anyway).
#8 Updated by Nathan Cutler over 6 years ago
Hammer is EOL (End Of Life). Almost certainly, that means there will be no more hammer point releases.
Please consider upgrading to Jewel. Once you are on Jewel, you have the option of upgrading further to Luminous.