Bug #16503

closed

OSDs assert during snap trim: osd/ReplicatedPG.cc: 2655: FAILED assert(0)

Added by Michael Hackett almost 8 years ago. Updated over 6 years ago.

Status: Rejected
Priority: High
Assignee: David Zafman
Category: OSD
Target version:
% Done: 0%
Source: Support
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The cluster was previously running Ceph 0.94.3 and was hitting OSD asserts during scrubbing because of snapset corruption (https://bugzilla.redhat.com/show_bug.cgi?id=1273127). It was upgraded to Ceph 0.94.7 on the belief that the snapset corruption was caused by creating and/or deleting rbd snapshots during PG splitting. This use model creates and deletes thousands of rbd snapshots per day, and PGs had been split very recently when the snapset corruption first appeared.

The 0.94.7 upgrade allowed scrubbing to proceed, marking the affected PGs inconsistent instead of asserting (https://github.com/ceph/ceph/pull/7702). The inconsistencies were then tracked down and resolved, so all of the PGs are now consistent and scrubbable. The remaining issue is that OSDs now hit an assert during snap trimming.

OSD Assert for OSD.234:

2016-06-27 08:08:16.909337 7f19777c0700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f19777c0700 time 2016-06-27 08:08:16.903355
osd/ReplicatedPG.cc: 2655: FAILED assert(0)

ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
2: (ReplicatedPG::trim_object(hobject_t const&)+0x1e4) [0x85bb64]
3: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x427) [0x85e287]
4: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xb4) [0x8bf1f4]
5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x5f) [0x8ab92f]
6: (ReplicatedPG::snap_trimmer()+0x52c) [0x82f7fc]
7: (OSD::SnapTrimWQ::_process(PG*)+0x1a) [0x6c43aa]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xba2a0e]
9: (ThreadPool::WorkThread::entry()+0x10) [0xba3ab0]
10: (()+0x8182) [0x7f199ee58182]
11: (clone()+0x6d) [0x7f199d3c347d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Logs are located at: https://api.access.redhat.com/rs/cases/01658829/attachments/aa33247b-c123-4085-a276-f9b81c3e83a7
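
When comparing crashes across several OSD logs, it can help to pull the assert location and the backtrace frames out mechanically. The following is a minimal sketch, not part of Ceph: the function name `parse_crash` and both regular expressions are hypothetical and assume only the line format seen in the OSD.234 excerpt above.

```python
import re

# Sample lines copied from the OSD.234 excerpt above (three frames only).
SAMPLE = """\
osd/ReplicatedPG.cc: 2655: FAILED assert(0)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
 2: (ReplicatedPG::trim_object(hobject_t const&)+0x1e4) [0x85bb64]
 6: (ReplicatedPG::snap_trimmer()+0x52c) [0x82f7fc]
"""

# Assumed format: "<file>: <line>: FAILED assert(<expr>)"
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)$"
)
# Assumed format: "<n>: (<symbol>+0x<offset>) [0x<address>]"
FRAME_RE = re.compile(
    r"^\s*(?P<idx>\d+): \((?P<sym>.*)\+0x(?P<off>[0-9a-f]+)\) "
    r"\[0x(?P<addr>[0-9a-f]+)\]$"
)


def parse_crash(text):
    """Return ((file, line, expr) or None, [(index, symbol, address), ...])."""
    assertion = None
    frames = []
    for raw in text.splitlines():
        line = raw.rstrip()
        m = ASSERT_RE.match(line)
        if m and assertion is None:
            # Keep only the first assert line found in the text.
            assertion = (m["file"], int(m["line"]), m["expr"])
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m["idx"]), m["sym"], int(m["addr"], 16)))
    return assertion, frames


if __name__ == "__main__":
    assertion, frames = parse_crash(SAMPLE)
    print(assertion)
    for idx, sym, addr in frames:
        print(idx, hex(addr), sym)
```

The extracted addresses are only meaningful against the exact 0.94.7 binary, as the NOTE in the log says; resolving them further still requires a copy of the executable.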

Version-Release number of selected component (if applicable):
Ceph 0.94.7


Related issues 1 (1 open, 0 closed)

Has duplicate RADOS - Bug #19320: Pg inconsistent make ceph osd down (New, 03/21/2017)
