Bug #8048

osd/ReplicatedPG: FAILED assert(!parent->get_log().get_missing().is_missing(soid))

Added by Josh Durgin almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From effectively the master branch: http://qa-proxy.ceph.com/teuthology/joshd-2014-04-08_17:21:20-rados-wip-6480-testing-basic-plana/179672/


osd/ReplicatedPG.cc: 7297: FAILED assert(!parent->get_log().get_missing().is_missing(soid))
ceph version 0.79-87-gc2f37bb (c2f37bb723178e6ae5fe5121baa06aba994025c1)
1: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0x11f5) [0x7ddb75]
2: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x55c) [0x90bc8c]
3: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1ee) [0x7bd04e]
4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x34a) [0x61a3aa]
5: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x628) [0x6353e8]
6: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x67aefc]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa57136]
8: (ThreadPool::WorkThread::entry()+0x10) [0xa58f40]
9: (()+0x7e9a) [0x7f8e9b07be9a]
10: (clone()+0x6d) [0x7f8e9963c3fd]

Associated revisions

Revision 3d0e80ac (diff)
Added by Sage Weil almost 10 years ago

osd/ReplicatedPG: check clones for degraded

We check whether the head is degraded, and we check whether a clone is
unreadable, but in the case where we have a cache op on a degraded object,
we don't check. That leads to an assert when the repop hits the replica
and the object is in the peer's missing set.

Fix this by adding a check on the clone when write_ordered is true. Note
that checking write_ordered is better than whether it is a cache op because
we want to preserve write ordering even for reads that are flagged by the
client.

Fixes: #8048
Signed-off-by: Sage Weil <>
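
The logic described in the commit message can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual Ceph code: `PeerState`, `Action`, `is_degraded`, and `check_op` are names invented for illustration, the existing "clone is unreadable" check is left out, and gating the head check on `write_ordered` is an assumption. The point is only to show the extra step: when `write_ordered` is true and the op targets a clone, the primary also checks whether that clone is degraded before sending the repop.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model for illustration; these are not Ceph's types.
struct PeerState {
    std::set<std::string> missing;  // objects this replica has not yet recovered
};

enum class Action { Proceed, WaitForDegraded };

// An object counts as "degraded" here if any replica still lists it as missing.
bool is_degraded(const std::vector<PeerState>& peers, const std::string& oid) {
    for (const PeerState& p : peers) {
        if (p.missing.count(oid) != 0) return true;
    }
    return false;
}

// Primary-side gate run before an op is replicated.
// `head` is the head object; `target` is the object the op actually touches
// (a clone, or the head itself).
Action check_op(const std::vector<PeerState>& peers,
                const std::string& head,
                const std::string& target,
                bool write_ordered) {
    if (!write_ordered) return Action::Proceed;
    // Check that was already in place: hold the op while the head is degraded.
    if (is_degraded(peers, head)) return Action::WaitForDegraded;
    // Check added by the fix: also hold the op while the clone is degraded.
    if (target != head && is_degraded(peers, target)) return Action::WaitForDegraded;
    return Action::Proceed;
}
```

In this model, an op on a clone that a replica still lists as missing is held back on the primary, so it never reaches the replica-side condition that the failed assert guards.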

History

#1 Updated by Sage Weil almost 10 years ago

  • Status changed from New to 12

#2 Updated by Sage Weil almost 10 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Sage Weil

#3 Updated by Sage Weil almost 10 years ago

  • Status changed from In Progress to Fix Under Review

#4 Updated by Sage Weil almost 10 years ago

  • Assignee deleted (Sage Weil)

#5 Updated by Samuel Just almost 10 years ago

  • Status changed from Fix Under Review to Resolved

#6 Updated by Dmitry Smirnov almost 10 years ago

Please have a look at the comments on bug #8008, which may contain additional information related to this issue. I suspect that #8008 and this issue (#8048) are related. It is possible that either these two bugs are not completely fixed, or another bug is causing an OSD crash when attempting to repair an inconsistent PG.
