Bug #8588

In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size is changed

Added by Zhi Zhang almost 10 years ago. Updated about 7 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In an EC pool, if any data chunk's size is changed for some reason, the chunk sizes retrieved from the data OSDs no longer match the expected total chunk size. So after the primary OSD receives all the data from the other OSDs, it crashes on the following assert.

2014-06-12 02:56:06.073542 7f32ea246700 1 osd/ECUtil.cc: In function 'int ECUtil::decode(const ECUtil::stripe_info_t&, ceph::ErasureCodeInterfaceRef&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<const int, ceph::buffer::list> > >&, ceph::bufferlist*)' thread 7f32ea246700 time 2014-06-12 02:56:05.978745
osd/ECUtil.cc: 23: FAILED assert(i->second.length() == total_chunk_size)

ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (ECUtil::decode(ECUtil::stripe_info_t const&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >&, ceph::buffer::list*)+0x548) [0x9922d8]
2: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x270) [0x982650]
3: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x977729]
4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x6c) [0x96481c]
5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xde3) [0x96a573]
6: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4b6) [0x976c26]
7: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x250) [0x83f400]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x37c) [0x60e82c]
9: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x63d) [0x63eb6d]
10: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x67668e]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0xa66581]
12: (ThreadPool::WorkThread::entry()+0x10) [0xa695c0]
13: /lib64/libpthread.so.0() [0x3cefa07851]
14: (clone()+0x6d) [0x3cef6e890d]
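
The assert fires because decode expects every shard returned by a sub-read to have the same length, total_chunk_size. Below is a minimal sketch of that length check, illustrative only: Chunk and chunks_sane are made-up names, and in the v0.80 code the condition is an assert inside ECUtil::decode(), so a mismatch aborts the primary OSD instead of returning an error.

#include <cstdint>
#include <map>
#include <string>

using Chunk = std::string;  // stand-in for ceph::bufferlist in this sketch

// Return false if any shard's length differs from the expected chunk size.
bool chunks_sane(const std::map<int, Chunk>& shards, uint64_t total_chunk_size) {
  for (const auto& kv : shards) {
    if (kv.second.size() != total_chunk_size) {
      // In the crashing code path this is
      // FAILED assert(i->second.length() == total_chunk_size),
      // i.e. the whole OSD process aborts here.
      return false;
    }
  }
  return true;
}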

Related issues 5 (2 open, 3 closed)

Related to RADOS - Feature #9328: osd: generalize the scrub workflow (New)

Related to Ceph - Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool (Resolved, Loïc Dachary, 11/05/2014)

Related to Ceph - Feature #9943: osd: mark pg and use replica on EIO from client read (In Progress, Wei Luo, 10/30/2014)

Has duplicate Ceph - Bug #10042: OSD crash doing object recovery with EC pool (Duplicate, Loïc Dachary, 11/10/2014)

Is duplicate of Ceph - Bug #12200: assert(hinfo.get_total_chunk_size() == (uint64_t)st.st_size) (Resolved, David Zafman, 07/01/2015)

Actions #1

Updated by kaifeng yao almost 10 years ago

Rather than crashing the OSD, it would be better to fail the request and possibly mark the PG as inconsistent. Even better, it should automatically pick other coding stripes to rebuild the data.
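
A hedged sketch of the behaviour suggested above, under assumed names (decode_or_fail and the decode callback are hypothetical, not Ceph APIs): discard mis-sized shards, then either rebuild from whatever healthy chunks remain or fail the read with EIO instead of aborting.

#include <cerrno>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

using Chunk = std::string;
using DecodeFn = std::function<int(const std::map<int, Chunk>&, std::string*)>;

int decode_or_fail(std::map<int, Chunk> shards, uint64_t chunk_size,
                   unsigned k, const DecodeFn& decode, std::string* out) {
  // Treat shards of the wrong size as lost instead of asserting.
  for (auto it = shards.begin(); it != shards.end();) {
    if (it->second.size() != chunk_size)
      it = shards.erase(it);
    else
      ++it;
  }
  if (shards.size() < k)
    return -EIO;               // not enough healthy chunks left; fail the request
  return decode(shards, out);  // rebuild from any k surviving chunks
}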

Actions #2

Updated by Greg Farnum almost 10 years ago

This is some kind of on-disk corruption on the replica that you're seeing?
I think you're probably right, as we're now at the point where this is more likely to be an external issue than a code bug, but just trying to understand the context.

Actions #3

Updated by Zhi Zhang almost 10 years ago

Hi Greg,

Sorry for the late reply.

We were just trying to simulate some potential disk error here, so we artificially produced this kind of corruption.

Actions #4

Updated by Loïc Dachary almost 10 years ago

Any update on this problem?

Actions #5

Updated by Samuel Just almost 10 years ago

Yeah, this needs to be handled better. The biggest problem is that the crash is on the primary rather than the replica...

Actions #6

Updated by Samuel Just almost 10 years ago

  • Priority changed from Normal to High
Actions #7

Updated by Loïc Dachary over 9 years ago

  • Subject changed from In the EC pool, primary OSD will crash at decoding if any data chunk's size is changed to In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size is changed
Actions #8

Updated by Guang Yang over 9 years ago

Hi Sam,
Any suggestion in terms of how to fix this issue?

One potential solution is to validate the digest of each chunk when doing the sub-read, and if it does not match, return a signal to the upper layer to handle. Taken this way, the fix is similar to http://tracker.ceph.com/issues/9943.
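
A rough sketch of such a per-chunk digest check during the sub-read (illustrative only: verify_sub_read is a made-up name, and zlib's crc32 stands in here for whatever digest is stored with the shard):

#include <cerrno>
#include <cstdint>
#include <string>
#include <zlib.h>

// Return 0 if the chunk matches its stored digest, -EIO otherwise so the
// upper layer can retry the read from a different shard.
int verify_sub_read(const std::string& chunk, uint32_t expected_digest) {
  uint32_t crc = crc32(0L, Z_NULL, 0);
  crc = crc32(crc, reinterpret_cast<const Bytef*>(chunk.data()),
              static_cast<uInt>(chunk.size()));
  if (crc != expected_digest)
    return -EIO;
  return 0;
}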

Actions #9

Updated by Guang Yang over 9 years ago

Wei is working on this along with http://tracker.ceph.com/issues/9943.

Actions #10

Updated by Loïc Dachary over 9 years ago

  • Assignee set to Loïc Dachary
Actions #11

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to In Progress
Actions #12

Updated by Loïc Dachary about 9 years ago

  • Status changed from In Progress to 12
Actions #13

Updated by Samuel Just almost 9 years ago

  • Priority changed from High to Normal
  • Regression set to No
Actions #14

Updated by Loïc Dachary almost 9 years ago

  • Assignee deleted (Loïc Dachary)

I'm not making progress on this, unassigning.

Actions #15

Updated by David Zafman about 7 years ago

  • Is duplicate of Bug #12200: assert(hinfo.get_total_chunk_size() == (uint64_t)st.st_size) added
Actions #16

Updated by David Zafman about 7 years ago

  • Status changed from 12 to Duplicate

First, an assert was added on the reading side for when the hinfo size doesn't match. We later turned that into an EIO (#12200) and handled it at the server by reading another shard.
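
In rough outline, the resolved behaviour looks like the following sketch (a simplification under assumptions, not the actual ECBackend code; check_shard_size is a made-up name): compare the shard's on-disk size against the size recorded in hinfo and return -EIO on mismatch, so the primary can recover by reading another shard rather than hitting an assert.

#include <cerrno>
#include <cstdint>
#include <sys/stat.h>

int check_shard_size(const char* shard_path, uint64_t hinfo_total_chunk_size) {
  struct stat st;
  if (stat(shard_path, &st) != 0)
    return -errno;  // propagate the underlying I/O error
  if (static_cast<uint64_t>(st.st_size) != hinfo_total_chunk_size)
    return -EIO;    // was: FAILED assert(hinfo.get_total_chunk_size() == st.st_size)
  return 0;
}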
