Bug #8588 (closed)

In an erasure-coded pool, the primary OSD will crash during decoding if any data chunk's size is changed

Added by Zhi Zhang almost 10 years ago. Updated about 7 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In an EC pool, if any data chunk's size changes for some reason, the total size retrieved from the data OSDs no longer matches the expected total chunk size. After the primary OSD has received all the data from the other OSDs, it crashes on the following assert.

2014-06-12 02:56:06.073542 7f32ea246700 1 osd/ECUtil.cc: In function 'int ECUtil::decode(const ECUtil::stripe_info_t&, ceph::ErasureCodeInterfaceRef&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<const int, ceph::buffer::list> > >&, ceph::bufferlist*)' thread 7f32ea246700 time 2014-06-12 02:56:05.978745
osd/ECUtil.cc: 23: FAILED assert(i->second.length() == total_chunk_size)

ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (ECUtil::decode(ECUtil::stripe_info_t const&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >&, ceph::buffer::list*)+0x548) [0x9922d8]
2: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x270) [0x982650]
3: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x977729]
4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x6c) [0x96481c]
5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xde3) [0x96a573]
6: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4b6) [0x976c26]
7: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x250) [0x83f400]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x37c) [0x60e82c]
9: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x63d) [0x63eb6d]
10: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x67668e]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0xa66581]
12: (ThreadPool::WorkThread::entry()+0x10) [0xa695c0]
13: /lib64/libpthread.so.0() [0x3cefa07851]
14: (clone()+0x6d) [0x3cef6e890d]
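
For reference, a minimal standalone sketch of the per-shard size check that this assert enforces: every chunk handed to ECUtil::decode must be exactly total_chunk_size bytes, so a single truncated or grown chunk aborts the primary OSD. This is not the actual Ceph source; check_chunk_sizes, the std::string shard buffers, and the 4096-byte chunk size are hypothetical stand-ins for illustration only.

// Standalone sketch (assumed names, not Ceph code) of the failing size check.
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// Stand-in for the per-shard buffers handed to decode(): key is the shard id,
// value is the chunk payload read from that OSD.
using shard_map = std::map<int, std::string>;

// Hypothetical helper mirroring the assertion from the backtrace:
// FAILED assert(i->second.length() == total_chunk_size)
void check_chunk_sizes(const shard_map& to_decode, std::size_t total_chunk_size) {
  for (auto i = to_decode.begin(); i != to_decode.end(); ++i) {
    assert(i->second.size() == total_chunk_size);
  }
}

int main() {
  shard_map shards = {
    {0, std::string(4096, 'a')},
    {1, std::string(4096, 'b')},
    {2, std::string(4000, 'c')},  // one corrupted/truncated chunk: 4000 != 4096
  };
  // Aborts on shard 2, just as the primary OSD aborts after gathering replies.
  check_chunk_sizes(shards, 4096);
  std::cout << "all chunks consistent\n";
  return 0;
}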

Related issues 5 (2 open, 3 closed)

Related to RADOS - Feature #9328: osd: generalize the scrub workflow (New)

Related to Ceph - Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool (Resolved, Loïc Dachary, 11/05/2014)

Related to Ceph - Feature #9943: osd: mark pg and use replica on EIO from client read (In Progress, Wei Luo, 10/30/2014)

Has duplicate Ceph - Bug #10042: OSD crash doing object recovery with EC pool (Duplicate, Loïc Dachary, 11/10/2014)

Is duplicate of Ceph - Bug #12200: assert(hinfo.get_total_chunk_size() == (uint64_t)st.st_size) (Resolved, David Zafman, 07/01/2015)
