Bug #8588 (closed)

In an erasure-coded pool, the primary OSD will crash during decoding if any data chunk's size is changed

Added by Zhi Zhang almost 10 years ago. Updated about 7 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In an EC pool, if any data chunk's size changes for some reason, the total size retrieved from the data OSDs no longer matches the expected total chunk size. After the primary OSD has received all the data from the other OSDs, it crashes on the following assert.

2014-06-12 02:56:06.073542 7f32ea246700 1 osd/ECUtil.cc: In function 'int ECUtil::decode(const ECUtil::stripe_info_t&, ceph::ErasureCodeInterfaceRef&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<const int, ceph::buffer::list> > >&, ceph::bufferlist*)' thread 7f32ea246700 time 2014-06-12 02:56:05.978745
osd/ECUtil.cc: 23: FAILED assert(i->second.length() == total_chunk_size)

ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (ECUtil::decode(ECUtil::stripe_info_t const&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >&, ceph::buffer::list*)+0x548) [0x9922d8]
2: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x270) [0x982650]
3: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x977729]
4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x6c) [0x96481c]
5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xde3) [0x96a573]
6: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4b6) [0x976c26]
7: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x250) [0x83f400]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x37c) [0x60e82c]
9: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x63d) [0x63eb6d]
10: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x67668e]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0xa66581]
12: (ThreadPool::WorkThread::entry()+0x10) [0xa695c0]
13: /lib64/libpthread.so.0() [0x3cefa07851]
14: (clone()+0x6d) [0x3cef6e890d]
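
For reference, a minimal standalone sketch of the per-shard size check that this assert enforces: every chunk handed to ECUtil::decode must be exactly total_chunk_size bytes, so a single truncated or grown chunk aborts the primary OSD. This is not the actual Ceph source; check_chunk_sizes, the std::string shard buffers, and the 4096-byte chunk size are hypothetical stand-ins for illustration only.

// Standalone sketch (assumed names, not Ceph code) of the failing size check.
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// Stand-in for the per-shard buffers handed to decode(): key is the shard id,
// value is the chunk payload read from that OSD.
using shard_map = std::map<int, std::string>;

// Hypothetical helper mirroring the assertion from the backtrace:
// FAILED assert(i->second.length() == total_chunk_size)
void check_chunk_sizes(const shard_map& to_decode, std::size_t total_chunk_size) {
  for (auto i = to_decode.begin(); i != to_decode.end(); ++i) {
    assert(i->second.size() == total_chunk_size);
  }
}

int main() {
  shard_map shards = {
    {0, std::string(4096, 'a')},
    {1, std::string(4096, 'b')},
    {2, std::string(4000, 'c')},  // one corrupted/truncated chunk: 4000 != 4096
  };
  // Aborts on shard 2, just as the primary OSD aborts after gathering replies.
  check_chunk_sizes(shards, 4096);
  std::cout << "all chunks consistent\n";
  return 0;
}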

Related issues 5 (2 open, 3 closed)

Related to RADOS - Feature #9328: osd: generalize the scrub workflow (New)

Related to Ceph - Bug #10017: OSD wrongly marks object as unfound if only the primary is corrupted for EC pool (Resolved, Loïc Dachary, 11/05/2014)

Related to Ceph - Feature #9943: osd: mark pg and use replica on EIO from client read (In Progress, Wei Luo, 10/30/2014)

Has duplicate Ceph - Bug #10042: OSD crash doing object recovery with EC pool (Duplicate, Loïc Dachary, 11/10/2014)

Is duplicate of Ceph - Bug #12200: assert(hinfo.get_total_chunk_size() == (uint64_t)st.st_size) (Resolved, David Zafman, 07/01/2015)
