Bug #8588
Closed
In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size is changed
Description
In the EC pool, if any data chunk's size changes for some reason, the total size retrieved from the data OSDs will no longer equal the expected total chunk size. After the primary OSD receives all data from the other OSDs, it crashes on the following assert.
2014-06-12 02:56:06.073542 7f32ea246700 -1 osd/ECUtil.cc: In function 'int ECUtil::decode(const ECUtil::stripe_info_t&, ceph::ErasureCodeInterfaceRef&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<const int, ceph::buffer::list> > >&, ceph::bufferlist*)' thread 7f32ea246700 time 2014-06-12 02:56:05.978745
osd/ECUtil.cc: 23: FAILED assert(i->second.length() == total_chunk_size)
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: (ECUtil::decode(ECUtil::stripe_info_t const&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >&, ceph::buffer::list*)+0x548) [0x9922d8]
2: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x270) [0x982650]
3: (GenContext<std::pair<RecoveryMessages*, ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x9) [0x977729]
4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x6c) [0x96481c]
5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*)+0xde3) [0x96a573]
6: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x4b6) [0x976c26]
7: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x250) [0x83f400]
8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x37c) [0x60e82c]
9: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x63d) [0x63eb6d]
10: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x67668e]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0xa66581]
12: (ThreadPool::WorkThread::entry()+0x10) [0xa695c0]
13: /lib64/libpthread.so.0() [0x3cefa07851]
14: (clone()+0x6d) [0x3cef6e890d]
Updated by kaifeng yao almost 10 years ago
Rather than crashing the OSD, it would be better to fail the request and possibly mark the PG as inconsistent. Better still, it should automatically pick other coded chunks to rebuild the data.
Updated by Greg Farnum almost 10 years ago
This is some kind of on-disk corruption on the replica that you're seeing?
I think you're probably right, as we're now at the point where this is more likely to be an external issue than a code bug, but just trying to understand the context.
Updated by Zhi Zhang almost 10 years ago
Hi Greg,
Sorry for the late reply.
We were just trying to simulate some potential disk error here, so we artificially produced this kind of corruption.
Updated by Samuel Just almost 10 years ago
Yeah, this needs to be handled better. The biggest problem is that the crash is on the primary rather than the replica...
Updated by Loïc Dachary over 9 years ago
- Subject changed from In the EC pool, primary OSD will crash at decoding if any data chunk's size is changed to In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size is changed
Updated by Guang Yang over 9 years ago
Hi Sam,
Any suggestion in terms of how to fix this issue?
One potential solution is to validate the digest for each chunk when doing the sub read, and if it does not match, return a signal to the upper layer to handle. With this approach, the fix is similar to http://tracker.ceph.com/issues/9943.
Updated by Guang Yang over 9 years ago
Wei is working on this along with http://tracker.ceph.com/issues/9943.
Updated by Loïc Dachary over 9 years ago
- Status changed from New to In Progress
Updated by Loïc Dachary about 9 years ago
- Status changed from In Progress to 12
Updated by Samuel Just almost 9 years ago
- Priority changed from High to Normal
- Regression set to No
Updated by Loïc Dachary almost 9 years ago
- Assignee deleted (Loïc Dachary)
I'm not making progress on this, unassigning.
Updated by David Zafman about 7 years ago
- Is duplicate of Bug #12200: assert(hinfo.get_total_chunk_size() == (uint64_t)st.st_size) added
Updated by David Zafman about 7 years ago
- Status changed from 12 to Duplicate
First, an assert was added on the reading side for the case where the hinfo size doesn't match. We later turned that into an EIO (12200) and handle it at the server by reading another shard.