Bug #11573

Updated by Kefu Chai about 4 years ago

To reproduce this bug, I deployed a Ceph cluster with 6 OSDs and created an EC pool "TESTECPOOLDELETE" with (k,m) = (4,2), using the isa EC plugin. I set "ruleset-failure-domain=osd" in the EC profile and set the pool's PG number to 1. To keep things simple, the whole cluster is deployed on a single host. Here are the details of my EC profile:
> directory=/usr/lib64/ceph/erasure-code
> k=4
> m=2
> plugin=isa
> ruleset-failure-domain=osd
Now I can bring down 3 of my OSDs with the following script (in fact, with 100 OSDs, 97 of them would end up down and out if you did not stop the script):

<pre>
> filename=`date +%Y%m%d%H%M%S`
> dd if=/dev/zero of=$filename bs=64M count=1
> rados put -p TESTECPOOLDELETE $filename $filename &
> sleep 2
> rados rm -p TESTECPOOLDELETE $filename
</pre>


Here is the call trace:

> osd/ECUtil.h: 117: FAILED assert(old_size == total_chunk_size)
>
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> 1: (TransGenerator::operator()(ECTransaction::AppendOp const&)+0xce0) [0x9f60f0]
> 2: (boost::detail::variant::invoke_visitor<TransGenerator>::result_type boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<10l>, ECTransaction::AppendOp, boost::mpl::l_item<mpl_::long_<9l>, ECTransaction::CloneOp, boost::mpl::l_item<mpl_::long_<8l>, ECTransaction::RenameOp, boost::mpl::l_item<mpl_::long_<7l>, ECTransaction::StashOp, boost::mpl::l_item<mpl_::long_<6l>, ECTransaction::TouchOp, boost::mpl::l_item<mpl_::long_<5l>, ECTransaction::RemoveOp, boost::mpl::l_item<mpl_::long_<4l>, ECTransaction::SetAttrsOp, boost::mpl::l_item<mpl_::long_<3l>, ECTransaction::RmAttrOp, boost::mpl::l_item<mpl_::long_<2l>, ECTransaction::AllocHintOp, boost::mpl::l_item<mpl_::long_<1l>, ECTransaction::NoOp, boost::mpl::l_end> > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<TransGenerator>, void const*, boost::variant<ECTransaction::AppendOp, ECTransaction::CloneOp, ECTransaction::RenameOp, ECTransaction::StashOp, ECTransaction::TouchOp, ECTransaction::RemoveOp, ECTransaction::SetAttrsOp, ECTransaction::RmAttrOp, ECTransaction::AllocHintOp, ECTransaction::NoOp, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>::has_fallback_type_>(int, int, boost::detail::variant::invoke_visitor<TransGenerator>&, void const*, mpl_::bool_<false>, boost::variant<ECTransaction::AppendOp, ECTransaction::CloneOp, ECTransaction::RenameOp, ECTransaction::StashOp, ECTransaction::TouchOp, ECTransaction::RemoveOp, ECTransaction::SetAttrsOp, ECTransaction::RmAttrOp, ECTransaction::AllocHintOp, ECTransaction::NoOp, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>::has_fallback_type_, mpl_::int_<0>*, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<10l>, ECTransaction::AppendOp, boost::mpl::l_item<mpl_::long_<9l>, ECTransaction::CloneOp, boost::mpl::l_item<mpl_::long_<8l>, ECTransaction::RenameOp, boost::mpl::l_item<mpl_::long_<7l>, ECTransaction::StashOp, boost::mpl::l_item<mpl_::long_<6l>, ECTransaction::TouchOp, boost::mpl::l_item<mpl_::long_<5l>, ECTransaction::RemoveOp, boost::mpl::l_item<mpl_::long_<4l>, ECTransaction::SetAttrsOp, boost::mpl::l_item<mpl_::long_<3l>, ECTransaction::RmAttrOp, boost::mpl::l_item<mpl_::long_<2l>, ECTransaction::AllocHintOp, boost::mpl::l_item<mpl_::long_<1l>, ECTransaction::NoOp, boost::mpl::l_end> > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >*)+0x75) [0x9f6385]
> 3: (ECTransaction::generate_transactions(std::map<hobject_t, std::tr1::shared_ptr<ECUtil::HashInfo>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, std::tr1::shared_ptr<ECUtil::HashInfo> > > >&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, pg_t, ECUtil::stripe_info_t const&, std::map<shard_id_t, ObjectStore::Transaction, std::less<shard_id_t>, std::allocator<std::pair<shard_id_t const, ObjectStore::Transaction> > >*, std::set<hobject_t, std::less<hobject_t>, std::allocator<hobject_t> >*, std::set<hobject_t, std::less<hobject_t>, std::allocator<hobject_t> >*, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*) const+0x1df) [0x9f4baf]
> 4: (ECBackend::start_write(ECBackend::Op*)+0x4d6) [0x9d9ca6]
> 5: (ECBackend::submit_transaction(hobject_t const&, eversion_t const&, PGBackend::PGTransaction*, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::tr1::shared_ptr<OpRequest>)+0x168e) [0x9dd39e]
> 6: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)+0x6fe) [0x8e57fe]
> 7: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1d17) [0x924677]
> 8: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x2b69) [0x928019]
> 9: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4da) [0x8ae4ea]
> 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x178) [0x65da18]
> 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59f) [0x65e3af]
> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x742) [0xaf5442]
> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xaf8d20]
> 14: /lib64/libpthread.so.0() [0x31070079d1]
> 15: (clone()+0x6d) [0x31068e88fd]

The call trace is fairly straightforward in this context. The "rados put" command was appending data to the object, advancing the offset by the stripe size on each append. After a few seconds, the "rados rm" command deleted the object and reset the offset to zero, but "rados put" was not aware of this and kept appending data at an offset greater than zero, so the assertion failed and the OSD crashed. The variable _old_size_ is the previous offset, which is always greater than zero, while the variable _total_chunk_size_ has been reset to zero.
"rados put" would then find another live OSD and crash it in the same way. After 3 OSDs had crashed, the PG could no longer be read or written: with (k,m) = (4,2), only 3 of the 6 shards remained, fewer than the k = 4 needed.
By the way, I encountered this bug on giant (0.87), but I believe it still exists upstream.
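
To make the race described above easier to follow, here is a minimal C++ sketch of the failing check. It is a simplified model, not the actual Ceph code: the names @HashInfoModel@, @stale_append_after_remove@ and @pg_readable@ are hypothetical and exist only for illustration. The only parts taken from the report are the asserted condition (@old_size == total_chunk_size@) and the (k,m) = (4,2) layout. The model returns @false@ where the real OSD would hit the assertion.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-object size bookkeeping.
// This is NOT the real ECUtil::HashInfo, only a model of the asserted check.
struct HashInfoModel {
    std::uint64_t total_chunk_size = 0;

    // An append is valid only if the writer's view of the current size
    // (old_size) matches the recorded size. The real code asserts here.
    bool append(std::uint64_t old_size, std::uint64_t chunk_len) {
        if (old_size != total_chunk_size) {
            return false;  // corresponds to FAILED assert(old_size == total_chunk_size)
        }
        total_chunk_size += chunk_len;
        return true;
    }

    // Models the effect of "rados rm": the recorded size goes back to zero.
    void clear() { total_chunk_size = 0; }
};

// Replays the sequence from the report: some appends from "rados put",
// then a remove, then one more append that still uses the stale offset.
// Returns true if the final append passes the check.
bool stale_append_after_remove(std::uint64_t chunk_len, int appends_before_rm) {
    HashInfoModel hinfo;
    std::uint64_t writer_offset = 0;  // offset tracked by the in-flight writer
    for (int i = 0; i < appends_before_rm; ++i) {
        if (!hinfo.append(writer_offset, chunk_len)) {
            return false;
        }
        writer_offset += chunk_len;
    }
    hinfo.clear();                                  // "rados rm" resets the size
    return hinfo.append(writer_offset, chunk_len);  // writer is unaware of the reset
}

// With k data shards and m coding shards, at least k shards must survive
// for the data to be reconstructed.
bool pg_readable(int k, int m, int osds_down) {
    return (k + m - osds_down) >= k;
}
```

In this model, @stale_append_after_remove(4096, 3)@ fails the check because the writer's offset is 12288 while the recorded size is back at zero, and @pg_readable(4, 2, 3)@ is false, which matches the observation that the PG becomes unusable once the third OSD is down.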
