Bug #11573
OSD assertion in erasure code environment
Status: Closed
Description
To reproduce this bug, I deployed a Ceph cluster with 6 OSDs on a single host (to keep things simple), then created an EC pool "TESTECPOOLDELETE" with (k,m) = (4,2) using the isa plugin, set "ruleset-failure-domain=osd" in the EC profile, and set the pool's pg number to 1. Here are the EC profile details:

directory=/usr/lib64/ceph/erasure-code
k=4
m=2
plugin=isa
ruleset-failure-domain=osd

Now I can bring down 3 of my OSDs with the following script (actually, if you had 100 OSDs, 97 of them would go down and out if you didn't stop the script):

filename=`date +%Y%m%d%H%M%S`
dd if=/dev/zero of=$filename bs=64M count=1
rados put -p TESTECPOOLDELETE $filename $filename &
sleep 2
rados rm -p TESTECPOOLDELETE $filename

Here is the call trace:
osd/ECUtil.h: 117: FAILED assert(old_size == total_chunk_size)
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
1: (TransGenerator::operator()(ECTransaction::AppendOp const&)+0xce0) [0x9f60f0]
2: (boost::detail::variant::invoke_visitor<TransGenerator>::result_type boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<10l>, ECTransaction::AppendOp, boost::mpl::l_item<mpl_::long_<9l>, ECTransaction::CloneOp, boost::mpl::l_item<mpl_::long_<8l>, ECTransaction::RenameOp, boost::mpl::l_item<mpl_::long_<7l>, ECTransaction::StashOp, boost::mpl::l_item<mpl_::long_<6l>, ECTransaction::TouchOp, boost::mpl::l_item<mpl_::long_<5l>, ECTransaction::RemoveOp, boost::mpl::l_item<mpl_::long_<4l>, ECTransaction::SetAttrsOp, boost::mpl::l_item<mpl_::long_<3l>, ECTransaction::RmAttrOp, boost::mpl::l_item<mpl_::long_<2l>, ECTransaction::AllocHintOp, boost::mpl::l_item<mpl_::long_<1l>, ECTransaction::NoOp, boost::mpl::l_end> > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<TransGenerator>, void const, boost::variant<ECTransaction::AppendOp, ECTransaction::CloneOp, ECTransaction::RenameOp, ECTransaction::StashOp, ECTransaction::TouchOp, ECTransaction::RemoveOp, ECTransaction::SetAttrsOp, ECTransaction::RmAttrOp, ECTransaction::AllocHintOp, ECTransaction::NoOp, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>::has_fallback_type_>(int, int, boost::detail::variant::invoke_visitor<TransGenerator>&, void const, mpl_::bool_<false>, boost::variant<ECTransaction::AppendOp, ECTransaction::CloneOp, ECTransaction::RenameOp, ECTransaction::StashOp, ECTransaction::TouchOp, ECTransaction::RemoveOp, ECTransaction::SetAttrsOp, ECTransaction::RmAttrOp, ECTransaction::AllocHintOp, ECTransaction::NoOp, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>::has_fallback_type_, mpl_::int_<0>*, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<10l>, ECTransaction::AppendOp, boost::mpl::l_item<mpl_::long_<9l>, ECTransaction::CloneOp, boost::mpl::l_item<mpl_::long_<8l>, ECTransaction::RenameOp, boost::mpl::l_item<mpl_::long_<7l>, ECTransaction::StashOp, boost::mpl::l_item<mpl_::long_<6l>, ECTransaction::TouchOp, boost::mpl::l_item<mpl_::long_<5l>, ECTransaction::RemoveOp, boost::mpl::l_item<mpl_::long_<4l>, ECTransaction::SetAttrsOp, boost::mpl::l_item<mpl_::long_<3l>, ECTransaction::RmAttrOp, boost::mpl::l_item<mpl_::long_<2l>, ECTransaction::AllocHintOp, boost::mpl::l_item<mpl_::long_<1l>, ECTransaction::NoOp, boost::mpl::l_end> > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >)+0x75) [0x9f6385]
3: (ECTransaction::generate_transactions(std::map<hobject_t, std::tr1::shared_ptr<ECUtil::HashInfo>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, std::tr1::shared_ptr<ECUtil::HashInfo> > > >&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, pg_t, ECUtil::stripe_info_t const&, std::map<shard_id_t, ObjectStore::Transaction, std::less<shard_id_t>, std::allocator<std::pair<shard_id_t const, ObjectStore::Transaction> > >, std::set<hobject_t, std::less<hobject_t>, std::allocator<hobject_t> >, std::set<hobject_t, std::less<hobject_t>, std::allocator<hobject_t> >, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >) const+0x1df) [0x9f4baf]
4: (ECBackend::start_write(ECBackend::Op)+0x4d6) [0x9d9ca6]
5: (ECBackend::submit_transaction(hobject_t const&, eversion_t const&, PGBackend::PGTransaction*, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::tr1::shared_ptr<OpRequest>)+0x168e) [0x9dd39e]
6: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)+0x6fe) [0x8e57fe]
7: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1d17) [0x924677]
8: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x2b69) [0x928019]
9: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4da) [0x8ae4ea]
10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x178) [0x65da18]
11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59f) [0x65e3af]
12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x742) [0xaf5442]
13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xaf8d20]
14: /lib64/libpthread.so.0() [0x31070079d1]
15: (clone()+0x6d) [0x31068e88fd]
The call trace is quite straightforward in this context. The "rados put" command was appending data to the object, advancing the offset by the stripe size on each write; a few seconds later the "rados rm" command deleted the object and reset the offset to zero, but the still-running "rados put" never noticed and kept appending at an offset greater than zero, so the assertion fired and the OSD went down. The variable old_size is the previous offset, which is always greater than zero, while total_chunk_size had been reset to zero.
The "rados put" would then find another live OSD and bring it down the same way; once 3 OSDs had failed, the whole PG could no longer be written or read.
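The mismatch the assertion checks can be sketched numerically. This is a toy model only; the chunk and stripe sizes below are assumed for illustration, not taken from the cluster:

```shell
# Toy model of the shard-size bookkeeping that trips the assert.
# Assumed: k=4 data chunks, 4096-byte stripe, so each shard grows
# by stripe_width/k = 1024 bytes per appended stripe.
k=4
stripe_width=4096
old_size=$((2 * stripe_width / k))  # shard offset after two appended stripes
total_chunk_size=0                  # reset to zero by the racing "rados rm"
if [ "$old_size" -ne "$total_chunk_size" ]; then
  echo "FAILED assert(old_size == total_chunk_size)"
fi
```

Any nonzero old_size against a zeroed total_chunk_size reproduces the inconsistency, regardless of the actual chunk size.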
By the way, I encountered this bug on giant (0.87), but I believe it still exists upstream.
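For anyone trying to reproduce this, the profile and pool described above can be created along these lines (the profile name "testprofile" is my choice; values mirror the profile shown above, and a running cluster with 6 OSDs is required):

```shell
# Create an EC profile matching the one in the report, then the pool
# with pg_num = pgp_num = 1.
ceph osd erasure-code-profile set testprofile \
    k=4 m=2 plugin=isa ruleset-failure-domain=osd \
    directory=/usr/lib64/ceph/erasure-code
ceph osd erasure-code-profile get testprofile   # verify the profile
ceph osd pool create TESTECPOOLDELETE 1 1 erasure testprofile
```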