Bug #11573
OSD assertion in erasure code environment (Status: Closed)
Description
To reproduce this bug, I deployed a Ceph cluster with 6 OSDs, then created an EC pool "TESTECPOOLDELETE" with (k,m) = (4,2) using the isa plugin. I set "ruleset-failure-domain=osd" in the EC profile and the pool's pg number to 1; the whole cluster runs on a single host to keep things simple. Here are my EC profile details:
directory=/usr/lib64/ceph/erasure-code
k=4
m=2
plugin=isa
ruleset-failure-domain=osd
Now I can bring down 3 of my OSDs with the following script (in fact, if you had 100 OSDs, 97 of them would end up down and out if you didn't stop the script):
filename=`date +%Y%m%d%H%M%S`
dd if=/dev/zero of=$filename bs=64M count=1
rados put -p TESTECPOOLDELETE $filename $filename &
sleep 2
rados rm -p TESTECPOOLDELETE $filename
Here is the call trace:
osd/ECUtil.h: 117: FAILED assert(old_size == total_chunk_size)
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
1: (TransGenerator::operator()(ECTransaction::AppendOp const&)+0xce0) [0x9f60f0]
2: (boost::detail::variant::invoke_visitor<TransGenerator>::result_type boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<10l>, ECTransaction::AppendOp, boost::mpl::l_item<mpl_::long_<9l>, ECTransaction::CloneOp, boost::mpl::l_item<mpl_::long_<8l>, ECTransaction::RenameOp, boost::mpl::l_item<mpl_::long_<7l>, ECTransaction::StashOp, boost::mpl::l_item<mpl_::long_<6l>, ECTransaction::TouchOp, boost::mpl::l_item<mpl_::long_<5l>, ECTransaction::RemoveOp, boost::mpl::l_item<mpl_::long_<4l>, ECTransaction::SetAttrsOp, boost::mpl::l_item<mpl_::long_<3l>, ECTransaction::RmAttrOp, boost::mpl::l_item<mpl_::long_<2l>, ECTransaction::AllocHintOp, boost::mpl::l_item<mpl_::long_<1l>, ECTransaction::NoOp, boost::mpl::l_end> > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<TransGenerator>, void const, boost::variant<ECTransaction::AppendOp, ECTransaction::CloneOp, ECTransaction::RenameOp, ECTransaction::StashOp, ECTransaction::TouchOp, ECTransaction::RemoveOp, ECTransaction::SetAttrsOp, ECTransaction::RmAttrOp, ECTransaction::AllocHintOp, ECTransaction::NoOp, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>::has_fallback_type_>(int, int, boost::detail::variant::invoke_visitor<TransGenerator>&, void const, mpl_::bool_<false>, boost::variant<ECTransaction::AppendOp, ECTransaction::CloneOp, ECTransaction::RenameOp, ECTransaction::StashOp, ECTransaction::TouchOp, ECTransaction::RemoveOp, ECTransaction::SetAttrsOp, ECTransaction::RmAttrOp, ECTransaction::AllocHintOp, ECTransaction::NoOp, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>::has_fallback_type_, mpl_::int_<0>*, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<10l>, ECTransaction::AppendOp, boost::mpl::l_item<mpl_::long_<9l>, ECTransaction::CloneOp, boost::mpl::l_item<mpl_::long_<8l>, ECTransaction::RenameOp, boost::mpl::l_item<mpl_::long_<7l>, ECTransaction::StashOp, boost::mpl::l_item<mpl_::long_<6l>, ECTransaction::TouchOp, boost::mpl::l_item<mpl_::long_<5l>, ECTransaction::RemoveOp, boost::mpl::l_item<mpl_::long_<4l>, ECTransaction::SetAttrsOp, boost::mpl::l_item<mpl_::long_<3l>, ECTransaction::RmAttrOp, boost::mpl::l_item<mpl_::long_<2l>, ECTransaction::AllocHintOp, boost::mpl::l_item<mpl_::long_<1l>, ECTransaction::NoOp, boost::mpl::l_end> > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >)+0x75) [0x9f6385]
3: (ECTransaction::generate_transactions(std::map<hobject_t, std::tr1::shared_ptr<ECUtil::HashInfo>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, std::tr1::shared_ptr<ECUtil::HashInfo> > > >&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, pg_t, ECUtil::stripe_info_t const&, std::map<shard_id_t, ObjectStore::Transaction, std::less<shard_id_t>, std::allocator<std::pair<shard_id_t const, ObjectStore::Transaction> > >, std::set<hobject_t, std::less<hobject_t>, std::allocator<hobject_t> >, std::set<hobject_t, std::less<hobject_t>, std::allocator<hobject_t> >, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >) const+0x1df) [0x9f4baf]
4: (ECBackend::start_write(ECBackend::Op)+0x4d6) [0x9d9ca6]
5: (ECBackend::submit_transaction(hobject_t const&, eversion_t const&, PGBackend::PGTransaction*, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::tr1::shared_ptr<OpRequest>)+0x168e) [0x9dd39e]
6: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)+0x6fe) [0x8e57fe]
7: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1d17) [0x924677]
8: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x2b69) [0x928019]
9: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4da) [0x8ae4ea]
10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x178) [0x65da18]
11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59f) [0x65e3af]
12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x742) [0xaf5442]
13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xaf8d20]
14: /lib64/libpthread.so.0() [0x31070079d1]
15: (clone()+0x6d) [0x31068e88fd]
The call trace is quite straightforward in this context. The "rados put" command was appending data to the object, advancing its offset by the stripe size with each write. After a couple of seconds, the "rados rm" command deleted the object and reset its size to zero, but the "rados put" never noticed this and kept appending at an offset greater than zero, so the assertion fired and the OSD went down. The variable old_size holds the writer's previous offset, which is always greater than zero, while total_chunk_size had been reset to zero.
The "rados put" would then find another living OSD and take it down the same way; after 3 OSDs had crashed, the whole PG could no longer be read or written.
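The race described above can be illustrated with a toy model (all names here are illustrative stand-ins, not Ceph's actual code; the real check is the assert in osd/ECUtil.h shown in the trace). A writer tracks its own notion of the object size, a concurrent delete resets the stored size to zero, and the next append trips the consistency check:

```python
# Toy model of the append/delete race (hypothetical names, not Ceph's code).
STRIPE_WIDTH = 4096  # one full stripe across the k=4 data chunks (assumed value)

class ToyHashInfo:
    """Stands in for the per-object size bookkeeping on the OSD."""
    def __init__(self):
        self.total_chunk_size = 0  # bytes the OSD believes are stored

    def append(self, old_size, length):
        # The failing check: the writer's idea of the current object size
        # must match what the OSD has actually recorded.
        assert old_size == self.total_chunk_size, \
            "FAILED assert(old_size == total_chunk_size)"
        self.total_chunk_size += length

hinfo = ToyHashInfo()
hinfo.append(0, STRIPE_WIDTH)             # first stripe of the "rados put"
hinfo.append(STRIPE_WIDTH, STRIPE_WIDTH)  # second stripe, offsets still agree

hinfo.total_chunk_size = 0                # the "rados rm" resets the object

try:
    # The writer still appends at its stale offset (> 0), so the check
    # fails -- in the real OSD this assertion aborts the daemon.
    hinfo.append(2 * STRIPE_WIDTH, STRIPE_WIDTH)
except AssertionError as e:
    print(e)  # FAILED assert(old_size == total_chunk_size)
```

In the toy model the writer can catch the error; in the OSD the failed assert is fatal, which is why each racing write kills another OSD until the PG loses quorum.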
By the way, I encountered this bug on giant (0.87), but I believe it still exists upstream.
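For reference, the environment from the description can be recreated with commands along these lines (a sketch for a giant-era (0.87) test cluster; the pool name and profile values are taken from the report, while "myprofile" is an assumed name — these commands require a running cluster and are not standalone):

```shell
# Sketch only: assumes a running test cluster with 6 OSDs on one host.
# Profile matching the report: isa plugin, k=4 m=2, failure domain = osd.
ceph osd erasure-code-profile set myprofile \
    plugin=isa k=4 m=2 ruleset-failure-domain=osd
ceph osd erasure-code-profile get myprofile

# One-PG erasure-coded pool, as in the report.
ceph osd pool create TESTECPOOLDELETE 1 1 erasure myprofile
```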
Updated by xingyi wu almost 9 years ago
Patch uploaded at https://github.com/ceph/ceph/pull/4613
Updated by Kefu Chai almost 9 years ago
xingyi, i am not able to reproduce your crash on master. will try it on the firefly branch.
diff --git a/src/test/erasure-code/test-erasure-code.sh b/src/test/erasure-code/test-erasure-code.sh
index 5ba2f8f..011c853 100755
--- a/src/test/erasure-code/test-erasure-code.sh
+++ b/src/test/erasure-code/test-erasure-code.sh
@@ -215,6 +215,23 @@ function TEST_rados_put_get_shec() {
     ./ceph osd erasure-code-profile rm $profile
 }
 
+function TEST_put_rm() {
+    local poolname=pool-42
+    local profile=profile-42
+    ./ceph osd erasure-code-profile set $profile \
+        plugin=isa \
+        k=4 m=2 \
+        ruleset-failure-domain=osd || return 1
+    ./ceph osd pool create $poolname 1 1 erasure $profile \
+        || return 1
+    local filename=`date +%Y%m%d%H%M%S`
+    dd if=/dev/zero of=$filename bs=64M count=1
+    rados put -p $poolname $filename $filename || return 1
+    sleep 2
+    rados rm -p $poolname $filename || return 1
+    rm -f $filename
+}
+
 function TEST_alignment_constraints() {
     local payload=ABC
     echo "$payload" > $dir/ORIGINAL
Updated by xingyi wu almost 9 years ago
Hi Samuel, could you please show me the duplicate bug?
Updated by Kefu Chai almost 9 years ago
xingyi, please see the "Related issues" above, it's #11507.