Bug #11573

OSD assertion in erasure code environment

Added by xingyi wu about 4 years ago. Updated about 4 years ago.

Community (dev)
1 - critical
To reproduce this bug, I deployed a Ceph cluster with 6 OSDs, all on a single host to keep things simple, and created an EC pool "TESTECPOOLDELETE" with (k,m) = (4,2) using the isa plugin. The EC profile sets "ruleset-failure-domain=osd", and the pool's pg number is 1. Here is my EC profile detail:


Now I can bring down 3 of my OSDs with the following script (in fact, if you have 100 OSDs, 97 of them will end up down and out unless you stop the script):
filename=`date +%Y%m%d%H%M%S`
dd if=/dev/zero of=$filename bs=64M count=1
rados put -p TESTECPOOLDELETE $filename $filename &
sleep 2 
rados rm -p TESTECPOOLDELETE $filename
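The two figures quoted before the script (3 of 6 OSDs, 97 of 100) are consistent with the cascade stopping once fewer than k OSDs remain. That is only an inference from those two data points, not something confirmed in this report. Under that assumption, a minimal sketch of the arithmetic (the helper name max_crashed_osds is hypothetical and not part of Ceph):

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Ceph: estimates how many OSDs the
# reproducer can take down before the cascade stops.
# Assumption (inferred from the 3-of-6 and 97-of-100 figures in the report):
# the cascade ends once fewer than k OSDs are left.
max_crashed_osds() {
    local total_osds=$1 k=$2
    echo $(( total_osds - (k - 1) ))
}

max_crashed_osds 6 4     # the 6-OSD cluster from this report
max_crashed_osds 100 4   # the 100-OSD example
```

With k=4 the two calls print 3 and 97, matching the numbers in the description.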
Here is the call trace:

osd/ECUtil.h: 117: FAILED assert(old_size == total_chunk_size)

ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
1: (TransGenerator::operator()(ECTransaction::AppendOp const&)+0xce0) [0x9f60f0]
2: (boost::detail::variant::invoke_visitor<TransGenerator>::result_type boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visita
tion_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<10l>, ECTransaction::AppendOp, boost::mpl::l_item<mpl_::long_<9l>, ECTransaction::CloneOp, bo
ost::mpl::l_item<mpl_::long_<8l>, ECTransaction::RenameOp, boost::mpl::l_item<mpl_::long_<7l>, ECTransaction::StashOp, boost::mpl::l_item<mpl_::long_<6l>, ECT
ransaction::TouchOp, boost::mpl::l_item<mpl_::long_<5l>, ECTransaction::RemoveOp, boost::mpl::l_item<mpl_::long_<4l>, ECTransaction::SetAttrsOp, boost::mpl::l
_item<mpl_::long_<3l>, ECTransaction::RmAttrOp, boost::mpl::l_item<mpl_::long_<2l>, ECTransaction::AllocHintOp, boost::mpl::l_item<mpl_::long_<1l>, ECTransact
ion::NoOp, boost::mpl::l_end> > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<TransGenerator>, void const*
, boost::variant<ECTransaction::AppendOp, ECTransaction::CloneOp, ECTransaction::RenameOp, ECTransaction::StashOp, ECTransaction::TouchOp, ECTransaction::Rem
oveOp, ECTransaction::SetAttrsOp, ECTransaction::RmAttrOp, ECTransaction::AllocHintOp, ECTransaction::NoOp, boost::detail::variant::void_, boost::detail::vari
ant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant
::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>::has_fallback_type_>(int, int, boost::detail::variant::i
nvoke_visitor<TransGenerator>&, void const*
, mpl_::bool_<false>, boost::variant<ECTransaction::AppendOp, ECTransaction::CloneOp, ECTransaction::RenameOp, ECTr
ansaction::StashOp, ECTransaction::TouchOp, ECTransaction::RemoveOp, ECTransaction::SetAttrsOp, ECTransaction::RmAttrOp, ECTransaction::AllocHintOp, ECTransac
tion::NoOp, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant
::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::v
oid_>::has_fallback_type_, mpl_::int_<0>*, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<10l>, ECTransaction:
:AppendOp, boost::mpl::l_item<mpl_::long_<9l>, ECTransaction::CloneOp, boost::mpl::l_item<mpl_::long_<8l>, ECTransaction::RenameOp, boost::mpl::l_item<mpl_::l
ong_<7l>, ECTransaction::StashOp, boost::mpl::l_item<mpl_::long_<6l>, ECTransaction::TouchOp, boost::mpl::l_item<mpl_::long_<5l>, ECTransaction::RemoveOp, boo
st::mpl::l_item<mpl_::long_<4l>, ECTransaction::SetAttrsOp, boost::mpl::l_item<mpl_::long_<3l>, ECTransaction::RmAttrOp, boost::mpl::l_item<mpl_::long_<2l>, E
CTransaction::AllocHintOp, boost::mpl::l_item<mpl_::long_<1l>, ECTransaction::NoOp, boost::mpl::l_end> > > > > > > > > > >, boost::mpl::l_iter<boost::mpl::l_e
nd> >)+0x75) [0x9f6385]
3: (ECTransaction::generate_transactions(std::map<hobject_t, std::tr1::shared_ptr<ECUtil::HashInfo>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, std::tr1::shared_ptr<ECUtil::HashInfo> > > >&, std::tr1::shared_ptr<ceph::ErasureCodeInterface>&, pg_t, ECUtil::stripe_info_t const&, std::map<shard_id_t, ObjectStore::Transaction, std::less<shard_id_t>, std::allocator<std::pair<shard_id_t const, ObjectStore::Transaction> > >*, std::set<hobject_t, std::less<hobject_t>, std::allocator<hobject_t> >*, std::set<hobject_t, std::less<hobject_t>, std::allocator<hobject_t> >*, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*) const+0x1df) [0x9f4baf]
4: (ECBackend::start_write(ECBackend::Op*)+0x4d6) [0x9d9ca6]
5: (ECBackend::submit_transaction(hobject_t const&, eversion_t const&, PGBackend::PGTransaction*, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, std::tr1::shared_ptr<OpRequest>)+0x168e) [0x9dd39e]
6: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)+0x6fe) [0x8e57fe]
7: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1d17) [0x924677]
8: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x2b69) [0x928019]
9: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x4da) [0x8ae4ea]
10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x178) [0x65da18]
11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59f) [0x65e3af]
12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x742) [0xaf5442]
13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xaf8d20]
14: /lib64/ [0x31070079d1]
15: (clone()+0x6d) [0x31068e88fd]

The call trace is fairly straightforward in this context. The "rados put" command was appending data to the object, advancing the offset by the stripe size each time. A couple of seconds later, the "rados rm" command deleted the object and reset the offset to zero, but the in-flight "rados put" did not notice and kept appending with an offset greater than zero, so the assertion fired and the OSD crashed. The variable old_size is the previous offset, which is always greater than zero here, while the variable total_chunk_size has been reset to zero.
The "rados put" would then find another live OSD and crash it in the same way. After 3 OSDs had crashed, the whole PG could no longer be written or read.
By the way, I encountered this bug on giant (0.87), but I believe it still exists upstream.
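The interleaving described above can be replayed with a toy model. This is only a sketch of the reporter's explanation, not Ceph code: the function names hash_append, hash_clear and replay_race and the 1024-byte chunk are hypothetical, and only old_size and total_chunk_size are taken from the assertion message.

```shell
#!/usr/bin/env bash
# Toy model of the size check from the assertion message. Not Ceph code.

total_chunk_size=0   # size the OSD has recorded for the object's shards

# Append chunk_len bytes at offset old_size; fail if the offset is stale.
hash_append() {
    local old_size=$1 chunk_len=$2
    if [ "$old_size" -ne "$total_chunk_size" ]; then
        echo "FAILED assert(old_size == total_chunk_size): $old_size != $total_chunk_size" >&2
        return 1
    fi
    total_chunk_size=$(( total_chunk_size + chunk_len ))
}

# Model of "rados rm": the object is gone, so the recorded size is zero again.
hash_clear() {
    total_chunk_size=0
}

# Replay put, rm, put in the order described in the report.
replay_race() {
    local offset=0 chunk=1024
    hash_clear
    hash_append "$offset" "$chunk" || return 1   # first append from "rados put"
    offset=$(( offset + chunk ))                 # the writer advances its offset
    hash_clear                                   # "rados rm" lands in between
    if hash_append "$offset" "$chunk"; then      # stale offset 1024, recorded size 0
        echo "no assertion"
    else
        echo "assertion tripped"
    fi
}

replay_race
```

In this model the second append carries the stale offset while the recorded size is back at zero, which is the mismatch the reporter describes between old_size and total_chunk_size.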

Duplicates Ceph - Bug #11507: object creation by write cannot use an offset on an erasure coded pool Resolved 04/30/2015


#2 Updated by Kefu Chai about 4 years ago

xingyi, I am not able to reproduce your crash on master. I will try it on the firefly branch.

diff --git a/src/test/erasure-code/ b/src/test/erasure-code/
index 5ba2f8f..011c853 100755
--- a/src/test/erasure-code/
+++ b/src/test/erasure-code/
@@ -215,6 +215,23 @@ function TEST_rados_put_get_shec() {
     ./ceph osd erasure-code-profile rm $profile
 }
 
+function TEST_put_rm() {
+       local poolname=pool-42
+       local profile=profile-42
+       ./ceph osd erasure-code-profile set $profile \
+            plugin=isa \
+                k=4 m=2 \
+                ruleset-failure-domain=osd || return 1
+       ./ceph osd pool create $poolname 1 1 erasure $profile \
+        || return 1
+       local filename=`date +%Y%m%d%H%M%S`
+       dd if=/dev/zero of=$filename bs=64M count=1
+       rados put -p $poolname $filename $filename &
+       sleep 2
+       rados rm -p $poolname $filename || return 1
+       rm -f $filename
+}
+
 function TEST_alignment_constraints() {
     local payload=ABC
     echo "$payload" > $dir/ORIGINAL

#3 Updated by Kefu Chai about 4 years ago

  • Description updated (diff)

#4 Updated by Samuel Just about 4 years ago

  • Status changed from New to Duplicate

#5 Updated by xingyi wu about 4 years ago

Hi, Samuel, could you please show me the duplicate bug?

#6 Updated by Kefu Chai about 4 years ago

xingyi, please see the "Related issues" above, it's #11507.

