Bug #14695
osd: rados cppool omap to ec pool crashes osd
Description
If we do cppool from a replicated pool to an erasure-coded pool in hammer 0.94.5, the OSD hits an assert failure when it tries to write the omap object. (Erasure-coded pools do not support omap, so the copy path asserts instead of returning an error.)
User side:
# rados cppool .rgw.buckets.index test-ec
.rgw.buckets.index:.dir.default.39904722.872 => test-ec:.dir.default.39904722.872
2016-02-08 17:31:08.652540 7f21aea12700  0 -- 128.142.36.227:0/3721633 >> 188.184.18.39:6851/2788949 pipe(0x7f21a405ffa0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f21a4065c00).fault
^C
ceph-osd assert failure:
-1> 2016-02-08 17:30:35.042187 7f66ecab6700  1 -- 188.184.18.39:0/2789844 <== osd.237 128.142.23.40:6891/842231 1 ==== osd_op_reply(1 .dir.default.39904722.872 [copy-get max 8388608] v0'0 uv1 ondisk = 0) v6 ==== 192+0+211 (2786091916 0 4072082031) 0x1dd6c940 con 0x2061e3c0
 0> 2016-02-08 17:30:35.076894 7f66f5118700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::_write_copy_chunk(ReplicatedPG::CopyOpRef, PGBackend::PGTransaction*)' thread 7f66f5118700 time 2016-02-08 17:30:35.042276
osd/ReplicatedPG.cc: 6431: FAILED assert(cop->omap_header.length() == 0)
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ReplicatedPG::_write_copy_chunk(boost::shared_ptr<ReplicatedPG::CopyOp>, PGBackend::PGTransaction*)+0xb43) [0x895cd3]
 2: (ReplicatedPG::_build_finish_copy_transaction(boost::shared_ptr<ReplicatedPG::CopyOp>, PGBackend::PGTransaction*)+0x114) [0x895e74]
 3: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0x56a) [0x8d4baa]
 4: (C_Copyfrom::finish(int)+0xb7) [0x914517]
 5: (Context::complete(int)+0x9) [0x69a4d9]
 6: (Finisher::finisher_thread_entry()+0x188) [0xa4fc78]
 7: /lib64/libpthread.so.0() [0x3358a07a51]
 8: (clone()+0x6d) [0x33586e893d]
Probably this needs to be backported to hammer?
https://github.com/ceph/ceph/commit/2b4acfb1b808b98132a771cf1089063c0f7a75b5
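[Editor's note: a minimal reproducer along the lines of the report, with hypothetical pool and object names; rados setomapval gives the source object an omap entry so the copy exercises the failing path:]

    ceph osd pool create src-repl 64                       # replicated source pool
    rados -p src-repl setomapval obj1 key1 val1            # object with omap data
    ceph osd erasure-code-profile set myprofile k=2 m=1
    ceph osd pool create dst-ec 64 64 erasure myprofile    # EC destination pool
    rados cppool src-repl dst-ec                           # trips the OSD assert on 0.94.5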
History
#1 Updated by Samuel Just about 8 years ago
- Status changed from New to Pending Backport
Yeah, that looks like the one that needs to be backported.
#2 Updated by Loïc Dachary about 8 years ago
- Copied to Backport #14718: hammer: rados cppool omap to ec pool crashes osd added
#3 Updated by Nathan Cutler about 8 years ago
- Target version deleted (v0.94.7)
#4 Updated by Sage Weil almost 8 years ago
- Subject changed from hammer: rados cppool omap to ec pool crashes osd to osd: rados cppool omap to ec pool crashes osd
- Status changed from Pending Backport to 12
- Priority changed from High to Urgent
the original fix in 2b4acfb1b808b98132a771cf1089063c0f7a75b5 is not what we want.
#5 Updated by Kefu Chai almost 8 years ago
and the PR was https://github.com/ceph/ceph/pull/4393
#6 Updated by Sage Weil almost 8 years ago
- Status changed from 12 to Fix Under Review
#7 Updated by Sage Weil almost 8 years ago
- Status changed from Fix Under Review to Resolved
#8 Updated by Nathan Cutler almost 8 years ago
- Status changed from Resolved to Pending Backport
#9 Updated by Nathan Cutler almost 8 years ago
- Status changed from Pending Backport to Resolved
- Backport deleted (hammer)
The "fix" was to revert the fix, and the fix never made it into hammer. Removing hammer backport flag and de-staging hammer backport.
#10 Updated by Nathan Cutler almost 8 years ago
- Copied to deleted (Backport #14718: hammer: rados cppool omap to ec pool crashes osd)
#11 Updated by Dan van der Ster almost 8 years ago
Wait, I'm confused. As I said at the beginning, 0.94.5 OSDs crash when you do rados cppool <replicated pool with omap> <ec pool>. If it never had the backport, then why does it crash?
#12 Updated by Nathan Cutler almost 8 years ago
- Status changed from Resolved to Need More Info
#13 Updated by Nathan Cutler almost 8 years ago
- Backport set to hammer
#14 Updated by Nathan Cutler almost 8 years ago
- Status changed from Need More Info to New
#15 Updated by Kefu Chai almost 8 years ago
- Status changed from New to Pending Backport
Per the comment by Sage at https://github.com/ceph/ceph/pull/8486#issue-146665460:
"We fixed this on the receiving end in fc51ce2 a few weeks after this commit was added."
We should backport fc51ce2, which was introduced by https://github.com/ceph/ceph/pull/4059, and we'd better backport the other two commits as well, as they also apply to hammer and are good fixes:
- f6d76f948049a66214339d36f8835d88db99001a
- c7702bf85d3617b3e1c6619b8ebeff34932fc3e4
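[Editor's note: a sketch of how such a backport is typically staged, assuming a local ceph clone with an "upstream" remote; the branch name wip-14695-hammer is chosen here purely for illustration. cherry-pick -x records the original SHA in the new commit message:]

    git checkout -b wip-14695-hammer upstream/hammer
    git cherry-pick -x fc51ce2
    git cherry-pick -x f6d76f948049a66214339d36f8835d88db99001a
    git cherry-pick -x c7702bf85d3617b3e1c6619b8ebeff34932fc3e4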
#16 Updated by Nathan Cutler almost 8 years ago
- Copied to Backport #15647: hammer: osd: rados cppool omap to ec pool crashes osd added
#17 Updated by Nathan Cutler almost 8 years ago
@Kefu Chai: Regarding https://github.com/ceph/ceph/commit/f6d76f948049a66214339d36f8835d88db99001a it looks like this is already in hammer. See:
https://github.com/ceph/ceph/blob/hammer/src/osd/ReplicatedPG.cc#L6731
(In other words, it was already backported by Sam in https://github.com/ceph/ceph/commit/0a5b8569e )
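[Editor's note: because a cherry-pick gets a new SHA, git branch --contains on the original master commit will not find it on hammer; one way to confirm the backported commit is present, again assuming an "upstream" remote for ceph/ceph:]

    git fetch upstream
    # exits 0 and prints "present" if 0a5b8569e is an ancestor of hammer
    git merge-base --is-ancestor 0a5b8569e upstream/hammer && echo present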
#18 Updated by Nathan Cutler over 7 years ago
- Status changed from Pending Backport to Resolved