Bug #14695

osd: rados cppool omap to ec pool crashes osd

Added by Dan van der Ster about 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
hammer
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Running rados cppool from a replicated pool to an erasure-coded pool on hammer 0.94.5 triggers an assert failure in the OSD when it tries to write an object that carries omap data.

User side:

# rados cppool .rgw.buckets.index test-ec
.rgw.buckets.index:.dir.default.39904722.872 => test-ec:.dir.default.39904722.872
2016-02-08 17:31:08.652540 7f21aea12700  0 -- 128.142.36.227:0/3721633 >> 188.184.18.39:6851/2788949 pipe(0x7f21a405ffa0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f21a4065c00).fault
^C

ceph-osd assert failure:

    -1> 2016-02-08 17:30:35.042187 7f66ecab6700  1 -- 188.184.18.39:0/2789844 <== osd.237 128.142.23.40:6891/842231 1 ==== osd_op_reply(1 .dir.default.39904722.872 [copy-get max 8388608] v0'0 uv1 ondisk = 0) v6 ==== 192+0+211 (2786091916 0 4072082031) 0x1dd6c940 con 0x2061e3c0
     0> 2016-02-08 17:30:35.076894 7f66f5118700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::_write_copy_chunk(ReplicatedPG::CopyOpRef, PGBackend::PGTransaction*)' thread 7f66f5118700 time 2016-02-08 17:30:35.042276
osd/ReplicatedPG.cc: 6431: FAILED assert(cop->omap_header.length() == 0)

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ReplicatedPG::_write_copy_chunk(boost::shared_ptr<ReplicatedPG::CopyOp>, PGBackend::PGTransaction*)+0xb43) [0x895cd3]
 2: (ReplicatedPG::_build_finish_copy_transaction(boost::shared_ptr<ReplicatedPG::CopyOp>, PGBackend::PGTransaction*)+0x114) [0x895e74]
 3: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0x56a) [0x8d4baa]
 4: (C_Copyfrom::finish(int)+0xb7) [0x914517]
 5: (Context::complete(int)+0x9) [0x69a4d9]
 6: (Finisher::finisher_thread_entry()+0x188) [0xa4fc78]
 7: /lib64/libpthread.so.0() [0x3358a07a51]
 8: (clone()+0x6d) [0x33586e893d]
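
The failed check in the trace can be pictured with a toy model. The sketch below is not Ceph code: `CopyOp`, `write_copy_chunk` and the `supports_omap` flag are hypothetical Python stand-ins, and it assumes the assert guards the write path for pools that cannot store omap, which is what the trace suggests.

```python
from dataclasses import dataclass, field


@dataclass
class CopyOp:
    """Hypothetical stand-in for the data fetched by copy-get."""
    data: bytes = b""
    omap_header: bytes = b""
    omap: dict = field(default_factory=dict)


def write_copy_chunk(cop: CopyOp, supports_omap: bool) -> dict:
    """Toy write step; returns the object that would be stored."""
    if not supports_omap:
        # Mirrors the failed check in the trace: a destination without
        # omap support expects the fetched omap header to be empty.
        if len(cop.omap_header) != 0:
            raise AssertionError("FAILED assert(cop->omap_header.length() == 0)")
        return {"data": cop.data}
    # A destination with omap support stores header and keys as well.
    return {"data": cop.data,
            "omap_header": cop.omap_header,
            "omap": dict(cop.omap)}
```

In this model, a bucket index object with a non-empty omap header copies cleanly when `supports_omap` is true and raises when it is false, which corresponds to the crash seen when the destination is an EC pool.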

Probably this needs to be backported to hammer?
https://github.com/ceph/ceph/commit/2b4acfb1b808b98132a771cf1089063c0f7a75b5


Related issues

Copied to Ceph - Backport #15647: hammer: osd: rados cppool omap to ec pool crashes osd Resolved

History

#1 Updated by Samuel Just about 8 years ago

  • Status changed from New to Pending Backport

Yeah, that looks like the one that needs to be backported.

#2 Updated by Loïc Dachary about 8 years ago

  • Copied to Backport #14718: hammer: rados cppool omap to ec pool crashes osd added

#3 Updated by Nathan Cutler about 8 years ago

  • Target version deleted (v0.94.7)

#4 Updated by Sage Weil almost 8 years ago

  • Subject changed from hammer: rados cppool omap to ec pool crashes osd to osd: rados cppool omap to ec pool crashes osd
  • Status changed from Pending Backport to 12
  • Priority changed from High to Urgent

the original fix in 2b4acfb1b808b98132a771cf1089063c0f7a75b5 is not what we want.

#6 Updated by Sage Weil almost 8 years ago

  • Status changed from 12 to Fix Under Review

#7 Updated by Sage Weil almost 8 years ago

  • Status changed from Fix Under Review to Resolved

#8 Updated by Nathan Cutler almost 8 years ago

  • Status changed from Resolved to Pending Backport

#9 Updated by Nathan Cutler almost 8 years ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (hammer)

The "fix" was to revert the fix, and the fix never made it into hammer. Removing hammer backport flag and de-staging hammer backport.

#10 Updated by Nathan Cutler almost 8 years ago

  • Copied to deleted (Backport #14718: hammer: rados cppool omap to ec pool crashes osd)

#11 Updated by Dan van der Ster almost 8 years ago

Wait, I'm confused. As I said at the beginning, 0.94.5 OSDs crash when you do rados cppool <replicated pool with omap> <ec pool>. If it never had the backport, then why does it crash?

#12 Updated by Nathan Cutler almost 8 years ago

  • Status changed from Resolved to Need More Info

#13 Updated by Nathan Cutler almost 8 years ago

  • Backport set to hammer

#14 Updated by Nathan Cutler almost 8 years ago

  • Status changed from Need More Info to New

#15 Updated by Kefu Chai almost 8 years ago

  • Status changed from New to Pending Backport

Per the comment by Sage at https://github.com/ceph/ceph/pull/8486#issue-146665460:

"We fixed this on the receiving end in fc51ce2 a few weeks after this commit was added."

We should backport fc51ce2, introduced by https://github.com/ceph/ceph/pull/4059, and preferably the other two commits as well, since they also apply to hammer and are good fixes:

  • f6d76f948049a66214339d36f8835d88db99001a
  • c7702bf85d3617b3e1c6619b8ebeff34932fc3e4
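
A fix "on the receiving end" could take roughly the following shape. This is a sketch under assumptions, not the content of fc51ce2: the function name, the choice of `EOPNOTSUPP` as the error code, and the dictionary layout are all illustrative.

```python
import errno


def process_copy_chunk(fetched: dict, dest_supports_omap: bool) -> int:
    """Hypothetical receiver-side check run before any write is attempted.

    Returns 0 if the copy may proceed, or a negative errno value if not.
    """
    has_omap = bool(fetched.get("omap_header")) or bool(fetched.get("omap"))
    if has_omap and not dest_supports_omap:
        # Assumed behavior: fail the client operation with an error
        # instead of reaching an assert inside the OSD.
        return -errno.EOPNOTSUPP
    return 0
```

With a check like this, copying an omap-bearing object into a pool without omap support would fail that one operation and leave the OSD running.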

#16 Updated by Nathan Cutler almost 8 years ago

  • Copied to Backport #15647: hammer: osd: rados cppool omap to ec pool crashes osd added

#18 Updated by Nathan Cutler over 7 years ago

  • Status changed from Pending Backport to Resolved
