Backport #18568

jewel: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))

Added by Nathan Cutler about 7 years ago. Updated about 6 years ago.

Status: Rejected
Priority: Normal
Assignee: David Zafman
Target version: -
Release: jewel
Crash signature (v1):
Crash signature (v2):

Related issues

Copied from RADOS - Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer)) (Resolved, 12/07/2016)

History

#1 Updated by Nathan Cutler about 7 years ago

  • Copied from Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer)) added

#2 Updated by Nathan Cutler about 7 years ago

  • Status changed from New to Need More Info

In jewel, ReplicatedPG::failed_push() takes an argument "from" of type pg_shard_t. In master, the same function also takes an argument "from", but there it is declared as const list<pg_shard_t> &from. One option would be to backport dd48b97 to jewel, but that seems like overkill.
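To make the type difference described above concrete, here is a minimal, self-contained sketch. It is not Ceph code: the pg_shard_t struct is a simplified stand-in, and the names failed_push_single, failed_push_list and failed_push_compat are hypothetical, chosen only to contrast a single-shard parameter with a list-of-shards parameter. The return values exist purely so the behaviour can be observed.

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for Ceph's pg_shard_t; the real type is richer.
struct pg_shard_t {
    int osd;
    int shard;
};

// Shape described for jewel: the caller names exactly one shard.
// (Illustrative body only; it just reports which OSD was passed in.)
std::string failed_push_single(pg_shard_t from) {
    return "osd." + std::to_string(from.osd);
}

// Shape described for master: the caller passes a list of shards.
std::string failed_push_list(const std::list<pg_shard_t>& from) {
    std::string out;
    for (const pg_shard_t& s : from) {
        if (!out.empty()) {
            out += ",";
        }
        out += failed_push_single(s);
    }
    return out;
}

// One assumed way to bridge the two shapes without a larger backport:
// wrap the single shard in a one-element list.
std::string failed_push_compat(pg_shard_t from) {
    return failed_push_list(std::list<pg_shard_t>{from});
}
```

Under these assumptions, a caller written against the single-shard form can reach a list-based implementation through a one-element list. Whether such a shim would be appropriate in jewel is exactly the open question in this ticket.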

#3 Updated by Shinobu Kinjo about 7 years ago

  • Assignee set to Shinobu Kinjo

#4 Updated by Shinobu Kinjo about 7 years ago

  • Status changed from Need More Info to New

#5 Updated by Nathan Cutler over 6 years ago

  • Status changed from New to Need More Info
  • Assignee changed from Shinobu Kinjo to Greg Farnum

Non-trivial backport; needs input from a RADOS developer. Greg, could you read comment #2 in this tracker and suggest an approach?

#6 Updated by Greg Farnum about 6 years ago

  • Assignee changed from Greg Farnum to Nathan Cutler

Wow, sorry I missed this bug Nathan. :(

After digging through the tracker/PR I'm not entirely sure which bits need to be backported, and I don't think I understand the question. There's a failed_push() function in both the backends and the shared implementation (ReplicatedPG/PrimaryLogPG), and it looks like they differ between themselves (in whether it takes an individual shard or a list) but are the same across branches.

If this is still a standing issue, I'd ping David about it.

#7 Updated by Nathan Cutler about 6 years ago

Hi @Greg - yeah, sorry, I dropped the ball here, too. The "bits" that would need to be backported are in https://github.com/ceph/ceph/pull/14760

#8 Updated by David Zafman about 6 years ago

  • Assignee changed from Nathan Cutler to David Zafman

I'll take a crack at doing this backport.

#9 Updated by David Zafman about 6 years ago

  • Status changed from Need More Info to In Progress

#10 Updated by David Zafman about 6 years ago

Initial cut at a backport:

https://github.com/ceph/ceph/pull/20696

It may be that we should not move forward with this difficult backport.

#11 Updated by David Zafman about 6 years ago

  • Status changed from In Progress to Rejected
