Backport #18568

closed

jewel: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))

Added by Nathan Cutler over 7 years ago. Updated about 6 years ago.

Status: Rejected
Priority: Normal
Assignee: David Zafman
Target version: -
Release: jewel
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Related issues: 1 (0 open, 1 closed)

Copied from RADOS - Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer)) (Resolved, David Zafman, 12/07/2016)

#1

Updated by Nathan Cutler over 7 years ago

  • Copied from Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer)) added
#2

Updated by Nathan Cutler over 7 years ago

  • Status changed from New to Need More Info

In jewel, ReplicatedPG::failed_push() takes an argument "from" of type pg_shard_t. In master, the same function also takes a "from" argument, but it is declared as const list<pg_shard_t> &from. One option would be to backport dd48b97 to jewel, but that seems like overkill.
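To make the signature mismatch concrete, here is a minimal C++ sketch of the two shapes of failed_push() described above. It is not Ceph's code: pg_shard_t is a heavily simplified stand-in, the object id is a plain string instead of hobject_t, and FailedPushLog is a hypothetical recorder standing in for the real failure bookkeeping. As an assumption, not the approach chosen in this tracker, it also shows how a single-shard call can be forwarded into the list-based form by wrapping the shard in a one-element list.

```cpp
#include <cstdint>
#include <list>
#include <string>

// Hypothetical, heavily simplified stand-in for Ceph's pg_shard_t.
struct pg_shard_t {
  int32_t osd;
  int8_t shard;
};

// Hypothetical recorder: remembers which OSDs a push failed from.
struct FailedPushLog {
  std::list<int32_t> osds;
};

// Master-style shape: the failure is reported against a list of shards.
void failed_push(FailedPushLog &log, const std::list<pg_shard_t> &from,
                 const std::string &soid) {
  (void)soid;  // object id is not used in this sketch
  for (const pg_shard_t &s : from) {
    log.osds.push_back(s.osd);
  }
}

// Jewel-style shape: a single shard. This adapter wraps it in a
// one-element list and forwards to the list-based overload.
void failed_push(FailedPushLog &log, pg_shard_t from,
                 const std::string &soid) {
  failed_push(log, std::list<pg_shard_t>{from}, soid);
}
```

Because std::list<pg_shard_t> cannot be implicitly constructed from a single pg_shard_t, the two overloads do not collide. Whether a wrapper like this would be enough for jewel depends on the rest of the change in dd48b97, which this sketch does not model.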

#3

Updated by Shinobu Kinjo about 7 years ago

  • Assignee set to Shinobu Kinjo
#4

Updated by Shinobu Kinjo about 7 years ago

  • Status changed from Need More Info to New
#5

Updated by Nathan Cutler almost 7 years ago

  • Status changed from New to Need More Info
  • Assignee changed from Shinobu Kinjo to Greg Farnum

Non-trivial backport; needs input from a RADOS developer. Greg, could you read comment #2 in this tracker and suggest an approach?

#6

Updated by Greg Farnum about 6 years ago

  • Assignee changed from Greg Farnum to Nathan Cutler

Wow, sorry I missed this bug, Nathan. :(

After digging through the tracker/PR, I'm not entirely sure which bits need to be backported, and I don't think I understand the question. There's a failed_push() function in both the backends and the shared implementation (ReplicatedPG/PrimaryLogPG), and it looks like the two differ from each other (in whether they take an individual shard or a list) but are the same across branches.

If this is still a standing issue, I'd ping David about it.

#7

Updated by Nathan Cutler about 6 years ago

Hi @Greg Farnum - yeah, sorry, I dropped the ball here, too. The "bits" that would need to be backported are in https://github.com/ceph/ceph/pull/14760

#8

Updated by David Zafman about 6 years ago

  • Assignee changed from Nathan Cutler to David Zafman

I'll take a crack at doing this backport.

#9

Updated by David Zafman about 6 years ago

  • Status changed from Need More Info to In Progress
#10

Updated by David Zafman about 6 years ago

Initial cut at a backport:

https://github.com/ceph/ceph/pull/20696

It may be that we should not move forward with this difficult backport.

#11

Updated by David Zafman about 6 years ago

  • Status changed from In Progress to Rejected