Backport #18568
jewel: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer))
Related issues
History
#1 Updated by Nathan Cutler about 7 years ago
- Copied from Bug #18165: OSD crash with osd/ReplicatedPG.cc: 8485: FAILED assert(is_backfill_targets(peer)) added
#2 Updated by Nathan Cutler about 7 years ago
- Status changed from New to Need More Info
In jewel, ReplicatedPG::failed_push() takes an argument "from" which is declared to be of type pg_shard_t. In master, the same function also takes an argument "from", but it is declared as const list<pg_shard_t> &from. One option would be to backport dd48b97 to jewel, but that seems like overkill.
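To make the signature mismatch described above concrete, here is a minimal, purely illustrative sketch. It is not Ceph code: pg_shard_t is a stand-in struct, and failed_push_single() and failed_push_list() are hypothetical free functions that only mirror the two parameter shapes mentioned in the comment (a single pg_shard_t versus const list<pg_shard_t> &). It shows one way a single-shard caller could be adapted to a list-taking callee, assuming nothing about what the real function does with its argument.

```cpp
#include <cstddef>
#include <list>

// Stand-in for Ceph's pg_shard_t (assumption: only an OSD id is modeled).
struct pg_shard_t {
  int osd;
};

// Hypothetical list-taking variant, shaped like the master declaration
// quoted in the comment. For illustration it just reports how many
// shards it was given.
inline std::size_t failed_push_list(const std::list<pg_shard_t> &from) {
  return from.size();
}

// Hypothetical single-shard variant, shaped like the jewel declaration.
// It wraps the shard in a one-element list and forwards the call.
inline std::size_t failed_push_single(pg_shard_t from) {
  return failed_push_list(std::list<pg_shard_t>{from});
}
```

Whether an adapter of this kind would be enough for the actual backport depends on how the callers in jewel obtain their shards, which this sketch does not model.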
#3 Updated by Shinobu Kinjo about 7 years ago
- Assignee set to Shinobu Kinjo
#4 Updated by Shinobu Kinjo about 7 years ago
- Status changed from Need More Info to New
#5 Updated by Nathan Cutler over 6 years ago
- Status changed from New to Need More Info
- Assignee changed from Shinobu Kinjo to Greg Farnum
Non-trivial backport; needs input from a RADOS developer. Greg, could you read comment #2 in this tracker and suggest an approach?
#6 Updated by Greg Farnum about 6 years ago
- Assignee changed from Greg Farnum to Nathan Cutler
Wow, sorry I missed this bug Nathan. :(
After digging through the tracker/PR I'm not entirely sure which bits need to be backported, and I don't think I understand the question. There's a failed_push() function in both the backends and the shared implementation (ReplicatedPG/PrimaryLogPG), and it looks like they differ from each other (in whether they take an individual shard or a list) but are the same across branches.
If this is still a standing issue, I'd ping David about it.
#7 Updated by Nathan Cutler about 6 years ago
Hi @Greg - yeah, sorry, I dropped the ball here, too. The "bits" that would need to be backported are in https://github.com/ceph/ceph/pull/14760
#8 Updated by David Zafman about 6 years ago
- Assignee changed from Nathan Cutler to David Zafman
I'll take a crack at doing this backport.
#9 Updated by David Zafman about 6 years ago
- Status changed from Need More Info to In Progress
#10 Updated by David Zafman about 6 years ago
Initial cut at a backport:
https://github.com/ceph/ceph/pull/20696
It may be that we should not move forward with this difficult backport.
#11 Updated by David Zafman about 6 years ago
- Status changed from In Progress to Rejected