Bug #65371


rados: PeeringState::calc_replicated_acting_stretch populate acting set before checking if < bucket_max

Added by Kamoltat (Junior) Sirivadhna 25 days ago. Updated 10 days ago.

Status:
Fix Under Review
Priority:
High
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I noticed that in the final stage of PeeringState::calc_replicated_acting_stretch we populate the acting set before checking whether the next OSD we are about to add would exceed bucket_max, the limit we impose on the number of OSDs that share the same CRUSH ancestor. This can leave the acting set with 3 OSDs from the same data center. Here is the evidence from a local test I ran against a stretch pool with 2 DCs down:

calc_replicated_acting_stretch
bucket_max: 2
 osd 8 primary accepted 15.6( v 971'2 (0'0,971'2] local-lis/les=982/983 n=0 ec=964/964 lis/c=982/969 les/c/f=983/971/0 sis=984)
 osd 8 (up) accepted 15.6( v 971'2 (0'0,971'2] local-lis/les=982/983 n=0 ec=964/964 lis/c=982/969 les/c/f=983/971/0 sis=984)
 osd 6 (up) accepted 15.6( v 971'2 (0'0,971'2] local-lis/les=982/983 n=0 ec=964/964 lis/c=982/969 les/c/f=983/971/0 sis=984)
want: [8,6]
acting: [8,6,7]
ancestors: {-9=candidates[<>]}
 up set insufficient, considering remaining osds
 acting candidate 7 15.6( v 971'2 (0'0,971'2] local-lis/les=982/983 n=0 ec=964/964 lis/c=982/969 les/c/f=983/971/0 sis=984)
 next: candidates[<0,971'2,7>]
pop_ancestor accepting candidate 7
want is now: [8,6,7]
acting_backfill is now: 6,7,8
 num_selected: 3

Now, we actually get away with this without serving writes, because of the check in:

 bool acting_set_writeable() {
   return (actingset.size() >= pool.info.min_size) &&
     (pool.info.stretch_set_can_peer(acting, *get_osdmap(), NULL));
 }

actingset.size() is definitely >= pool.info.min_size (assuming min_size=3), but we only go active if `stretch_set_can_peer` also returns true, and with all three OSDs in the same data center it returns false, so the PG stays inactive.

Therefore, instead of this:


  while (!aheap.is_empty() && want->size() < pool.info.size) {
    auto next = aheap.pop();
    pop_ancestor(next.get()); // candidate is accepted before the bucket_max check
    if (next.get().get_num_selected() < bucket_max) {
      aheap.push_if_nonempty(next);
    }
  }

we should do this:

  while (!aheap.is_empty() && want->size() < pool.info.size) {
    auto next = aheap.pop();
    if (next.get().get_num_selected() < bucket_max) {
      pop_ancestor(next.get()); // only accept a candidate from a bucket that is not already full
      aheap.push_if_nonempty(next);
    }
  }

Related issues (1 open, 0 closed)

Related to RADOS - Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode (Fix Under Review, Kamoltat (Junior) Sirivadhna)

Actions #1

Updated by Kamoltat (Junior) Sirivadhna 25 days ago

  • Related to Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode added
Actions #2

Updated by Kamoltat (Junior) Sirivadhna 24 days ago

  • Description updated (diff)
Actions #3

Updated by Radoslaw Zarzynski 17 days ago

Bump up.

Actions #4

Updated by Radoslaw Zarzynski 10 days ago

In review.

