Bug #65371
openrados: PeeringState::calc_replicated_acting_stretch populate acting set before checking if < bucket_max
Description
I noticed that in the final stage of PeeringState::calc_replicated_acting_stretch we populate the acting set before checking whether the next OSD we are about to add would exceed bucket_max, the limit we impose on the number of OSDs that share the same ancestor. This can leave us with 3 OSDs from the same data center. Here is the evidence from a local test I ran, in which we enter the stretch pool and 2 DCs are down:
calc_replicated_acting_stretch bucket_max: 2
osd 8 primary accepted 15.6( v 971'2 (0'0,971'2] local-lis/les=982/983 n=0 ec=964/964 lis/c=982/969 les/c/f=983/971/0 sis=984)
osd 8 (up) accepted 15.6( v 971'2 (0'0,971'2] local-lis/les=982/983 n=0 ec=964/964 lis/c=982/969 les/c/f=983/971/0 sis=984)
osd 6 (up) accepted 15.6( v 971'2 (0'0,971'2] local-lis/les=982/983 n=0 ec=964/964 lis/c=982/969 les/c/f=983/971/0 sis=984)
want: [8,6] acting: [8,6,7] ancestors: {-9=candidates[<>]}
up set insufficient, considering remaining osds
acting candidate 7 15.6( v 971'2 (0'0,971'2] local-lis/les=982/983 n=0 ec=964/964 lis/c=982/969 les/c/f=983/971/0 sis=984)
next: candidates[<0,971'2,7>]
pop_ancestor accepting candidate 7
want is now: [8,6,7]
acting_backfill is now: 6,7,8
num_selected: 3
In practice we get away with this because of the check in:
bool acting_set_writeable() {
  return (actingset.size() >= pool.info.min_size) &&
    (pool.info.stretch_set_can_peer(acting, *get_osdmap(), NULL));
}
actingset.size() is definitely >= pool.info.min_size (assuming min_size=3).
However, we only go active if `stretch_set_can_peer` also returns true, and in this case it returns false.
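To make the two-part check concrete, here is a minimal toy model of it. This is a sketch, not the Ceph implementation: `spans_enough_buckets`, `acting_set_writeable_model`, `osd_to_bucket` and `required_buckets` are hypothetical names, and the rule that the acting set must span a minimum number of distinct ancestors is my assumption about why `stretch_set_can_peer` rejects this acting set. The report itself only states that it returns false.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for stretch_set_can_peer(): ASSUMES the check is
// "the acting set must span at least required_buckets distinct ancestors".
bool spans_enough_buckets(const std::vector<int>& acting,
                          const std::map<int, int>& osd_to_bucket,
                          std::size_t required_buckets) {
  std::set<int> buckets;
  for (int osd : acting) {
    buckets.insert(osd_to_bucket.at(osd));  // throws if the OSD is unknown
  }
  return buckets.size() >= required_buckets;
}

// Toy version of acting_set_writeable(): both conditions must hold.
bool acting_set_writeable_model(const std::vector<int>& acting,
                                const std::map<int, int>& osd_to_bucket,
                                std::size_t min_size,
                                std::size_t required_buckets) {
  return acting.size() >= min_size &&
         spans_enough_buckets(acting, osd_to_bucket, required_buckets);
}
```

With OSDs 6, 7 and 8 all mapped to ancestor -9, min_size=3 and an assumed requirement of 2 distinct ancestors, the size condition passes but the model still refuses to go active.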
Therefore, instead of this:
while (!aheap.is_empty() && want->size() < pool.info.size) {
  auto next = aheap.pop();
  pop_ancestor(next.get());
  if (next.get().get_num_selected() < bucket_max) {
    aheap.push_if_nonempty(next);
  }
}
we should do this:
while (!aheap.is_empty() && want->size() < pool.info.size) {
  auto next = aheap.pop();
  if (next.get().get_num_selected() < bucket_max) {
    pop_ancestor(next.get());
    aheap.push_if_nonempty(next);
  }
}
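The difference between the two orderings can be reproduced with a small standalone model. This is only a sketch under simplifying assumptions: `Bucket`, `fill_want` and the `check_first` flag are hypothetical, a plain FIFO queue stands in for the real `aheap` (so candidate ordering is not modelled), and the pool size of 4 in the usage note is assumed rather than taken from the log.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for one CRUSH ancestor (e.g. a data center).
struct Bucket {
  std::deque<int> candidates;  // OSD ids not yet selected, best first
  unsigned num_selected = 0;   // OSDs from this bucket already in `want`
};

// Toy equivalent of pop_ancestor(): accept the bucket's best candidate.
// Precondition: b.candidates is non-empty.
void pop_ancestor(Bucket& b, std::vector<int>& want) {
  want.push_back(b.candidates.front());
  b.candidates.pop_front();
  ++b.num_selected;
}

// check_first == false models the current loop (accept, then check);
// check_first == true models the proposed loop (check, then accept).
// Every bucket in `work` must start with at least one candidate.
std::vector<int> fill_want(std::deque<Bucket> work, std::vector<int> want,
                           std::size_t pool_size, unsigned bucket_max,
                           bool check_first) {
  while (!work.empty() && want.size() < pool_size) {
    Bucket next = work.front();
    work.pop_front();
    if (check_first) {
      if (next.num_selected < bucket_max) {
        pop_ancestor(next, want);
        if (!next.candidates.empty()) work.push_back(next);
      }
    } else {
      pop_ancestor(next, want);  // accepted before the limit is checked
      if (next.num_selected < bucket_max && !next.candidates.empty()) {
        work.push_back(next);
      }
    }
  }
  return want;
}
```

Starting from the state in the log (want = [8,6], one bucket with num_selected = 2 and candidate 7, bucket_max = 2), the current ordering returns [8,6,7], while the proposed ordering leaves want at [8,6].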
Updated by Kamoltat (Junior) Sirivadhna 25 days ago
- Related to Bug #64802: rados: generalize stretch mode pg temp handling to be usable without stretch mode added