Bug #58107
mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive
Description
How to reproduce the issue
Set up:
mon.a (zone 1) rank=0
mon.b (zone 1) rank=1
mon.c (zone 2) rank=2
mon.d (zone 2) rank=3
mon.e (arbiter) rank=4
Stretch-mode cluster with 2 zones, 4 mons (2 per zone) plus the arbiter mon.e, and 4 OSDs (2 per zone).
Shut down zone 2 and wait until the cluster enters degraded stretch mode.
Start zone 2.
Immediately shut down zone 1.
Result:
Ceph becomes unresponsive.
Explanation:
e0: quorum = {a, b, c, d, e}, stretch_marked_down_mons = {}, disallowed_leaders = {e}
e1: quorum = {a, b, e}, stretch_marked_down_mons = {c, d}, disallowed_leaders = {e}
mon.c starts back up, probes mon.b, and gets map e1 (stretch_marked_down_mons = {c, d}).
mon.d starts back up, probes mon.b, and gets map e1 (stretch_marked_down_mons = {c, d}).
Both then go through Monitor::set_elector_disallowed_leaders(), which leaves elector.disallowed_leaders = {c, d, e}.
Within the same monmap epoch, we shut down zone 1:
e1: quorum = {c, d, e}, stretch_marked_down_mons = {c, d}, disallowed_leaders = {e}
During an election, every remaining monitor is a disallowed leader, so no one can ever win. The only way out of this state is to start zone 1 back up.
The only way to clear monmap::stretch_marked_down_mons is through Monitor::trigger_healthy_stretch_mode(), which only the leader can execute. Since the monitors are stuck in an election when this happens, there is no chance of reaching trigger_healthy_stretch_mode().
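The deadlock described above can be sketched as simple set arithmetic. This is a minimal model built only from the values listed in this report, not Ceph's actual code: the helper names `effective_disallowed_leaders` and `eligible_leaders` are hypothetical, and it assumes the elector's disallowed set is the union of the monmap's disallowed leaders and stretch_marked_down_mons, as the explanation implies.

```python
def effective_disallowed_leaders(disallowed_leaders, stretch_marked_down_mons):
    """Hypothetical model: union of the two monmap sets, as described above."""
    return set(disallowed_leaders) | set(stretch_marked_down_mons)


def eligible_leaders(live_mons, disallowed):
    """Monitors that are up and still allowed to win an election."""
    return set(live_mons) - set(disallowed)


# Monmap e1 as given in the report.
monmap_e1 = {
    "disallowed_leaders": {"e"},              # the arbiter
    "stretch_marked_down_mons": {"c", "d"},   # zone 2 was marked down
}

disallowed = effective_disallowed_leaders(
    monmap_e1["disallowed_leaders"],
    monmap_e1["stretch_marked_down_mons"],
)

# Zone 2 has come back, zone 1 is still up: a and b can still lead.
candidates_before = eligible_leaders({"a", "b", "c", "d", "e"}, disallowed)

# Zone 1 is shut down before the marked-down set is cleared: nobody can lead.
candidates_after = eligible_leaders({"c", "d", "e"}, disallowed)

print(sorted(disallowed))         # ['c', 'd', 'e']
print(sorted(candidates_before))  # ['a', 'b']
print(sorted(candidates_after))   # []
```

With an empty candidate set, no election can complete, and because clearing stretch_marked_down_mons requires a leader, the state cannot repair itself until a zone 1 monitor returns.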
Related issues
History
#1 Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
- Related to Bug #58049: mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer_tracker leading to unable to form quorum added
#2 Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
- Status changed from New to In Progress
#3 Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
- Status changed from In Progress to Closed
Closed because this is not a corner case but behavior that is by design. To quote Greg Farnum:
``it’s that electing those two monitors means that neither side can go active
Even if we let the monitors on this site go active, the OSDs are known to be behind so no pgs can complete peering
Once all the pgs are caught up on both sites, the monitors won’t be disallowed_leaders
That’s part of the algorithm so that if there’s a netsplit, the up-to-date side is the one that wins
Also, monitors can’t make decisions based solely on local state (“I know I’m alive”) because that breaks the convergence guarantees on score-based elections``.
#4 Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
Therefore, there is nothing we can do but wait for the other site to come back up, so pgs can complete peering and the monitors will clear themselves from disallowed_leaders.