heartbeat: heartbeat fails after connection race
Recently I found that the OSDs on my testbed constantly fail. Here is the log:
ERROR 2020-08-25 10:41:02,855 [shard 0] osd - Heartbeat::Session::failed_since(): no reply from osd.0 ever on either front or back, first ping sent 2020-08-25 10:40:38 827414638 (oldest deadline 2020-08-25 10:40:58 827414638)
It seems that after a heartbeat connection is replaced, the OSD can no longer send osd_ping messages through that connection. The heartbeat then fails, and the OSDs go on to report their heartbeat peers as failed.
Details are in the attached log file.
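
To make the suspected failure mode concrete, here is a minimal Python sketch. It is not Ceph code: `Connection`, `Session`, and `simulate` are hypothetical names, the 20-second grace period is only inferred from the gap between "first ping sent" and "oldest deadline" in the log above, and the idea that the session keeps pointing at the replaced connection is my assumption about the cause, not something the log confirms.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: inferred from the log (10:40:38 first ping, 10:40:58 deadline).
GRACE_SECONDS = 20.0


@dataclass
class Connection:
    """Hypothetical stand-in for a messenger connection."""
    name: str
    closed: bool = False
    sent: List[str] = field(default_factory=list)

    def send(self, msg: str) -> bool:
        # A replaced (closed) connection silently drops the message.
        if self.closed:
            return False
        self.sent.append(msg)
        return True


@dataclass
class Session:
    """Hypothetical heartbeat session bound to one connection."""
    conn: Connection
    first_ping_sent: Optional[float] = None
    last_reply: Optional[float] = None

    def send_ping(self, now: float) -> None:
        # The first ping time is recorded even if the send is dropped.
        if self.first_ping_sent is None:
            self.first_ping_sent = now
        if self.conn.send("osd_ping"):
            # Simplification: a live connection yields an immediate reply.
            self.last_reply = now

    def failed(self, now: float) -> bool:
        # Simplification: only the "no reply ever" case is modeled.
        if self.last_reply is not None:
            return False
        if self.first_ping_sent is None:
            return False
        return now >= self.first_ping_sent + GRACE_SECONDS


def simulate(rebind_after_replace: bool) -> bool:
    """Return True if the session is considered failed at t = 24 s."""
    old = Connection("old")
    session = Session(old)

    # Connection race: the old connection is replaced by a new one.
    new = Connection("new")
    old.closed = True
    if rebind_after_replace:
        session.conn = new

    # Ping every 6 seconds: t = 0, 6, 12, 18, 24.
    for t in range(0, 25, 6):
        session.send_ping(float(t))
    return session.failed(24.0)
```

In this model, `simulate(False)` reports a failure because every ping goes to the stale connection and no reply ever arrives, while `simulate(True)` does not, because the session was rebound to the replacement connection.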