
Bug #47124

closed

heartbeat: heartbeat fails after connection race

Added by Xuehan Xu over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Category: -
% Done: 0%
Source: Community (dev)
Regression: No
Severity: 3 - minor

Description

Recently I found that the OSDs on my testbed constantly fail. Here is the log:

ERROR 2020-08-25 10:41:02,855 [shard 0] osd - Heartbeat::Session::failed_since(): no reply from osd.0 ever on either front or back, first ping sent 2020-08-25 10:40:38 827414638 (oldest deadline 2020-08-25 10:40:58 827414638)
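The check behind this message can be sketched with a simplified model. It assumes each ping gets a deadline of send time plus a grace period and must be answered on both the front and back connections. The grace period is 20 seconds here, which matches the gap between "first ping sent" and "oldest deadline" in the log. `HeartbeatSession`, `ping_sent`, `reply_received` and `GRACE` are hypothetical names, not crimson's actual `Heartbeat::Session` interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Assumed grace period: 20 s matches "first ping sent" -> "oldest deadline"
# in the log line above.
GRACE = timedelta(seconds=20)


@dataclass
class HeartbeatSession:
    """Hypothetical, simplified model of one peer's heartbeat state."""

    # ping send time -> [deadline, replies still missing (front + back)]
    pending: Dict[datetime, List] = field(default_factory=dict)
    last_rx_front: Optional[datetime] = None
    last_rx_back: Optional[datetime] = None

    def ping_sent(self, sent: datetime) -> None:
        # each ping goes out on both the front and the back connection
        self.pending[sent] = [sent + GRACE, 2]

    def reply_received(self, sent: datetime, front: bool, now: datetime) -> None:
        if front:
            self.last_rx_front = now
        else:
            self.last_rx_back = now
        entry = self.pending.get(sent)
        if entry is None:
            return  # reply to a ping that was already cleared
        entry[1] -= 1
        if entry[1] == 0:
            # fully acknowledged: forget this ping and every older one
            self.pending = {s: e for s, e in self.pending.items() if s > sent}

    def failed_since(self, now: datetime) -> Optional[datetime]:
        """Oldest missed deadline if the peer looks failed, else None."""
        if not self.pending:
            return None
        oldest_deadline = min(e[0] for e in self.pending.values())
        return oldest_deadline if oldest_deadline < now else None


# Replay the timeline from the log: no reply ever arrives on either side.
first = datetime(2020, 8, 25, 10, 40, 38)
s = HeartbeatSession()
s.ping_sent(first)
s.ping_sent(first + timedelta(seconds=6))  # later pings, also unanswered
print(s.failed_since(datetime(2020, 8, 25, 10, 41, 2)))  # 2020-08-25 10:40:58
```

Because pings are only cleared when a reply arrives, a peer whose pings never get through keeps its oldest entry, and the failure is dated from that entry's deadline.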

It seems that after a heartbeat connection is replaced following a connection race, the OSD can no longer send osd_ping messages through it. The heartbeat to that peer therefore fails, and the OSDs go on to report their heartbeat peers as failed.
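A toy model of how a replaced connection can swallow pings: the heartbeat code caches a connection object, a connection race makes the messenger close that object and install a new one, and pings keep going to the closed object. Every class here (`Connection`, `Messenger`, `HeartbeatPeer`) and the `replace_hooks` rebinding mechanism are hypothetical. The hook illustrates one possible remedy and is not necessarily what the fix in PR 36842 does.

```python
from typing import Callable, Dict, List


class Connection:
    """Hypothetical stand-in for a messenger connection."""

    def __init__(self, peer: int) -> None:
        self.peer = peer
        self.closed = False
        self.sent: List[str] = []

    def send(self, msg: str) -> bool:
        if self.closed:
            return False  # dropped: the peer never sees the ping
        self.sent.append(msg)
        return True


class Messenger:
    """Hypothetical messenger that settles a race by replacing a connection."""

    def __init__(self) -> None:
        self.conns: Dict[int, Connection] = {}
        self.replace_hooks: List[Callable[[Connection], None]] = []

    def connect(self, peer: int) -> Connection:
        if peer not in self.conns:
            self.conns[peer] = Connection(peer)
        return self.conns[peer]

    def replace(self, peer: int) -> Connection:
        # The racing connection wins; the old one is closed.
        self.conns[peer].closed = True
        new = Connection(peer)
        self.conns[peer] = new
        for hook in self.replace_hooks:
            hook(new)
        return new


class HeartbeatPeer:
    """Caches a connection to one peer and sends osd_ping over it."""

    def __init__(self, msgr: Messenger, peer: int, rebind: bool) -> None:
        self.peer = peer
        self.conn = msgr.connect(peer)
        if rebind:
            msgr.replace_hooks.append(self._on_replace)

    def _on_replace(self, conn: Connection) -> None:
        if conn.peer == self.peer:
            self.conn = conn  # follow the replacement connection

    def send_ping(self) -> bool:
        return self.conn.send("osd_ping")


for rebind in (False, True):
    msgr = Messenger()
    hb = HeartbeatPeer(msgr, peer=0, rebind=rebind)
    msgr.replace(0)  # simulate the connection race
    print(rebind, hb.send_ping())  # "False False", then "True True"
```

Without rebinding, the session never notices the replacement: sends fail quietly, no reply ever comes back, and the deadline check eventually marks the peer as failed even though it is healthy.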

Details are in the attached log file.


Files

osd.1.stdout (691 KB), Xuehan Xu, 08/25/2020 02:48 AM
#1

Updated by Xuehan Xu over 3 years ago

  • Priority changed from Normal to High
#2

Updated by Kefu Chai over 3 years ago

  • Assignee set to Yingxin Cheng
#4

Updated by Xuehan Xu over 3 years ago

  • Status changed from New to Resolved
  • Pull request ID set to 36842