Bug #47124

heartbeat: heartbeat fails after connection race

Added by Xuehan Xu 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Recently, I found that the OSDs on my testbed constantly fail. Here is the log:

ERROR 2020-08-25 10:41:02,855 [shard 0] osd - Heartbeat::Session::failed_since(): no reply from osd.0 ever on either front or back, first ping sent 2020-08-25 10:40:38 827414638 (oldest deadline 2020-08-25 10:40:58 827414638)
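For reading the log line above: the first ping and the oldest deadline are 20 seconds apart, and the error was logged about 4 seconds after the deadline passed. The following is a minimal Python sketch of that kind of deadline check, not the actual OSD code. The 20-second grace is only read off this log line, and `has_failed` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumption: grace period inferred from the log line
# (deadline 10:40:58 minus first ping 10:40:38), not taken from Ceph's config.
GRACE = timedelta(seconds=20)

def has_failed(first_ping_sent, last_reply, now, grace=GRACE):
    """Hypothetical helper: True if the peer should be treated as failed."""
    if last_reply is None:
        # No reply ever: the deadline is counted from the first ping sent.
        return now > first_ping_sent + grace
    # Assumption: otherwise the deadline is counted from the last reply.
    return now > last_reply + grace

first_ping = datetime(2020, 8, 25, 10, 40, 38)
logged_at = datetime(2020, 8, 25, 10, 41, 2)

deadline = first_ping + GRACE                      # 2020-08-25 10:40:58
failed = has_failed(first_ping, None, logged_at)   # True at the time of the log
```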

It seems that after a heartbeat connection is replaced, the OSD can no longer send osd_ping messages through that connection. The heartbeat then fails, and the OSDs report their heartbeat peers as failed.
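One way such a symptom could arise is sketched below as a toy Python model. It is only an illustration under assumptions, not the confirmed root cause: the real behavior is in the attached log. The `Connection` and `Session` classes and the `on_replaced` hook are hypothetical and do not correspond to the OSD's actual classes.

```python
class Connection:
    """Toy connection: delivers messages only while it is open."""
    def __init__(self, name):
        self.name = name
        self.open = True
        self.delivered = []

    def send(self, msg):
        if not self.open:
            return False  # message is silently dropped
        self.delivered.append(msg)
        return True


class Session:
    """Toy heartbeat session that holds a handle to one connection."""
    def __init__(self, conn):
        self.conn = conn

    def send_ping(self):
        return self.conn.send("osd_ping")

    def on_replaced(self, new_conn):
        # Hypothetical hook: rebind to the connection that replaced the old one.
        self.conn = new_conn


old = Connection("old")
session = Session(old)

# Connection race: the old connection is closed and a new one takes its place.
new = Connection("new")
old.open = False

stale_result = session.send_ping()    # still bound to the closed connection
session.on_replaced(new)
rebound_result = session.send_ping()  # now goes out on the replacement
```

In this model the ping sent before rebinding is lost without any error, which would match a peer that never receives osd_ping and therefore never replies.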

Details are in the log file in the attachment.

osd.1.stdout (691 KB) Xuehan Xu, 08/25/2020 02:48 AM

History

#1 Updated by Xuehan Xu 3 months ago

  • Priority changed from Normal to High

#2 Updated by Kefu Chai 3 months ago

  • Assignee set to Yingxin Cheng

#4 Updated by Xuehan Xu 3 months ago

  • Status changed from New to Resolved
  • Pull request ID set to 36842
