Bug #47124

heartbeat: heartbeat fails after connection race

Added by Xuehan Xu 25 days ago. Updated 19 days ago.

Status: Resolved
Priority: High
% Done: 0%
Source: Community (dev)
Regression: No
Severity: 3 - minor

Description

Recently, I found that the OSDs on my testbed constantly fail. Here is the log:

ERROR 2020-08-25 10:41:02,855 [shard 0] osd - Heartbeat::Session::failed_since(): no reply from osd.0 ever on either front or back, first ping sent 2020-08-25 10:40:38 827414638 (oldest deadline 2020-08-25 10:40:58 827414638)
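In the log line above, the "oldest deadline" is exactly 20 seconds after the first ping. That gap is consistent with Ceph's default `osd_heartbeat_grace` of 20 seconds. The sketch below shows how such a check can be computed. It is a minimal model under stated assumptions, not crimson's `Heartbeat::Session`:

- The grace period is assumed to be a fixed 20 s.
- The class `HeartbeatSession` is hypothetical, and so are its methods `on_ping_sent`, `on_reply`, `oldest_deadline` and `failed_since`.
- A ping is assumed to count as answered only once replies arrive on both the front and back interfaces.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional, Set

# Assumption: a fixed 20 s grace period, matching the gap between
# "first ping sent" and "oldest deadline" in the log line.
GRACE = timedelta(seconds=20)


@dataclass
class HeartbeatSession:
    """Hypothetical per-peer heartbeat state (not crimson's Heartbeat::Session)."""
    first_ping_sent: Optional[datetime] = None
    ever_replied: bool = False
    # Send time of each unanswered ping -> interfaces still owing a reply.
    pending: Dict[datetime, Set[str]] = field(default_factory=dict)

    def on_ping_sent(self, sent: datetime) -> None:
        if self.first_ping_sent is None:
            self.first_ping_sent = sent
        self.pending[sent] = {"front", "back"}

    def on_reply(self, sent: datetime, iface: str) -> None:
        self.ever_replied = True
        waiting = self.pending.get(sent)
        if waiting is not None:
            waiting.discard(iface)
            if not waiting:  # answered on both front and back
                del self.pending[sent]

    def oldest_deadline(self) -> Optional[datetime]:
        # Deadline of the oldest unanswered ping, if any.
        return min(self.pending) + GRACE if self.pending else None

    def failed_since(self, now: datetime) -> Optional[datetime]:
        """Return when the peer started failing, or None if it is healthy."""
        deadline = self.oldest_deadline()
        if deadline is None or now <= deadline:
            return None
        # Never heard back on either interface: failing since the first ping.
        if not self.ever_replied:
            return self.first_ping_sent
        return min(self.pending)


t0 = datetime(2020, 8, 25, 10, 40, 38)
s = HeartbeatSession()
s.on_ping_sent(t0)
s.on_ping_sent(t0 + timedelta(seconds=6))
print(s.oldest_deadline())                               # 2020-08-25 10:40:58
print(s.failed_since(datetime(2020, 8, 25, 10, 41, 2)))  # 2020-08-25 10:40:38
```

When no reply ever arrives, `failed_since` reports the first ping's send time once the oldest deadline has passed. This mirrors the shape of the error message.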

It seems that after a heartbeat connection is replaced (following a connection race between two OSDs), the OSD can no longer send osd_ping messages through that connection. As a result, heartbeats to that peer fail, and the OSDs in turn report their heartbeat peers as failed.
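The suspected failure mode can be illustrated with a toy model. If the heartbeat side keeps pinging through a handle to the connection that lost the race, those pings never reach the peer.

This sketch is a hypothetical illustration only. The classes `Connection`, `Messenger` and `HeartbeatPeer` and the `rebind_on_replace` flag are invented for this example. The model does not reproduce crimson's messenger, and it is not the fix merged in PR 36842.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Connection:
    """Hypothetical stand-in for a messenger connection."""
    peer: int
    closed: bool = False
    sent: List[str] = field(default_factory=list)

    def send(self, msg: str) -> bool:
        # A closed (replaced) connection silently drops the message.
        if self.closed:
            return False
        self.sent.append(msg)
        return True


class Messenger:
    """Hypothetical messenger keeping one live connection per peer."""

    def __init__(self) -> None:
        self.conns: Dict[int, Connection] = {}

    def connect(self, peer: int) -> Connection:
        conn = self.conns.get(peer)
        if conn is None or conn.closed:
            conn = Connection(peer)
            self.conns[peer] = conn
        return conn

    def replace(self, peer: int) -> Connection:
        # Models a connection race: the old connection is closed and a new
        # one (e.g. accepted from the peer) takes its place.
        old = self.conns.get(peer)
        if old is not None:
            old.closed = True
        new = Connection(peer)
        self.conns[peer] = new
        return new


class HeartbeatPeer:
    """Hypothetical heartbeat-side holder of a connection handle."""

    def __init__(self, msgr: Messenger, peer: int, rebind_on_replace: bool) -> None:
        self.rebind = rebind_on_replace
        self.conn = msgr.connect(peer)

    def on_replaced(self, new_conn: Connection) -> None:
        # Without rebinding, pings keep going to the stale handle.
        if self.rebind:
            self.conn = new_conn

    def send_ping(self) -> bool:
        return self.conn.send("osd_ping")


results = {}
for rebind in (False, True):
    msgr = Messenger()
    hb = HeartbeatPeer(msgr, peer=0, rebind_on_replace=rebind)
    hb.on_replaced(msgr.replace(0))
    results[rebind] = hb.send_ping()
print(results)  # {False: False, True: True}
```

In this model, only the peer that rebinds to the replacement connection keeps delivering osd_ping. The stale-handle variant stops delivering pings, so the peer eventually trips the heartbeat deadline check.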

Details are in the log file in the attachment.

osd.1.stdout (691 KB) Xuehan Xu, 08/25/2020 02:48 AM

History

#1 Updated by Xuehan Xu 25 days ago

  • Priority changed from Normal to High

#2 Updated by Kefu Chai 24 days ago

  • Assignee set to Yingxin Cheng

#4 Updated by Xuehan Xu 19 days ago

  • Status changed from New to Resolved
  • Pull request ID set to 36842
