Bug #55484

open

Crimson: server_reconnect: stale exsiting connect_seq exist_cs

Added by Mark Nelson almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During 60-OSD, 3x-replication RBD tests using crimson+alienstore+bluestore, writes eventually stall. Examining the stderr output of every crimson OSD shows that one OSD has produced over 8GB of stderr output:

mako08: -rw-r--r--. 1 root root 8763242190 Apr 28 15:21 /tmp/cbt/ceph/osd.44.stderr

This appears to be almost entirely due to an excessive number of server_reconnect events:

WARN  2022-04-28 15:22:06,746 [shard 0] osd - ms_handle_reset
WARN  2022-04-28 15:22:06,746 [shard 0] ms - [osd.44(cluster) v2:172.21.67.18:6809/3974488 >> osd.? v2:172.21.67.13:6820/4060002@58029] server_reconnect: stale exsiting connect_seq exist_cs(1040154) < peer_cs(1040155), reusing existing [osd.44(cluster) v2:172.21.67.18:6809/3974488 >> osd.17 v2:172.21.67.13:6820/4060002@59178]

With the default logging level there does not appear to be a specific recorded event that triggers this, though I have not yet tried to diagnose the issue further. Will update with more information as I find it.
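To put a number on "almost entirely", the stderr file could be scanned for this warning and the hits grouped by peer address. The following is only a rough sketch: the regular expressions are inferred from the single log line quoted above (including its "exsiting" spelling), and the function name summarize_stale_reconnects is made up for illustration, not part of Ceph or cbt.

```python
import re
from collections import Counter

# Matches the warning quoted above. The layout is an assumption based on
# that one sample line; other crimson versions may word it differently.
STALE_RE = re.compile(
    r"server_reconnect: stale exsiting connect_seq "
    r"exist_cs\((?P<exist_cs>\d+)\) < peer_cs\((?P<peer_cs>\d+)\)"
)
# First peer address on the line, e.g. ">> osd.? v2:172.21.67.13:6820/4060002".
PEER_RE = re.compile(r">> osd\.\S+ (?P<addr>v2:[\d.]+:\d+/\d+)")


def summarize_stale_reconnects(lines):
    """Count stale-connect_seq warnings per peer address.

    Returns (per_peer, max_gap), where per_peer is a Counter keyed by peer
    address and max_gap is the largest peer_cs - exist_cs difference seen.
    """
    per_peer = Counter()
    max_gap = 0
    for line in lines:
        stale = STALE_RE.search(line)
        if stale is None:
            continue  # not a stale server_reconnect warning
        gap = int(stale["peer_cs"]) - int(stale["exist_cs"])
        max_gap = max(max_gap, gap)
        peer = PEER_RE.search(line)
        per_peer[peer["addr"] if peer else "unknown"] += 1
    return per_peer, max_gap
```

Passing an open file object (for example open("/tmp/cbt/ceph/osd.44.stderr", errors="replace")) as lines streams the log one line at a time, so the 8GB file never has to fit in memory. per_peer.most_common(5) would then show whether a single peer accounts for most of the reconnects.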
