Bug #55484
Crimson: server_reconnect: stale exsiting connect_seq exist_cs
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
During 60 OSD 3x replication RBD tests using crimson+alienstore+bluestore, writes eventually stall. An examination of all crimson stderr output shows that one of the OSDs has over 8GB of stderr output:
mako08: -rw-r--r--. 1 root root 8763242190 Apr 28 15:21 /tmp/cbt/ceph/osd.44.stderr
This appears to be almost entirely due to an excessive number of server_reconnect events:
WARN 2022-04-28 15:22:06,746 [shard 0] osd - ms_handle_reset
WARN 2022-04-28 15:22:06,746 [shard 0] ms - [osd.44(cluster) v2:172.21.67.18:6809/3974488 >> osd.? v2:172.21.67.13:6820/4060002@58029] server_reconnect: stale exsiting connect_seq exist_cs(1040154) < peer_cs(1040155), reusing existing [osd.44(cluster) v2:172.21.67.18:6809/3974488 >> osd.17 v2:172.21.67.13:6820/4060002@59178]
With the default logging level there does not appear to be a specific recorded event that triggers this, though I have not yet tried to diagnose the issue further. Will update with more information as I find it.
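One way to check how much of the stderr file consists of these events, and which peers they involve, is to stream the log and tally the matching lines. The following is a minimal Python sketch and not part of Ceph. The regular expression is inferred only from the single sample line quoted above, so other messenger log lines may need a different pattern, and the names `RECONNECT_RE` and `summarize_reconnects` are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the one sample line in this report (an assumption,
# not a documented log format). "exsiting" is spelled as it appears in the log.
RECONNECT_RE = re.compile(
    r">> (?P<peer>osd\.\S+) (?P<addr>v2:\S+?)@\d+\] "
    r"server_reconnect: stale exsiting connect_seq "
    r"exist_cs\((?P<exist_cs>\d+)\) < peer_cs\((?P<peer_cs>\d+)\)"
)

SAMPLE = (
    "WARN 2022-04-28 15:22:06,746 [shard 0] ms - "
    "[osd.44(cluster) v2:172.21.67.18:6809/3974488 >> osd.? "
    "v2:172.21.67.13:6820/4060002@58029] server_reconnect: stale exsiting "
    "connect_seq exist_cs(1040154) < peer_cs(1040155), reusing existing "
    "[osd.44(cluster) v2:172.21.67.18:6809/3974488 >> osd.17 "
    "v2:172.21.67.13:6820/4060002@59178]"
)


def summarize_reconnects(lines):
    """Count stale-connect_seq reconnects per peer address.

    `lines` is any iterable of log lines, so a multi-gigabyte file can be
    passed as an open file object and read one line at a time.
    Returns (count per peer address, highest peer_cs seen per peer address).
    """
    per_peer = Counter()
    max_peer_cs = {}
    for line in lines:
        m = RECONNECT_RE.search(line)
        if m is None:
            continue  # not a server_reconnect line in the assumed format
        addr = m.group("addr")
        per_peer[addr] += 1
        peer_cs = int(m.group("peer_cs"))
        max_peer_cs[addr] = max(max_peer_cs.get(addr, 0), peer_cs)
    return per_peer, max_peer_cs


if __name__ == "__main__":
    counts, highest = summarize_reconnects([SAMPLE])
    for addr, n in counts.most_common():
        print(addr, n, highest[addr])
```

To run this against the real file, an open file object can be passed instead of the sample list, for example `with open("/tmp/cbt/ceph/osd.44.stderr", errors="replace") as f: summarize_reconnects(f)`. Keying on the peer address rather than the OSD name is deliberate, since the sample line shows the peer as `osd.?` at the time of the warning.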