Bug #55484: Crimson: server_reconnect: stale exsiting connect_seq exist_cs - crimson - Ceph

Bug #55484

open

Added by Mark Nelson almost 2 years ago.

Status: New
Priority: Normal
Severity: 3 - minor

Description

During 60-OSD, 3x-replication RBD tests using crimson+alienstore+bluestore, writes eventually stall. Examining the stderr output of every crimson OSD shows that one OSD has produced over 8 GB of it:

mako08: -rw-r--r--. 1 root root 8763242190 Apr 28 15:21 /tmp/cbt/ceph/osd.44.stderr

This appears to be almost entirely due to an excessive number of server_reconnect events:

WARN  2022-04-28 15:22:06,746 [shard 0] osd - ms_handle_reset
WARN  2022-04-28 15:22:06,746 [shard 0] ms - [osd.44(cluster) v2:172.21.67.18:6809/3974488 >> osd.? v2:172.21.67.13:6820/4060002@58029] server_reconnect: stale exsiting connect_seq exist_cs(1040154) < peer_cs(1040155), reusing existing [osd.44(cluster) v2:172.21.67.18:6809/3974488 >> osd.17 v2:172.21.67.13:6820/4060002@59178]
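To quantify how much of the file these warnings account for, and which peer they involve, a streaming tally along the following lines may help. This is only a sketch: the regular expression is modeled on the single WARN line quoted above, the function names `summarize_reconnects` and `summarize_file` are hypothetical, and other crimson builds may format the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the one WARN line quoted in this report (assumption:
# all server_reconnect warnings share this layout). "\w+" is used for the
# word after "stale" so the match does not depend on its exact spelling.
RECONNECT_RE = re.compile(
    r">> (?P<peer>\S+) (?P<addr>v2:\S+?)@\d+\] "
    r"server_reconnect: stale \w+ connect_seq "
    r"exist_cs\((?P<exist_cs>\d+)\) < peer_cs\((?P<peer_cs>\d+)\)"
)


def summarize_reconnects(lines):
    """Count stale-connect_seq warnings per peer address.

    Returns (counts, max_peer_cs): a Counter keyed by peer address and a
    dict holding the largest peer_cs value seen for each address.
    """
    counts = Counter()
    max_peer_cs = {}
    for line in lines:
        m = RECONNECT_RE.search(line)
        if m is None:
            continue  # not a server_reconnect warning
        addr = m.group("addr")
        counts[addr] += 1
        peer_cs = int(m.group("peer_cs"))
        max_peer_cs[addr] = max(max_peer_cs.get(addr, 0), peer_cs)
    return counts, max_peer_cs


def summarize_file(path):
    """Stream a (possibly multi-gigabyte) stderr file line by line."""
    with open(path, errors="replace") as f:
        return summarize_reconnects(f)
```

Calling `summarize_file("/tmp/cbt/ceph/osd.44.stderr")` reads the file one line at a time, so the 8 GB log never has to fit in memory.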

With the default logging level there does not appear to be a specific recorded event that triggers this, though I have not yet tried to diagnose the issue further. Will update with more information as I find it.
