Bug #55850
libceph socket closed
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Crash signature (v1):
Crash signature (v2):
Description
We are having this issue on all of our FUSE clients; it looks like Ceph is randomly closing its socket to our mds0. There is not much info in the Ceph logs about this, and I am unable to find a root cause.
ceph version 14.2.10-392-gb3a13b81cb (b3a13b81cb4dfddec1cd59e7bab1e3e9984c8dd8) nautilus (stable)
dmesg logs
[Tue May 31 20:15:37 2022] libceph: mds0 (1)10.50.1.248:6800 socket closed (con state OPEN)
[Tue May 31 20:15:37 2022] libceph: mds0 (1)10.50.1.248:6800 socket closed (con state OPEN)
[Tue May 31 20:15:37 2022] libceph: mds0 (1)10.50.1.248:6800 socket closed (con state OPEN)
[Tue May 31 20:15:38 2022] libceph: wrong peer, want (1)10.50.1.248:6800/-1186287667, got (1)10.50.1.248:6800/-1535805463
[Tue May 31 20:15:38 2022] libceph: mds0 (1)10.50.1.248:6800 wrong peer at address
[Tue May 31 20:15:38 2022] libceph: wrong peer, want (1)10.50.1.248:6800/-1186287667, got (1)10.50.1.248:6800/-1535805463
[Tue May 31 20:15:38 2022] libceph: mds0 (1)10.50.1.248:6800 wrong peer at address
[Tue May 31 20:15:38 2022] libceph: wrong peer, want (1)10.50.1.248:6800/-1186287667, got (1)10.50.1.248:6800/-1535805463
[Tue May 31 20:15:38 2022] libceph: mds0 (1)10.50.1.248:6800 wrong peer at address
[Tue May 31 20:15:45 2022] ceph: mds0 reconnect start
[Tue May 31 20:15:45 2022] ceph: mds0 reconnect start
[Tue May 31 20:15:45 2022] ceph: mds0 reconnect start
[Tue May 31 20:15:45 2022] ceph: mds0 reconnect success
[Tue May 31 20:15:45 2022] ceph: mds0 reconnect success
[Tue May 31 20:15:45 2022] ceph: mds0 reconnect success
[Tue May 31 20:16:06 2022] ceph: mds0 recovery completed
[Tue May 31 20:16:06 2022] ceph: mds0 recovery completed
[Tue May 31 20:16:06 2022] ceph: mds0 recovery completed
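For reference, a minimal sketch of what we could capture the next time this happens (mds.ceph-node1 is only a placeholder for whichever MDS is active, and the session listing has to be run on that MDS's host via its admin socket):
ceph -s
ceph fs status
ceph health detail
dmesg -T | grep -E 'libceph|ceph:'
# on the active MDS host, with the real daemon id in place of mds.ceph-node1:
ceph daemon mds.ceph-node1 session ls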
Ceph config file
# DeepSea default configuration. Changes in this file will be overwritten on
# package update. Include custom configuration fragments in
# /srv/salt/ceph/configuration/files/ceph.conf.d/[global,osd,mon,mgr,mds,client].conf
[global]
fsid = b799274f-d309-4616-8320-a05dd147c602
mon_initial_members = ceph-node4, ceph-node1, ceph-node2, ceph-node3
mon_host = 10.50.1.250, 10.50.1.248, 10.50.1.249, 10.50.1.247
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 10.50.1.0/24
cluster_network = 10.50.4.0/24
# enable old ceph health format in the json output. This fixes the
# ceph_exporter. This option will only stay until the prometheus plugin takes
# over
mon_health_preluminous_compat = true
mon health preluminous compat warning = false
rbd default features = 3
[mon]
mgr initial modules = dashboard
[mds]
mds_cache_memory_limit = 8589934592
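If it would help, this is roughly the extra MDS-side debug logging we could enable temporarily to get more detail around the disconnects (the levels are a guess and debug_mds in particular is verbose, so we would revert it once an event is captured):
ceph config set mds debug_ms 1
ceph config set mds debug_mds 10
# revert after capturing a disconnect:
ceph config rm mds debug_ms
ceph config rm mds debug_mds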
Thank you!