Bug #56721

closed

mgr and client connection problem after upgrade from 16.2.9

Added by Rafał Dziwiński almost 2 years ago. Updated over 1 year ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello everyone,
I upgraded from 16.2.9 to 17.2.2. After some problems and a cluster outage, once the upgrade was finished and all OSDs had been restarted, clients started using the cluster again. I saw client and recovery traffic in the io section of ceph -s. A few minutes later I restarted the mgr, and since then I see no client traffic. The mgr is active, but I think it is not working properly.

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 58m)
    mgr: ceph02(active, since 42m), standbys: ceph01, ceph03
    mds: 2/2 daemons up, 1 standby
    osd: 29 osds: 28 up (since 34s), 29 in (since 11d)
         flags noscrub,nodeep-scrub

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 1089 pgs
    objects: 6.34M objects, 21 TiB
    usage:   63 TiB used, 54 TiB / 117 TiB avail
    pgs:     603590/18609120 objects degraded (3.244%)
             995 active+clean
             90  active+undersized+degraded
             4   active+undersized
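The figures in the pgs section can be cross-checked with simple arithmetic. A minimal Python sketch, using only the numbers pasted above; the helper name `summarize_pgs` is made up for illustration and is not part of any Ceph tooling:

```python
def summarize_pgs(pg_states, degraded_objects, total_object_copies):
    """Return (total PGs, PGs not active+clean, degraded percentage)."""
    total = sum(pg_states.values())
    not_clean = total - pg_states.get("active+clean", 0)
    degraded_pct = 100.0 * degraded_objects / total_object_copies
    return total, not_clean, round(degraded_pct, 3)


# Values copied from the `ceph -s` output in this report.
pg_states = {
    "active+clean": 995,
    "active+undersized+degraded": 90,
    "active+undersized": 4,
}
total, not_clean, pct = summarize_pgs(pg_states, 603590, 18609120)
print(total, not_clean, pct)
```

The three states add up to the 1089 PGs listed under pools, and the ratio matches the reported 3.244%. The 94 PGs that are not active+clean are all still active, which is consistent with one of the 29 OSDs being down rather than with data being unavailable.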

and systemctl status shows:
Jul 27 06:43:09 ceph02.infra.k.pl ceph-mgr[7704]: 2022-07-27T06:43:09.639+0200 7fa75f290700 -1 client.0 error registering admin socket command: (17) File exists

Additionally, I cannot connect to the cluster from an outside server (I'm using KVM and RBD); I get an auth timeout. I checked LAN connectivity and the firewall, and everything seems OK.

root@node01:~# telnet 10.8.11.2 3300
Trying 10.8.11.2...
Connected to 10.8.11.2.
Escape character is '^]'.
ceph v2
rbd ls -n client.mypool mypool
2022-07-27T07:08:47.542+0200 7f3c4e609340  0 monclient(hunting): authenticate timed out after 300
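The telnet check above can be scripted so it is easy to repeat against each monitor. This is only a sketch: it assumes the monitor greets a new connection on port 3300 with a banner starting with "ceph v2" followed by a newline (as the telnet output suggests), and the function names are hypothetical. It checks TCP reachability and the banner only; it does not exercise authentication, which is where the rbd command times out.

```python
import socket

# Assumed banner prefix, taken from the telnet output in this report.
MSGR2_BANNER_PREFIX = b"ceph v2\n"


def read_prefix(sock, want=len(MSGR2_BANNER_PREFIX)):
    """Read up to `want` bytes from a connected socket, stopping early on EOF."""
    data = b""
    while len(data) < want:
        chunk = sock.recv(want - len(data))
        if not chunk:
            break
        data += chunk
    return data


def speaks_msgr2(banner):
    """True if the bytes received start with the assumed v2 banner."""
    return banner.startswith(MSGR2_BANNER_PREFIX)


def check_mon(host, port=3300, timeout=5.0):
    """Connect to a monitor and report whether it sent the v2 banner."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return speaks_msgr2(read_prefix(sock))


# Example (address taken from the report):
#   check_mon("10.8.11.2")
```

A True result only confirms what telnet already showed. If every monitor answers with the banner and authentication still times out, the network path is probably not the cause.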

Before this upgrade I upgraded from 15 to 16, and between several 16 releases, with no problems.

The authentication problem suggests a monitor issue, but I think the monitors are OK:

    "name": "ceph02",
    "rank": 1,
    "state": "peon",
    "election_epoch": 3782,
    "quorum": [
        0,
        1,
        2
    ],
    "quorum_age": 3893,
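The check being made by eye here (the monitor is a peon and its rank appears in the quorum list) can be written down as a small Python sketch. The JSON below contains only the fields pasted above, with a closing brace added so that it parses; the real output has more fields, and `mon_in_quorum` is a hypothetical helper, not a Ceph API.

```python
import json

# Only the fields shown in this report; the closing brace is added for parsing.
FRAGMENT = """
{
    "name": "ceph02",
    "rank": 1,
    "state": "peon",
    "election_epoch": 3782,
    "quorum": [0, 1, 2],
    "quorum_age": 3893
}
"""


def mon_in_quorum(status):
    """True if the monitor is a leader or peon and its rank is in the quorum."""
    return status["state"] in ("leader", "peon") and status["rank"] in status["quorum"]


status = json.loads(FRAGMENT)
print(status["name"], mon_in_quorum(status))
```

This only confirms that ceph02 considers itself part of a three-member quorum. It says nothing about whether the monitors are answering client authentication requests.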

My cluster is now down; no clients are working.
