Bug #56721
mgr and client connection problem after upgrade from 16.2.9
Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hello everyone,
I upgraded from 16.2.9 to 17.2.2. There were some problems and the cluster went down, but once the upgrade was finished and all OSDs were restarted, clients started using the cluster again. I could see both client and recovery I/O in the io section of ceph -s. A few minutes later I restarted the mgr, and since then there is no client traffic at all. The mgr is reported as active, but I don't think it is working properly.
  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 58m)
    mgr: ceph02(active, since 42m), standbys: ceph01, ceph03
    mds: 2/2 daemons up, 1 standby
    osd: 29 osds: 28 up (since 34s), 29 in (since 11d)
         flags noscrub,nodeep-scrub

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 1089 pgs
    objects: 6.34M objects, 21 TiB
    usage:   63 TiB used, 54 TiB / 117 TiB avail
    pgs:     603590/18609120 objects degraded (3.244%)
             995 active+clean
             90  active+undersized+degraded
             4   active+undersized
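The figures in this status are internally consistent: the three PG state counts add up to the 1089 PGs reported for the pools, and with 28 of 29 OSDs up the undersized PGs are most likely those that had a replica on the down OSD. A minimal Python sketch that re-derives the degraded percentage and the PG total, assuming the text is copied verbatim from the pgs lines above (`summarize_pgs` is a hypothetical helper, not part of any Ceph tool):

```python
import re

# Hypothetical input: the "pgs:" lines copied from the ceph -s output above.
STATUS = """\
pgs: 603590/18609120 objects degraded (3.244%)
     995 active+clean
     90 active+undersized+degraded
     4 active+undersized
"""


def summarize_pgs(text: str) -> dict:
    """Return the degraded percentage and PG state totals parsed from ceph -s text."""
    m = re.search(r"(\d+)/(\d+) objects degraded", text)
    degraded, total = int(m.group(1)), int(m.group(2))
    # Each PG state line looks like "<count> <state>", e.g. "995 active+clean".
    states = {
        state: int(count)
        for count, state in re.findall(r"^\s*(\d+) ([a-z_+]+)\s*$", text, re.M)
    }
    return {
        "degraded_pct": round(100 * degraded / total, 3),
        "pg_total": sum(states.values()),
        "states": states,
    }


if __name__ == "__main__":
    summary = summarize_pgs(STATUS)
    print(summary["degraded_pct"], summary["pg_total"])
```

The parsed percentage matches the 3.244% that ceph -s printed, and the PG total matches the 1089 PGs in the pools line.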
And the output of systemctl status for the mgr:
Jul 27 06:43:09 ceph02.infra.k.pl ceph-mgr[7704]: 2022-07-27T06:43:09.639+0200 7fa75f290700 -1 client.0 error registering admin socket command: (17) File exists
Additionally, I cannot connect to the cluster from an outside server (I'm using KVM and RBD); I get an authentication timeout. I checked LAN connectivity and the firewall, and everything seems OK.
root@node01:~# telnet 10.8.11.2 3300
Trying 10.8.11.2...
Connected to 10.8.11.2.
Escape character is '^]'.
ceph v2

rbd ls -n client.mypool mypool
2022-07-27T07:08:47.542+0200 7f3c4e609340  0 monclient(hunting): authenticate timed out after 300
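The telnet output shows that TCP port 3300 on the monitor is reachable and that the monitor answers with its msgr2 banner, so the authentication timeout does not look like a plain network or firewall block. The same check can be scripted. This is a minimal Python sketch, assuming the banner begins with the bytes `ceph v2\n` as the telnet output suggests; `probe_mon` and `is_msgr2_banner` are hypothetical names, and a successful probe only confirms connectivity, not that cephx authentication will succeed:

```python
import socket

# Assumption: a msgr2 endpoint (port 3300) sends a banner starting with these bytes,
# which is what telnet displayed as "ceph v2".
MSGR2_PREFIX = b"ceph v2\n"


def is_msgr2_banner(data: bytes) -> bool:
    """True if the first bytes received look like a Ceph msgr2 banner."""
    return data.startswith(MSGR2_PREFIX)


def probe_mon(host: str, port: int = 3300, timeout: float = 5.0) -> bool:
    """Connect to a monitor and check the banner it sends without being prompted."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until the full prefix has arrived or the peer closes the connection.
        while len(data) < len(MSGR2_PREFIX):
            chunk = sock.recv(64)
            if not chunk:
                break
            data += chunk
    return is_msgr2_banner(data)


if __name__ == "__main__":
    # Monitor address taken from the telnet test above.
    print(probe_mon("10.8.11.2"))
```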
Before this upgrade I upgraded from 15 to 16, and performed several upgrades within 16, all with no problems.
The authentication problem suggests a monitor issue, but I think the monitors are OK:
    "name": "ceph02",
    "rank": 1,
    "state": "peon",
    "election_epoch": 3782,
    "quorum": [0, 1, 2],
    "quorum_age": 3893,
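This excerpt indicates that ceph02 holds rank 1, that rank 1 appears in the quorum list [0, 1, 2], and that its state is peon, i.e. a non-leader member of the quorum. A minimal Python sketch of that check, assuming the JSON comes from the monitor's mon_status output and contains at least the fields shown (`mon_in_quorum` is a hypothetical helper):

```python
import json

# Fields copied from the mon_status excerpt above; all other fields are omitted.
MON_STATUS = """
{"name": "ceph02", "rank": 1, "state": "peon",
 "election_epoch": 3782, "quorum": [0, 1, 2], "quorum_age": 3893}
"""


def mon_in_quorum(raw: str) -> bool:
    """True if this monitor's rank is in the quorum list and it is leader or peon."""
    status = json.loads(raw)
    return status["rank"] in status["quorum"] and status["state"] in ("leader", "peon")


if __name__ == "__main__":
    print(mon_in_quorum(MON_STATUS))
```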
My cluster is now down; no clients are working.