Bug #56721
mgr and client connection problem after upgrade from 16.2.9
Status: Closed
Description
Hello everyone,
I performed an upgrade from 16.2.9 to 17.2.2. After some problems and a cluster outage, once the upgrade finished and all OSDs were restarted, clients started using the cluster again. I could see client and recovery traffic in the IO section of ceph -s. A few minutes later I restarted the mgr, and now I see no client traffic. The mgr is active, but I don't think it is working properly.
  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 58m)
    mgr: ceph02(active, since 42m), standbys: ceph01, ceph03
    mds: 2/2 daemons up, 1 standby
    osd: 29 osds: 28 up (since 34s), 29 in (since 11d)
         flags noscrub,nodeep-scrub

  data:
    volumes: 1/1 healthy
    pools:   10 pools, 1089 pgs
    objects: 6.34M objects, 21 TiB
    usage:   63 TiB used, 54 TiB / 117 TiB avail
    pgs:     603590/18609120 objects degraded (3.244%)
             995 active+clean
             90  active+undersized+degraded
             4   active+undersized
and systemctl status shows:
Jul 27 06:43:09 ceph02.infra.k.pl ceph-mgr[7704]: 2022-07-27T06:43:09.639+0200 7fa75f290700 -1 client.0 error registering admin socket command: (17) File exists
Additionally, I cannot connect to the cluster from an outside server (I'm using KVM and RBD). I get an auth timeout. I checked LAN connectivity and the firewall, and everything seems OK.
root@node01:~# telnet 10.8.11.2 3300
Trying 10.8.11.2...
Connected to 10.8.11.2.
Escape character is '^]'.
ceph v2

rbd ls -n client.mypool mypool
2022-07-27T07:08:47.542+0200 7f3c4e609340  0 monclient(hunting): authenticate timed out after 300
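For reference, a quick way to probe both messenger ports from the client side is a sketch like the one below, assuming nc (netcat) is installed; 10.8.11.2 is the monitor address from the telnet test above. Note the telnet banner ("ceph v2") confirms the msgr v2 port answers, while the mon logs below show the client actually talking v1.

```shell
# Check reachability of both Ceph messenger ports on a monitor
# (10.8.11.2 is the monitor address used in the telnet test above).
nc -vz 10.8.11.2 6789   # msgr v1 (legacy protocol)
nc -vz 10.8.11.2 3300   # msgr v2
```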
Before this upgrade I went from 15 to 16, plus several minor upgrades within 16, all with no problems.
The authentication problem suggests a monitor issue, but the monitors look OK to me:
"name": "ceph02",
"rank": 1,
"state": "peon",
"election_epoch": 3782,
"quorum": [
    0,
    1,
    2
],
"quorum_age": 3893,
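The fragment above appears to come from the monitor status output; as a sketch, the same quorum information can be dumped with the standard Ceph CLI commands:

```shell
# Full quorum membership, ranks and election epoch as pretty-printed JSON
ceph quorum_status -f json-pretty

# One-line summary of the monitors and current quorum
ceph mon stat
```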
My cluster is now down; no clients are working.
Updated by Rafał Dziwiński almost 2 years ago
More monitor logs while trying to connect with rbd:
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader) e12 _ms_dispatch new session 0x560d0f4326c0 MonSession(client.? v1:10.8.11.70:0/1504947558 is open , features 0x3f01cfbf7ffdffff (luminous)) features 0x3f01cfbf7ffdffff
2022-07-27T09:47:34.826+0200 7f9068c63700 20 mon.ceph01@0(leader) e12 entity_name  global_id 0 (none) caps
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).auth v36429 preprocess_query auth(proto 0 43 bytes epoch 0) v1 from client.? v1:10.8.11.70:0/1504947558
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).auth v36429 prep_auth() blob_size=43
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).auth v36429 _assign_global_id 221394210 (max 221404096)
2022-07-27T09:47:34.826+0200 7f9068c63700  2 mon.ceph01@0(leader) e12 send_reply 0x560d0dadfe00 0x560d0dcf44e0 auth_reply(proto 2 0 (0) Success) v1
2022-07-27T09:47:34.826+0200 7f9068c63700 20 mon.ceph01@0(leader) e12 _ms_dispatch existing session 0x560d0f4326c0 for client.?
2022-07-27T09:47:34.826+0200 7f9068c63700 20 mon.ceph01@0(leader) e12 entity_name client.nebula-korbank global_id 221394210 (new_pending) caps
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).auth v36429 preprocess_query auth(proto 2 36 bytes epoch 0) v1 from client.? v1:10.8.11.70:0/1504947558
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).auth v36429 prep_auth() blob_size=36
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader) e12 ms_handle_authentication session 0x560d0f4326c0 con 0x560d0e91ec00 addr v1:10.8.11.70:0/1504947558 MonSession(client.? v1:10.8.11.70:0/1504947558 is open , features 0x3f01cfbf7ffdffff (luminous))
2022-07-27T09:47:34.826+0200 7f9068c63700  2 mon.ceph01@0(leader) e12 send_reply 0x560d0dfc0a50 0x560d0e750340 auth_reply(proto 2 0 (0) Success) v1
2022-07-27T09:47:34.826+0200 7f9068c63700 20 mon.ceph01@0(leader) e12 _ms_dispatch existing session 0x560d0f4326c0 for client.?
2022-07-27T09:47:34.826+0200 7f9068c63700 20 mon.ceph01@0(leader) e12 entity_name client.nebula-korbank global_id 221394210 (new_ok) caps profile rbd
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader) e12 handle_subscribe mon_subscribe({config=0+,monmap=0+}) v3
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).config check_sub next 0 have 128
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).config refresh_config crush_location for remote_host node01 is {}
2022-07-27T09:47:34.826+0200 7f9068c63700 20 mon.ceph01@0(leader).config refresh_config client.nebula-korbank crush {} device_class
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).config maybe_send_config to client.? (changed)
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).config send_config to client.?
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader).monmap v12 check_sub monmap next 0 have 12
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader) e12 ms_handle_reset 0x560d0e91ec00 v1:10.8.11.70:0/1504947558
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader) e12 reset/close on session client.? v1:10.8.11.70:0/1504947558
2022-07-27T09:47:34.826+0200 7f9068c63700 10 mon.ceph01@0(leader) e12 remove_session 0x560d0f4326c0 client.? v1:10.8.11.70:0/1504947558 features 0x3f01cfbf7ffdffff
Updated by Rafał Dziwiński over 1 year ago
cat >> /etc/ceph/ceph.conf <<EOF
[client]
ms_crc_data=false
ms_crc_header=false
EOF
^ this fixed my issue.
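As a sketch of an alternative to editing ceph.conf on every client host, the same messenger CRC options can usually be set in the cluster's centralized configuration database (available since Mimic), so that clients pick them up from the monitors:

```shell
# Equivalent runtime setting via the centralized config store,
# applied to all clients instead of per-host ceph.conf edits
ceph config set client ms_crc_data false
ceph config set client ms_crc_header false
```

Note that clients must still be able to authenticate to the monitors to fetch this config, so the ceph.conf workaround above may be the only option while the connection itself is broken.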
Updated by Radoslaw Zarzynski over 1 year ago
- Status changed from New to Rejected
Hmm, looks like the issue was transient and got solved.