Bug #57083
Status: Closed
ceph-fuse: monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
100% done
Description
The ceph-fuse mount failed while authenticating:
2022-07-24T15:47:30.290 DEBUG:teuthology.orchestra.run.smithi137:> sudo mount -t fusectl /sys/fs/fuse/connections /sys/fs/fuse/connections
2022-07-24T15:47:30.348 INFO:teuthology.orchestra.run.smithi137.stderr:mount: /sys/fs/fuse/connections: /sys/fs/fuse/connections already mounted or mount point busy.
2022-07-24T15:47:30.349 INFO:tasks.cephfs.fuse_mount.ceph-fuse.vol_data_isolated.smithi137.stderr:2022-07-24 15:47:30.343 7f3e031a7700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2022-07-24T15:47:30.349 INFO:tasks.cephfs.fuse_mount.ceph-fuse.vol_data_isolated.smithi137.stderr:2022-07-24 15:47:30.343 7f3e029a6700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2022-07-24T15:47:30.393 DEBUG:teuthology.orchestra.run:got remote process result: 32
2022-07-24T15:47:30.397 INFO:teuthology.orchestra.run:Running command with timeout 900
2022-07-24T15:47:30.398 DEBUG:teuthology.orchestra.run.smithi137:> ls /sys/fs/fuse/connections
2022-07-24T15:47:30.399 INFO:tasks.cephfs.fuse_mount.ceph-fuse.vol_data_isolated.smithi137.stderr:failed to fetch mon config (--no-mon-config to skip)
2022-07-24T15:47:30.530 INFO:tasks.cephfs.fuse_mount.ceph-fuse.vol_data_isolated.smithi137.stderr:daemon-helper: command failed with exit status 1
2022-07-24T15:47:30.549 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-07-24T15:47:30.550 INFO:tasks.cephfs.fuse_mount:mount command failed.
2022-07-24T15:47:30.551 ERROR:teuthology.run_tasks:Saw exception from tasks.
Updated by Xiubo Li over 1 year ago
From remote/smithi029/log/ceph-mon.a.log.gz:
2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (start) method 2 payload 34
2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader).auth v3 _assign_global_id 4560 (max 14096)
2022-07-24 15:47:30.344 7f52d6623700 10 start_session entity_name=client.vol_data_isolated global_id=4560 is_new_global_id=1
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader) e1 ms_handle_accept con 0x5568e49be880 no session
2022-07-24 15:47:30.344 7f52d6623700 10 cephx server client.vol_data_isolated: start_session server_challenge aef60bfe7b4573b9
2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (more) method 2 payload 2
2022-07-24 15:47:30.344 7f52d6623700 10 cephx server client.vol_data_isolated: handle_request get_auth_session_key for client.vol_data_isolated
2022-07-24 15:47:30.344 7f52d7625700 1 -- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] <== client.? v1:172.21.15.137:0/2195577994 1 ==== auth(proto 0 42 bytes epoch 0) v1 ==== 72+0+0 (unknown 479712082 0 0) 0x5568e4736d80 con 0x5568e49be880
2022-07-24 15:47:30.344 7f52d6623700 0 cephx server client.vol_data_isolated: handle_request failed to decode CephXAuthenticate: buffer::end_of_buffer
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader) e1 _ms_dispatch new session 0x5568e4738400 MonSession(client.? v1:172.21.15.137:0/2195577994 is open , features 0x3ffddff8ffecffff (luminous)) features 0x3ffddff8ffecffff
2022-07-24 15:47:30.344 7f52d7625700 20 mon.a@0(leader) e1 entity_name global_id 0 (none) caps
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader).paxosservice(auth 1..3) dispatch 0x5568e4736d80 auth(proto 0 42 bytes epoch 0) v1 from client.? v1:172.21.15.137:0/2195577994 con 0x5568e49be880
2022-07-24 15:47:30.344 7f52d6623700 1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> conn(0x5568e49bfa80 0x5568e4693100 secure :-1 s=AUTH_ACCEPTING_MORE pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0)._auth_bad_method auth_method 2 r (1) Operation not permitted, allowed_methods [2], allowed_modes [2,1]
2022-07-24 15:47:30.344 7f52d7625700 5 mon.a@0(leader).paxos(paxos active c 1..100) is_readable = 1 - now=2022-07-24 15:47:30.347662 lease_expire=2022-07-24 15:47:34.440663 has v0 lc 100
The monitor log shows that authentication failed while decoding the auth info: the method was 2 (cephx), but the payload length was only 2 bytes:
2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (more) method 2 payload 2
Going through the Ceph source code, the encoded length of CephXRequestHeader is exactly 2 bytes. That means the client sent only the header and no extra data, so it must have failed at Line#49 below:
38 int CephxClientHandler::build_request(bufferlist& bl) const
39 {
40   ldout(cct, 10) << "build_request" << dendl;
41
42   if (need & CEPH_ENTITY_TYPE_AUTH) {
43     /* authenticate */
44     CephXRequestHeader header;
45     header.request_type = CEPHX_GET_AUTH_SESSION_KEY;
46     encode(header, bl);
47
48     CryptoKey secret;
49     const bool got = keyring->get_secret(cct->_conf->name, secret);
50     if (!got) {
51       ldout(cct, 20) << "no secret found for entity: " << cct->_conf->name << dendl;
52       return -ENOENT;
53     }
54
The keyring->get_secret(cct->_conf->name, secret) call reads the auth key and caps from /etc/ceph/ceph.client.vol_data_isolated.keyring, and here it failed.
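As a sanity check on the "payload 2" reading, here is a minimal Python sketch (not Ceph's actual encoder) of why a 2-byte payload means "header only": CephXRequestHeader carries a single little-endian 16-bit request_type, so encoding just the header yields exactly 2 bytes, and a server expecting a CephXAuthenticate body after the header hits end-of-buffer. The constant value and the decode shape are illustrative assumptions, not Ceph's exact wire format.

```python
import struct

# Illustrative value; in Ceph this is a small request-type constant.
CEPHX_GET_AUTH_SESSION_KEY = 0x0100

def encode_request_header(request_type: int) -> bytes:
    # CephXRequestHeader is a single little-endian u16 -> 2 bytes total.
    return struct.pack("<H", request_type)

def decode_authenticate(payload: bytes):
    """Mimic the server side: read the 2-byte header, then expect a
    non-empty CephXAuthenticate body (client_challenge, key, ...).
    Exact body layout elided; the point is a header-only payload
    leaves nothing to decode."""
    (request_type,) = struct.unpack_from("<H", payload, 0)
    body = payload[2:]
    if not body:
        # What the monitor logged: buffer::end_of_buffer
        raise EOFError("buffer::end_of_buffer")
    return request_type, body

payload = encode_request_header(CEPHX_GET_AUTH_SESSION_KEY)
assert len(payload) == 2  # matches "method 2 payload 2" in the mon log
try:
    decode_authenticate(payload)
except EOFError:
    pass  # header-only payload cannot contain CephXAuthenticate
```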
From the teuthology.log:
2022-07-24T15:47:25.336 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo python3
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ import_key vol_data_isolated
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ local client=vol_data_isolated
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ '[' -n '' ']'
2022-07-24T15:47:25.400 INFO:tasks.workunit.client.0.smithi137.stderr:+ local keyring=/etc/ceph/ceph.client.vol_data_isolated.keyring
2022-07-24T15:47:25.401 INFO:tasks.workunit.client.0.smithi137.stderr:++ mktemp
2022-07-24T15:47:25.443 INFO:tasks.workunit.client.0.smithi137.stderr:+ local T=/tmp/tmp.t0CieZarQn
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr:+ tee /tmp/tmp.t0CieZarQn
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr:Traceback (most recent call last):
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr: File "<stdin>", line 2, in <module>
2022-07-24T15:47:25.445 INFO:tasks.workunit.client.0.smithi137.stderr:ModuleNotFoundError: No module named 'ceph_volume_client'
2022-07-24T15:47:25.445 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo touch -- /etc/ceph/ceph.client.vol_data_isolated.keyring
2022-07-24T15:47:25.497 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo ceph-authtool /etc/ceph/ceph.client.vol_data_isolated.keyring --import-keyring /tmp/tmp.t0CieZarQn
We can see that generating the keyring contents for client.vol_data_isolated failed: the ceph_volume_client Python module was missing, so the script went on to import an effectively empty temp file into the keyring.
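The downstream effect can be sketched as follows: the keyring imported from the empty temp file has no [client.vol_data_isolated] section, so any get_secret-style lookup comes back empty. This uses Python's configparser as a stand-in for Ceph's INI-like keyring parser; it is an illustration of the failure mode, not Ceph's actual parsing code.

```python
import configparser

def get_secret(keyring_text: str, entity: str):
    """Return the key for an entity (e.g. 'client.vol_data_isolated'),
    or None if the keyring has no entry for it."""
    cp = configparser.ConfigParser()
    cp.read_string(keyring_text)
    if cp.has_option(entity, "key"):
        return cp.get(entity, "key")
    return None

# A well-formed keyring entry (fake key, for illustration only):
good = "[client.vol_data_isolated]\nkey = AQFakeExampleKey==\n"
assert get_secret(good, "client.vol_data_isolated") is not None

# What the failed workunit produced: an empty keyring -> no secret.
assert get_secret("", "client.vol_data_isolated") is None
```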
Updated by Xiubo Li over 1 year ago
The root cause is that in nautilus the qa/workunits/fs/upgrade/volume_client script uses python2 to run the Python payload script, while in pacific it uses python3.
And the fs/upgrade/volumes/import-legacy/ test suite comes from pacific while testing against the nautilus ceph.
Currently the test suite doesn't install the python3 packages for nautilus.
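The mismatch shows up exactly as the ModuleNotFoundError above: the pacific-era workunit runs its payload under python3, but only the python2 bindings are present on the nautilus install. A defensive workunit could probe for the module under the interpreter it is about to use; this helper is a hypothetical sketch, not part of the actual workunit:

```python
import importlib.util

def has_module(name: str) -> bool:
    # True if `name` is importable under the current interpreter,
    # without actually importing it.
    return importlib.util.find_spec(name) is not None

# A stdlib module is always importable:
assert has_module("json")
# An absent module (e.g. ceph_volume_client on a python3 without the
# bindings) is not:
assert not has_module("some_module_that_is_not_installed_xyz")
```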
Updated by Xiubo Li over 1 year ago
- Target version set to v16.2.11
- Backport changed from quincy,pacific,octopus to octopus
Updated by Xiubo Li over 1 year ago
Nautilus uses python2, while the pacific qa suite uses python3, and the qa test suite skips installing the python3 packages when installing the nautilus ceph:
tasks:
- install:
    branch: nautilus
    exclude_packages:
    - librados3
    - ceph-mgr-dashboard
    - ceph-mgr-diskprediction-local
    - ceph-mgr-rook
    - ceph-mgr-cephadm
    - cephadm
    - ceph-immutable-object-cache
    - python3-rados
    - python3-rgw
    - python3-rbd
    - python3-cephfs
    extra_packages:
    - librados2
- print: '**** done installing nautilus'
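The way these lists combine can be sketched in a few lines: packages in exclude_packages are dropped from the default install set and extra_packages are appended, which is why none of the python3-* bindings land on the nautilus nodes. The default list below is illustrative, not teuthology's real one.

```python
# Hypothetical default package set, for illustration only.
default_packages = ["ceph", "ceph-common", "python3-rados", "python3-rbd",
                    "python3-rgw", "python3-cephfs", "librados3"]
# Values taken from the install task above.
exclude_packages = ["librados3", "python3-rados", "python3-rgw",
                    "python3-rbd", "python3-cephfs"]
extra_packages = ["librados2"]

# Excludes are filtered out, extras are appended.
to_install = [p for p in default_packages if p not in exclude_packages] + extra_packages
assert to_install == ["ceph", "ceph-common", "librados2"]
# Every python3-* binding is gone, so the python3 payload cannot import them.
assert not any(p.startswith("python3-") for p in to_install)
```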
Updated by Xiubo Li over 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 47528
Updated by Yuri Weinstein over 1 year ago
Updated by Xiubo Li over 1 year ago
- Status changed from Fix Under Review to Resolved