Project

General

Profile

Bug #57083

ceph-fuse: monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

Added by Xiubo Li 4 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

100%

Source:
Tags:
Backport:
octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, ceph-fuse, qa-suite
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From https://pulpito.ceph.com/yuriw-2022-07-24_15:34:38-fs-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6946337:

The ceph-fuse mounting failed with authenticating:

2022-07-24T15:47:30.290 DEBUG:teuthology.orchestra.run.smithi137:> sudo mount -t fusectl /sys/fs/fuse/connections /sys/fs/fuse/connections
2022-07-24T15:47:30.348 INFO:teuthology.orchestra.run.smithi137.stderr:mount: /sys/fs/fuse/connections: /sys/fs/fuse/connections already mounted or mount point busy.
2022-07-24T15:47:30.349 INFO:tasks.cephfs.fuse_mount.ceph-fuse.vol_data_isolated.smithi137.stderr:2022-07-24 15:47:30.343 7f3e031a7700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2022-07-24T15:47:30.349 INFO:tasks.cephfs.fuse_mount.ceph-fuse.vol_data_isolated.smithi137.stderr:2022-07-24 15:47:30.343 7f3e029a6700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2022-07-24T15:47:30.393 DEBUG:teuthology.orchestra.run:got remote process result: 32
2022-07-24T15:47:30.397 INFO:teuthology.orchestra.run:Running command with timeout 900
2022-07-24T15:47:30.398 DEBUG:teuthology.orchestra.run.smithi137:> ls /sys/fs/fuse/connections
2022-07-24T15:47:30.399 INFO:tasks.cephfs.fuse_mount.ceph-fuse.vol_data_isolated.smithi137.stderr:failed to fetch mon config (--no-mon-config to skip)
2022-07-24T15:47:30.530 INFO:tasks.cephfs.fuse_mount.ceph-fuse.vol_data_isolated.smithi137.stderr:daemon-helper: command failed with exit status 1
2022-07-24T15:47:30.549 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-07-24T15:47:30.550 INFO:tasks.cephfs.fuse_mount:mount command failed.
2022-07-24T15:47:30.551 ERROR:teuthology.run_tasks:Saw exception from tasks.

Subtasks

Bug #53360: pacific: client: "handle_auth_bad_method server allowed_methods [2] but i only support [2]"DuplicateVenky Shankar

History

#1 Updated by Xiubo Li 4 months ago

From remote/smithi029/log/ceph-mon.a.log.gz:

2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (start) method 2 payload 34
2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader).auth v3 _assign_global_id 4560 (max 14096)
2022-07-24 15:47:30.344 7f52d6623700 10 start_session entity_name=client.vol_data_isolated global_id=4560 is_new_global_id=1
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader) e1 ms_handle_accept con 0x5568e49be880 no session
2022-07-24 15:47:30.344 7f52d6623700 10 cephx server client.vol_data_isolated: start_session server_challenge aef60bfe7b4573b9
2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (more) method 2 payload 2
2022-07-24 15:47:30.344 7f52d6623700 10 cephx server client.vol_data_isolated: handle_request get_auth_session_key for client.vol_data_isolated
2022-07-24 15:47:30.344 7f52d7625700  1 -- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] <== client.? v1:172.21.15.137:0/2195577994 1 ==== auth(proto 0 42 bytes epoch 0) v1 ==== 72+0+0 (unknown 479712082 0 0) 0x5568e4736d80 con 0x5568e49be880
2022-07-24 15:47:30.344 7f52d6623700  0 cephx server client.vol_data_isolated: handle_request failed to decode CephXAuthenticate: buffer::end_of_buffer
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader) e1 _ms_dispatch new session 0x5568e4738400 MonSession(client.? v1:172.21.15.137:0/2195577994 is open , features 0x3ffddff8ffecffff (luminous)) features 0x3ffddff8ffecffff
2022-07-24 15:47:30.344 7f52d7625700 20 mon.a@0(leader) e1  entity_name  global_id 0 (none) caps
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader).paxosservice(auth 1..3) dispatch 0x5568e4736d80 auth(proto 0 42 bytes epoch 0) v1 from client.? v1:172.21.15.137:0/2195577994 con 0x5568e49be880
2022-07-24 15:47:30.344 7f52d6623700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >>  conn(0x5568e49bfa80 0x5568e4693100 secure :-1 s=AUTH_ACCEPTING_MORE pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0)._auth_bad_method auth_method 2 r (1) Operation not permitted, allowed_methods [2], allowed_modes [2,1]
2022-07-24 15:47:30.344 7f52d7625700  5 mon.a@0(leader).paxos(paxos active c 1..100) is_readable = 1 - now=2022-07-24 15:47:30.347662 lease_expire=2022-07-24 15:47:34.440663 has v0 lc 100

The monitor logs said it failed when doing the authentication while decoding the auth info, which the payload length was 2 and the method was 2:

2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (more) method 2 payload 2

By going through the ceph source code, found that the length of the CephXRequestHeader is 2, that means the client only sent the header, no extra data, it should fail in Line#49 below:

 38 int CephxClientHandler::build_request(bufferlist& bl) const
 39 {  
 40   ldout(cct, 10) << "build_request" << dendl;
 41    
 42   if (need & CEPH_ENTITY_TYPE_AUTH) { 
 43     /* authenticate */
 44     CephXRequestHeader header;
 45     header.request_type = CEPHX_GET_AUTH_SESSION_KEY;
 46     encode(header, bl);
 47 
 48     CryptoKey secret;
 49     const bool got = keyring->get_secret(cct->_conf->name, secret);                                                                                
 50     if (!got) {
 51       ldout(cct, 20) << "no secret found for entity: " << cct->_conf->name << dendl;
 52       return -ENOENT;
 53     }
 54 

The keyring->get_secret(cct->_conf->name, secret); will get the authkey and caps from /etc/ceph/ceph.client.vol_data_isolated.keyring, but it failed.

From the teuthology.log:

2022-07-24T15:47:25.336 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo python3
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ import_key vol_data_isolated
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ local client=vol_data_isolated
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ '[' -n '' ']'
2022-07-24T15:47:25.400 INFO:tasks.workunit.client.0.smithi137.stderr:+ local keyring=/etc/ceph/ceph.client.vol_data_isolated.keyring
2022-07-24T15:47:25.401 INFO:tasks.workunit.client.0.smithi137.stderr:++ mktemp
2022-07-24T15:47:25.443 INFO:tasks.workunit.client.0.smithi137.stderr:+ local T=/tmp/tmp.t0CieZarQn
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr:+ tee /tmp/tmp.t0CieZarQn
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr:Traceback (most recent call last):
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr:  File "<stdin>", line 2, in <module>
2022-07-24T15:47:25.445 INFO:tasks.workunit.client.0.smithi137.stderr:ModuleNotFoundError: No module named 'ceph_volume_client'
2022-07-24T15:47:25.445 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo touch -- /etc/ceph/ceph.client.vol_data_isolated.keyring
2022-07-24T15:47:25.497 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo ceph-authtool /etc/ceph/ceph.client.vol_data_isolated.keyring --import-keyring /tmp/tmp.t0CieZarQn

We can see that it failed when generating the keyring contents for client.vol_data_isolated. The ceph_volume_client python module was missing.

#2 Updated by Xiubo Li 4 months ago

The root cause is that in nautilus the qa/workunits/fs/upgrade/volume_client script is using python2 to run the python paload script, while the pacific is using python3.

And the fs/upgrade/volumes/import-legacy/ test suite is from pacific when testing against the nautilus ceph.

Currently the test suite doesn't install the python3 packages for nautilus.

#3 Updated by Xiubo Li 4 months ago

  • Target version set to v16.2.11
  • Backport changed from quincy,pacific,octopus to octopus

From remote/smithi029/log/ceph-mon.a.log.gz:

2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (start) method 2 payload 34
2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader).auth v3 _assign_global_id 4560 (max 14096)
2022-07-24 15:47:30.344 7f52d6623700 10 start_session entity_name=client.vol_data_isolated global_id=4560 is_new_global_id=1
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader) e1 ms_handle_accept con 0x5568e49be880 no session
2022-07-24 15:47:30.344 7f52d6623700 10 cephx server client.vol_data_isolated: start_session server_challenge aef60bfe7b4573b9
2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (more) method 2 payload 2
2022-07-24 15:47:30.344 7f52d6623700 10 cephx server client.vol_data_isolated: handle_request get_auth_session_key for client.vol_data_isolated
2022-07-24 15:47:30.344 7f52d7625700  1 -- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] <== client.? v1:172.21.15.137:0/2195577994 1 ==== auth(proto 0 42 bytes epoch 0) v1 ==== 72+0+0 (unknown 479712082 0 0) 0x5568e4736d80 con 0x5568e49be880
2022-07-24 15:47:30.344 7f52d6623700  0 cephx server client.vol_data_isolated: handle_request failed to decode CephXAuthenticate: buffer::end_of_buffer
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader) e1 _ms_dispatch new session 0x5568e4738400 MonSession(client.? v1:172.21.15.137:0/2195577994 is open , features 0x3ffddff8ffecffff (luminous)) features 0x3ffddff8ffecffff
2022-07-24 15:47:30.344 7f52d7625700 20 mon.a@0(leader) e1  entity_name  global_id 0 (none) caps
2022-07-24 15:47:30.344 7f52d7625700 10 mon.a@0(leader).paxosservice(auth 1..3) dispatch 0x5568e4736d80 auth(proto 0 42 bytes epoch 0) v1 from client.? v1:172.21.15.137:0/2195577994 con 0x5568e49be880
2022-07-24 15:47:30.344 7f52d6623700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >>  conn(0x5568e49bfa80 0x5568e4693100 secure :-1 s=AUTH_ACCEPTING_MORE pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0)._auth_bad_method auth_method 2 r (1) Operation not permitted, allowed_methods [2], allowed_modes [2,1]
2022-07-24 15:47:30.344 7f52d7625700  5 mon.a@0(leader).paxos(paxos active c 1..100) is_readable = 1 - now=2022-07-24 15:47:30.347662 lease_expire=2022-07-24 15:47:34.440663 has v0 lc 100

The monitor logs said it failed when doing the authentication while decoding the auth info, which the payload length was 2 and the method was 2:

2022-07-24 15:47:30.344 7f52d6623700 10 mon.a@0(leader) e1 handle_auth_request con 0x5568e49bfa80 (more) method 2 payload 2

By going through the ceph source code, found that the length of the CephXRequestHeader is 2, that means the client only sent the header, no extra data, it should fail in Line#49 below:

 38 int CephxClientHandler::build_request(bufferlist& bl) const
 39 {  
 40   ldout(cct, 10) << "build_request" << dendl;
 41    
 42   if (need & CEPH_ENTITY_TYPE_AUTH) { 
 43     /* authenticate */
 44     CephXRequestHeader header;
 45     header.request_type = CEPHX_GET_AUTH_SESSION_KEY;
 46     encode(header, bl);
 47 
 48     CryptoKey secret;
 49     const bool got = keyring->get_secret(cct->_conf->name, secret);                                                                                
 50     if (!got) {
 51       ldout(cct, 20) << "no secret found for entity: " << cct->_conf->name << dendl;
 52       return -ENOENT;
 53     }
 54 

The keyring->get_secret(cct->_conf->name, secret); will get the authkey and caps from /etc/ceph/ceph.client.vol_data_isolated.keyring, but it failed.

From the teuthology.log:

2022-07-24T15:47:25.336 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo python3
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ import_key vol_data_isolated
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ local client=vol_data_isolated
2022-07-24T15:47:25.337 INFO:tasks.workunit.client.0.smithi137.stderr:+ '[' -n '' ']'
2022-07-24T15:47:25.400 INFO:tasks.workunit.client.0.smithi137.stderr:+ local keyring=/etc/ceph/ceph.client.vol_data_isolated.keyring
2022-07-24T15:47:25.401 INFO:tasks.workunit.client.0.smithi137.stderr:++ mktemp
2022-07-24T15:47:25.443 INFO:tasks.workunit.client.0.smithi137.stderr:+ local T=/tmp/tmp.t0CieZarQn
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr:+ tee /tmp/tmp.t0CieZarQn
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr:Traceback (most recent call last):
2022-07-24T15:47:25.444 INFO:tasks.workunit.client.0.smithi137.stderr:  File "<stdin>", line 2, in <module>
2022-07-24T15:47:25.445 INFO:tasks.workunit.client.0.smithi137.stderr:ModuleNotFoundError: No module named 'ceph_volume_client'
2022-07-24T15:47:25.445 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo touch -- /etc/ceph/ceph.client.vol_data_isolated.keyring
2022-07-24T15:47:25.497 INFO:tasks.workunit.client.0.smithi137.stderr:+ sudo ceph-authtool /etc/ceph/ceph.client.vol_data_isolated.keyring --import-keyring /tmp/tmp.t0CieZarQn

We can see that it failed when generating the keyring contents for client.vol_data_isolated. The ceph_volume_client python module was missing.

#4 Updated by Xiubo Li 4 months ago

The nautilus is using python2, while the pacific qa suite is using python3 and the qa test suite seems skipped installing the python3 packages when installing the nautilus ceph.

  tasks:
  - install:
      branch: nautilus
      exclude_packages:
      - librados3
      - ceph-mgr-dashboard
      - ceph-mgr-diskprediction-local
      - ceph-mgr-rook
      - ceph-mgr-cephadm
      - cephadm
      - ceph-immutable-object-cache
      - python3-rados
      - python3-rgw
      - python3-rbd
      - python3-cephfs
      extra_packages:
      - librados2
  - print: '**** done installing nautilus'

#5 Updated by Xiubo Li 4 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 47528

#7 Updated by Xiubo Li 2 months ago

  • Status changed from Fix Under Review to Resolved

Also available in: Atom PDF