Bug #44691
mon/caps.sh fails with "Expected return 13, got 0"
Status:
New
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
mimic, nautilus, octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2020-03-19T23:49:59.548 INFO:tasks.mon_thrash:Sending STOP to mon b 2020-03-19T23:49:59.548 INFO:tasks.ceph.mon.b:Sent signal 19 2020-03-19T23:49:59.549 INFO:tasks.mon_thrash.mon_thrasher:waiting for 20.0 secs to unfreeze mons ... 2020-03-19T23:50:12.564 INFO:tasks.workunit.client.0.smithi029.stderr:+ ceph auth del client.foo 2020-03-19T23:50:13.017 INFO:tasks.workunit.client.0.smithi029.stderr:entity client.foo does not exist 2020-03-19T23:50:13.026 INFO:tasks.workunit.client.0.smithi029.stderr:+ expect 'ceph -k /tmp/cephtest-mon-caps-madness.foo.keyring --user foo quorum_status' 13 ... 2020-03-19T23:50:13.027 INFO:tasks.workunit.client.0.smithi029.stdout:ceph -k /tmp/cephtest-mon-caps-madness.foo.keyring --user foo quorum_status 2020-03-19T23:50:19.550 INFO:tasks.mon_thrash:Sending CONT to mon b 2020-03-19T23:50:19.551 INFO:tasks.ceph.mon.b:Sent signal 18 2020-03-19T23:50:19.551 INFO:tasks.mon_thrash.ceph_manager:waiting for quorum size 3 2020-03-19T23:50:19.551 INFO:teuthology.orchestra.run.smithi098:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status ... 2020-03-19T23:50:19.883 INFO:tasks.mon_thrash.ceph_manager:quorum is size 3 ... 2020-03-19T23:50:19.926 INFO:tasks.workunit.client.0.smithi029.stderr:+ ret=0 2020-03-19T23:50:19.927 INFO:tasks.workunit.client.0.smithi029.stderr:+ [[ 0 -ne 13 ]] 2020-03-19T23:50:19.927 INFO:tasks.workunit.client.0.smithi029.stderr:+ echo 'Error: Expected return 13, got 0'
on mon.a side
"message": "1/3 mons down, quorum a,c", "message": "mon.b (rank 0) addr [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] is down (out of quorum)" ... 2020-03-19T23:50:12.870+0000 7facd6665700 1 -- [v2:172.21.15.98:3300/0,v1:172.21.15.98:6789/0] <== client.194102 172.21.15.29:0/3380804526 4 ==== mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 ==== 88+0+0 (secure 0 0 0) 0x5592699a2d00 con 0x55926b104c00 ... 2020-03-19T23:50:12.870+0000 7facd6665700 10 mon.a@1(leader).auth v30 prepare_update mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 from client.194102 172.21.15.29:0/3380804526 ... 2020-03-19T23:50:12.870+0000 7facd6665700 10 mon.a@1(leader).paxos(paxos updating c 252..934) now 1,2 have accepted 2020-03-19T23:50:12.870+0000 7facd6665700 10 mon.a@1(leader).paxos(paxos updating c 252..934) got majority, committing, done with update 2020-03-19T23:50:12.870+0000 7facd6665700 10 mon.a@1(leader).paxos(paxos updating c 252..934) commit_start 935 ... 2020-03-19T23:50:12.870+0000 7facd4e62700 10 mon.a@1(leader).auth v31 update_from_paxos version 31 keys ver 30 latest 21 2020-03-19T23:50:12.870+0000 7facd4e62700 10 mon.a@1(leader).auth v31 update_from_paxos key server version 30 2020-03-19T23:50:12.870+0000 7facd4e62700 20 mon.a@1(leader).auth v31 update_from_paxos walking through version 31 len 60 ... 2020-03-19T23:50:13.006+0000 7facd6665700 10 mon.a@1(leader).paxosservice(auth 6..31) dispatch 0x5592699a2880 mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 from client.194102 172.21.15.29:0/3380804526 con 0x55926b104c00 2020-03-19T23:50:13.006+0000 7facd6665700 5 mon.a@1(leader).paxos(paxos active c 252..935) is_readable = 1 - now=2020-03-19T23:50:13.013126+0000 lease_expire=2020-03-19T23:50:17.880320+0000 has v0 lc 935 2020-03-19T23:50:13.006+0000 7facd6665700 10 mon.a@1(leader).auth v31 preprocess_query mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 from client.194102 172.21.15.29:0/3380804526 2020-03-19T23:50:13.006+0000 7facd6665700 10 mon.a@1(leader).auth v31 prepare_update mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 from client.194102 172.21.15.29:0/3380804526 2020-03-19T23:50:13.006+0000 7facd6665700 2 mon.a@1(leader) e1 send_reply 0x55926a9b3590 0x55926c7addc0 mon_command_ack([{"prefix": "auth del", "entity": "client.foo"}]=0 entity client.foo does not exist v31) v1
so in auth v31, client.foo was not listed in the entity list in KeyServer.
on mon.b side
2020-03-19T23:50:19.573+0000 7f12f2f7a700 0 log_channel(cluster) log [INF] : mon.b is new leader, mons b,a,c in quorum (ranks 0,1,2) ... 2020-03-19T23:50:19.573+0000 7f12f2f7a700 10 mon.b@0(leader) e1 reset/close on session client.? 172.21.15.29:0/1214127697 2020-03-19T23:50:19.573+0000 7f12f2f7a700 10 mon.b@0(leader) e1 remove_session 0x5592202b4640 client.? 172.21.15.29:0/1214127697 features 0x3f01cfb8ffadffff 2020-03-19T23:50:19.577+0000 7f12f7783700 1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> [v2:172.21.15.98:3300/0,v1:172.21.15.98:6789/0] conn(0x55921f173400 0x55921f1a0d00 secure :-1 s=THROTTLE_DONE pgs=14 cs=1 l=0 rx=0x559221a351d0 tx=0x559229259a40).handle_r ead_frame_epilogue_main read frame epilogue bytes=32 2020-03-19T23:50:19.577+0000 7f12f7783700 1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> [v2:172.21.15.98:3300/0,v1:172.21.15.98:6789/0] conn(0x55921f173400 0x55921f1a0d00 secure :-1 s=THROTTLE_DONE pgs=14 cs=1 l=0 rx=0x559221a351d0 tx=0x559229259a40).handle_r ead_frame_epilogue_main read frame epilogue bytes=32 2020-03-19T23:50:19.577+0000 7f12f2f7a700 1 -- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] <== mon.1 v2:172.21.15.98:3300/0 185 ==== paxos(last lc 944 fc 252 pn 3701 opn 0) v4 ==== 172314+0+0 (secure 0 0 0) 0x559220c3b600 con 0x55921f173400 2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 mon.b@0(leader) e1 _ms_dispatch existing session 0x559220001840 for mon.1 2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 mon.b@0(leader) e1 entity caps allow * 2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 is_capable service=mon command= read addr v2:172.21.15.98:3300/0 on cap allow * 2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 allow so far , doing grant allow * 2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 allow all 2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 is_capable service=mon command= exec addr v2:172.21.15.98:3300/0 on cap allow * 2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 allow so far , doing grant allow * 2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 allow all 2020-03-19T23:50:19.577+0000 7f12f2f7a700 10 mon.b@0(leader).paxos(paxos recovering c 252..925) handle_last paxos(last lc 944 fc 252 pn 3701 opn 0) v4 2020-03-19T23:50:19.577+0000 7f12f2f7a700 10 mon.b@0(leader).paxos(paxos recovering c 252..944) store_state [926..944] 2020-03-19T23:50:19.577+0000 7f12f2f7a700 10 mon.b@0(leader).paxos(paxos recovering c 252..944) forgetting obsolete uncommitted value 926 pn 3600 2020-03-19T23:50:19.577+0000 7f12f6f82700 1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> [v2:172.21.15.98:3301/0,v1:172.21.15.98:6790/0] conn(0x55921f173800 0x55921f1a1180 secure :-1 s=THROTTLE_DONE pgs=22 cs=0 l=0 rx=0x5592235840c0 tx=0x5592218c0cb0).handle_r ead_frame_epilogue_main read frame epilogue bytes=32 2020-03-19T23:50:19.577+0000 7f12f6f82700 1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> [v2:172.21.15.98:3301/0,v1:172.21.15.98:6790/0] conn(0x55921f173800 0x55921f1a1180 secure :-1 s=THROTTLE_DONE pgs=22 cs=0 l=0 rx=0x5592235840c0 tx=0x5592218c0cb0).handle_r ead_frame_epilogue_main read frame epilogue bytes=32 2020-03-19T23:50:19.577+0000 7f12f1f78700 1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> conn(0x55921f172800 0x55921ffd1600 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept 2020-03-19T23:50:19.577+0000 7f12f1f78700 1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> conn(0x55921f172800 0x55921ffd1600 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0 2020-03-19T23:50:19.577+0000 7f12f1f78700 1 --1- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> conn(0x559220bcc800 0x55921ff77000 :6789 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=18 legacy v1:172.21.15.29:6789/0 socket_addr v1:172.21.15.29:6789/0 target_addr v1:172.21.15.29:58186/0 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader) e1 handle_auth_request con 0x559220bcc800 (start) method 0 payload 0 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 In get_auth_session_handler for protocol 0 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader) e1 handle_auth_request con 0x55921f172800 (start) method 2 payload 20 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader).auth v28 _assign_global_id 115719 (max 194096) 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader) e1 handle_auth_request assigned global_id 115719 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx server client.foo: start_session server_challenge 3de21c7717ee23d2 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader) e1 handle_auth_request con 0x55921f172800 (more) method 2 payload 36 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx server client.foo: handle_request get_auth_session_key for client.foo 2020-03-19T23:50:19.577+0000 7f12f1f78700 20 cephx server client.foo: checking key: req.key=ed327df32b51ef24 expected_key=ed327df32b51ef24 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx server client.foo: handle_request auth ticket global_id 115719 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx: build_service_ticket_reply encoding 1 tickets with secret AQAGBHRepTHpGRAAoGu4GvVP7g2a9NAHROUp4g== 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx: build_service_ticket service auth secret_id 2 ticket_info.ticket.name=client.foo ticket.global_id 115719 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx keyserverdata: get_caps: name=client.foo 2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx keyserverdata: get_secret: num of caps=1 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx server client.foo: adding key for service mon 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx server client.foo: adding key for service osd 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_caps: name=client.foo 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_secret: num of caps=1 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx server client.foo: adding key for service mgr 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_caps: name=client.foo 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_secret: num of caps=1 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx server client.foo: adding key for service auth 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_caps: name=client.foo 2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_secret: num of caps=1 ... 2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).paxosservice(auth 11..31) refresh 2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).auth v31 update_from_paxos 2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).auth v31 update_from_paxos version 31 keys ver 28 latest 21 2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).auth v31 update_from_paxos key server version 28 2020-03-19T23:50:19.581+0000 7f12f2f7a700 20 mon.b@0(leader).auth v31 update_from_paxos walking through version 29 len 19 2020-03-19T23:50:19.581+0000 7f12f2f7a700 20 mon.b@0(leader).auth v31 update_from_paxos walking through version 30 len 801 2020-03-19T23:50:19.581+0000 7f12f2f7a700 20 mon.b@0(leader).auth v31 update_from_paxos walking through version 31 len 60 2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).auth v31 update_from_paxos max_global_id=204096 format_version 3 ... 2020-03-19T23:50:19.909+0000 7f12f2f7a700 1 -- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] <== client.? 172.21.15.29:0/2456649890 7 ==== mon_command({"prefix": "quorum_status"} v 0) v1 ==== 69+0+0 (secure 0 0 0) 0x55921f130a00 con 0x55921f172800 2020-03-19T23:50:19.909+0000 7f12f2f7a700 20 mon.b@0(leader) e1 entity client.foo caps allow command "auth ls", allow command quorum_status 2020-03-19T23:50:19.909+0000 7f12f2f7a700 0 mon.b@0(leader) e1 handle_command mon_command({"prefix": "quorum_status"} v 0) v1 2020-03-19T23:50:19.909+0000 7f12f2f7a700 20 is_capable service=mon command=quorum_status read addr 172.21.15.29:0/2456649890 on cap allow command "auth ls", allow command quorum_status 2020-03-19T23:50:19.909+0000 7f12f2f7a700 20 allow so far , doing grant allow command quorum_status 2020-03-19T23:50:19.909+0000 7f12f2f7a700 0 log_channel(audit) log [DBG] : from='client.? 172.21.15.29:0/2456649890' entity='client.foo' cmd=[{"prefix": "quorum_status"}]: dispatch 2020-03-19T23:50:19.913+0000 7f12f2f7a700 2 mon.b@0(leader) e1 send_reply 0x559226cbe580 0x559227212340 mon_command_ack([{"prefix": "quorum_status"}]=0 v0) v1
in short.
- mon.b was paused upon SIGSTOP
- new quorum formed without mon.b
- mon.b was unpaused upon SIGCONT
- mon.b was elected leader
- client.foo sent "quorum_status" to mon.b and expected EACCES. the caps of the established session was set using the stale "key_server".
- mon.b was sync'ed with other monitors in the quorum
- mon.b replied client.foo with retcode 0.
/a/yuriw-2020-03-19_19:01:06-rados-wip-yuri2-testing-2020-03-16-1919-master-distro-basic-smithi/4868795/
History
#1 Updated by Kefu Chai over 3 years ago
- Description updated (diff)
#2 Updated by Kefu Chai over 3 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 34075
#3 Updated by Kefu Chai over 3 years ago
- Backport changed from minic, nautilus to minic, nautilus, octopus
#4 Updated by Neha Ojha over 3 years ago
- Backport changed from minic, nautilus, octopus to mimic, nautilus, octopus
#5 Updated by Kefu Chai over 3 years ago
- Status changed from Fix Under Review to New
- Assignee deleted (
Kefu Chai) - Pull request ID deleted (
34075)