Bug #44691

mon/caps.sh fails with "Expected return 13, got 0"

Added by Kefu Chai about 4 years ago. Updated almost 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
mimic, nautilus, octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-03-19T23:49:59.548 INFO:tasks.mon_thrash:Sending STOP to mon b
2020-03-19T23:49:59.548 INFO:tasks.ceph.mon.b:Sent signal 19
2020-03-19T23:49:59.549 INFO:tasks.mon_thrash.mon_thrasher:waiting for 20.0 secs to unfreeze mons
...
2020-03-19T23:50:12.564 INFO:tasks.workunit.client.0.smithi029.stderr:+ ceph auth del client.foo
2020-03-19T23:50:13.017 INFO:tasks.workunit.client.0.smithi029.stderr:entity client.foo does not exist
2020-03-19T23:50:13.026 INFO:tasks.workunit.client.0.smithi029.stderr:+ expect 'ceph -k /tmp/cephtest-mon-caps-madness.foo.keyring --user foo quorum_status' 13
...
2020-03-19T23:50:13.027 INFO:tasks.workunit.client.0.smithi029.stdout:ceph -k /tmp/cephtest-mon-caps-madness.foo.keyring --user foo quorum_status
2020-03-19T23:50:19.550 INFO:tasks.mon_thrash:Sending CONT to mon b
2020-03-19T23:50:19.551 INFO:tasks.ceph.mon.b:Sent signal 18
2020-03-19T23:50:19.551 INFO:tasks.mon_thrash.ceph_manager:waiting for quorum size 3
2020-03-19T23:50:19.551 INFO:teuthology.orchestra.run.smithi098:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status
...
2020-03-19T23:50:19.883 INFO:tasks.mon_thrash.ceph_manager:quorum is size 3
...
2020-03-19T23:50:19.926 INFO:tasks.workunit.client.0.smithi029.stderr:+ ret=0
2020-03-19T23:50:19.927 INFO:tasks.workunit.client.0.smithi029.stderr:+ [[ 0 -ne 13 ]]
2020-03-19T23:50:19.927 INFO:tasks.workunit.client.0.smithi029.stderr:+ echo 'Error: Expected return 13, got 0'
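
The failing assertion comes from the mon/caps.sh workunit. A minimal sketch of the expect-style check, transcribed into Python for illustration (the helper name and its exit-on-mismatch behavior are inferred from the trace above, not copied from the script):

    import subprocess
    import sys

    # Sketch only: the real helper is a bash function in the workunit;
    # this mirrors the check visible in the trace.
    def expect(cmd, want_rc):
        """Run cmd in a shell and require that it exits with want_rc."""
        rc = subprocess.run(cmd, shell=True).returncode
        if rc != want_rc:
            print(f"Error: Expected return {want_rc}, got {rc}",
                  file=sys.stderr)
            sys.exit(1)

    # client.foo was just deleted, so running a command as it should
    # fail with EACCES (13); in this run it returned 0 instead.
    expect("ceph -k /tmp/cephtest-mon-caps-madness.foo.keyring "
           "--user foo quorum_status", 13)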

On the mon.a side:

                    "message": "1/3 mons down, quorum a,c",
                    "message": "mon.b (rank 0) addr [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] is down (out of quorum)" 
...                 
2020-03-19T23:50:12.870+0000 7facd6665700  1 -- [v2:172.21.15.98:3300/0,v1:172.21.15.98:6789/0] <== client.194102 172.21.15.29:0/3380804526 4 ==== mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 ==== 88+0+0 (secure 0 0 0) 0x5592699a2d00 con 0x55926b104c00
...
2020-03-19T23:50:12.870+0000 7facd6665700 10 mon.a@1(leader).auth v30 prepare_update mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 from client.194102 172.21.15.29:0/3380804526
...
2020-03-19T23:50:12.870+0000 7facd6665700 10 mon.a@1(leader).paxos(paxos updating c 252..934)  now 1,2 have accepted
2020-03-19T23:50:12.870+0000 7facd6665700 10 mon.a@1(leader).paxos(paxos updating c 252..934)  got majority, committing, done with update
2020-03-19T23:50:12.870+0000 7facd6665700 10 mon.a@1(leader).paxos(paxos updating c 252..934) commit_start 935
...
2020-03-19T23:50:12.870+0000 7facd4e62700 10 mon.a@1(leader).auth v31 update_from_paxos version 31 keys ver 30 latest 21
2020-03-19T23:50:12.870+0000 7facd4e62700 10 mon.a@1(leader).auth v31 update_from_paxos key server version 30
2020-03-19T23:50:12.870+0000 7facd4e62700 20 mon.a@1(leader).auth v31 update_from_paxos walking through version 31 len 60
...
2020-03-19T23:50:13.006+0000 7facd6665700 10 mon.a@1(leader).paxosservice(auth 6..31) dispatch 0x5592699a2880 mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 from client.194102 172.21.15.29:0/3380804526 con 0x55926b104c00
2020-03-19T23:50:13.006+0000 7facd6665700  5 mon.a@1(leader).paxos(paxos active c 252..935) is_readable = 1 - now=2020-03-19T23:50:13.013126+0000 lease_expire=2020-03-19T23:50:17.880320+0000 has v0 lc 935
2020-03-19T23:50:13.006+0000 7facd6665700 10 mon.a@1(leader).auth v31 preprocess_query mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 from client.194102 172.21.15.29:0/3380804526
2020-03-19T23:50:13.006+0000 7facd6665700 10 mon.a@1(leader).auth v31 prepare_update mon_command({"prefix": "auth del", "entity": "client.foo"} v 0) v1 from client.194102 172.21.15.29:0/3380804526
2020-03-19T23:50:13.006+0000 7facd6665700  2 mon.a@1(leader) e1 send_reply 0x55926a9b3590 0x55926c7addc0 mon_command_ack([{"prefix": "auth del", "entity": "client.foo"}]=0 entity client.foo does not exist v31) v1

So in auth v31, client.foo was no longer present in the KeyServer's entity list.
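
To make the versioned replay concrete, here is a toy model of the incremental application that update_from_paxos performs; the increment format and class are invented for illustration, and Ceph's real key server state is binary-encoded and far richer:

    # Hypothetical, simplified model of replaying versioned auth
    # increments into an in-memory key server.
    class ToyKeyServer:
        def __init__(self, version, secrets):
            self.version = version
            self.secrets = dict(secrets)      # entity name -> caps

        def apply_increment(self, ver, op, entity, caps=None):
            assert ver == self.version + 1, "increments apply in order"
            if op == "add":
                self.secrets[entity] = caps
            elif op == "del":
                self.secrets.pop(entity, None)
            self.version = ver

    # mon.a is at keys version 30 and still knows client.foo ...
    ks = ToyKeyServer(30, {
        "client.foo": 'allow command "auth ls", allow command quorum_status',
    })
    # ... until the version-31 increment carries "auth del client.foo".
    ks.apply_increment(31, "del", "client.foo")
    assert "client.foo" not in ks.secrets     # mon.a's view at auth v31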

On the mon.b side:

2020-03-19T23:50:19.573+0000 7f12f2f7a700  0 log_channel(cluster) log [INF] : mon.b is new leader, mons b,a,c in quorum (ranks 0,1,2)
...
2020-03-19T23:50:19.573+0000 7f12f2f7a700 10 mon.b@0(leader) e1 reset/close on session client.? 172.21.15.29:0/1214127697
2020-03-19T23:50:19.573+0000 7f12f2f7a700 10 mon.b@0(leader) e1 remove_session 0x5592202b4640 client.? 172.21.15.29:0/1214127697 features 0x3f01cfb8ffadffff
2020-03-19T23:50:19.577+0000 7f12f7783700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> [v2:172.21.15.98:3300/0,v1:172.21.15.98:6789/0] conn(0x55921f173400 0x55921f1a0d00 secure :-1 s=THROTTLE_DONE pgs=14 cs=1 l=0 rx=0x559221a351d0 tx=0x559229259a40).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-03-19T23:50:19.577+0000 7f12f7783700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> [v2:172.21.15.98:3300/0,v1:172.21.15.98:6789/0] conn(0x55921f173400 0x55921f1a0d00 secure :-1 s=THROTTLE_DONE pgs=14 cs=1 l=0 rx=0x559221a351d0 tx=0x559229259a40).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-03-19T23:50:19.577+0000 7f12f2f7a700  1 -- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] <== mon.1 v2:172.21.15.98:3300/0 185 ==== paxos(last lc 944 fc 252 pn 3701 opn 0) v4 ==== 172314+0+0 (secure 0 0 0) 0x559220c3b600 con 0x55921f173400
2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 mon.b@0(leader) e1 _ms_dispatch existing session 0x559220001840 for mon.1
2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 mon.b@0(leader) e1  entity  caps allow *
2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 is_capable service=mon command= read addr v2:172.21.15.98:3300/0 on cap allow *
2020-03-19T23:50:19.577+0000 7f12f2f7a700 20  allow so far , doing grant allow *
2020-03-19T23:50:19.577+0000 7f12f2f7a700 20  allow all
2020-03-19T23:50:19.577+0000 7f12f2f7a700 20 is_capable service=mon command= exec addr v2:172.21.15.98:3300/0 on cap allow *
2020-03-19T23:50:19.577+0000 7f12f2f7a700 20  allow so far , doing grant allow *
2020-03-19T23:50:19.577+0000 7f12f2f7a700 20  allow all
2020-03-19T23:50:19.577+0000 7f12f2f7a700 10 mon.b@0(leader).paxos(paxos recovering c 252..925) handle_last paxos(last lc 944 fc 252 pn 3701 opn 0) v4
2020-03-19T23:50:19.577+0000 7f12f2f7a700 10 mon.b@0(leader).paxos(paxos recovering c 252..944) store_state [926..944]
2020-03-19T23:50:19.577+0000 7f12f2f7a700 10 mon.b@0(leader).paxos(paxos recovering c 252..944)  forgetting obsolete uncommitted value 926 pn 3600
2020-03-19T23:50:19.577+0000 7f12f6f82700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> [v2:172.21.15.98:3301/0,v1:172.21.15.98:6790/0] conn(0x55921f173800 0x55921f1a1180 secure :-1 s=THROTTLE_DONE pgs=22 cs=0 l=0 rx=0x5592235840c0 tx=0x5592218c0cb0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-03-19T23:50:19.577+0000 7f12f6f82700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >> [v2:172.21.15.98:3301/0,v1:172.21.15.98:6790/0] conn(0x55921f173800 0x55921f1a1180 secure :-1 s=THROTTLE_DONE pgs=22 cs=0 l=0 rx=0x5592235840c0 tx=0x5592218c0cb0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-03-19T23:50:19.577+0000 7f12f1f78700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >>  conn(0x55921f172800 0x55921ffd1600 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept
2020-03-19T23:50:19.577+0000 7f12f1f78700  1 --2- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >>  conn(0x55921f172800 0x55921ffd1600 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-03-19T23:50:19.577+0000 7f12f1f78700  1 --1- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] >>  conn(0x559220bcc800 0x55921ff77000 :6789 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=18 legacy v1:172.21.15.29:6789/0 socket_addr v1:172.21.15.29:6789/0 target_addr v1:172.21.15.29:58186/0
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader) e1 handle_auth_request con 0x559220bcc800 (start) method 0 payload 0
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 In get_auth_session_handler for protocol 0
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader) e1 handle_auth_request con 0x55921f172800 (start) method 2 payload 20
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader).auth v28 _assign_global_id 115719 (max 194096)
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader) e1 handle_auth_request  assigned global_id 115719
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx server client.foo: start_session server_challenge 3de21c7717ee23d2
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 mon.b@0(leader) e1 handle_auth_request con 0x55921f172800 (more) method 2 payload 36
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx server client.foo: handle_request get_auth_session_key for client.foo
2020-03-19T23:50:19.577+0000 7f12f1f78700 20 cephx server client.foo:  checking key: req.key=ed327df32b51ef24 expected_key=ed327df32b51ef24
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx server client.foo: handle_request auth ticket global_id 115719
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx: build_service_ticket_reply encoding 1 tickets with secret AQAGBHRepTHpGRAAoGu4GvVP7g2a9NAHROUp4g==
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx: build_service_ticket service auth secret_id 2 ticket_info.ticket.name=client.foo ticket.global_id 115719
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx keyserverdata: get_caps: name=client.foo
2020-03-19T23:50:19.577+0000 7f12f1f78700 10 cephx keyserverdata: get_secret: num of caps=1
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx server client.foo:  adding key for service mon
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx server client.foo:  adding key for service osd
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_caps: name=client.foo
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_secret: num of caps=1
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx server client.foo:  adding key for service mgr
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_caps: name=client.foo
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_secret: num of caps=1
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx server client.foo:  adding key for service auth
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_caps: name=client.foo
2020-03-19T23:50:19.581+0000 7f12f1f78700 10 cephx keyserverdata: get_secret: num of caps=1
...
2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).paxosservice(auth 11..31) refresh
2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).auth v31 update_from_paxos
2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).auth v31 update_from_paxos version 31 keys ver 28 latest 21
2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).auth v31 update_from_paxos key server version 28
2020-03-19T23:50:19.581+0000 7f12f2f7a700 20 mon.b@0(leader).auth v31 update_from_paxos walking through version 29 len 19
2020-03-19T23:50:19.581+0000 7f12f2f7a700 20 mon.b@0(leader).auth v31 update_from_paxos walking through version 30 len 801
2020-03-19T23:50:19.581+0000 7f12f2f7a700 20 mon.b@0(leader).auth v31 update_from_paxos walking through version 31 len 60
2020-03-19T23:50:19.581+0000 7f12f2f7a700 10 mon.b@0(leader).auth v31 update_from_paxos max_global_id=204096 format_version 3
...
2020-03-19T23:50:19.909+0000 7f12f2f7a700  1 -- [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] <== client.? 172.21.15.29:0/2456649890 7 ==== mon_command({"prefix": "quorum_status"} v 0) v1 ==== 69+0+0 (secure 0 0 0) 0x55921f130a00 con 0x55921f172800
2020-03-19T23:50:19.909+0000 7f12f2f7a700 20 mon.b@0(leader) e1  entity client.foo caps allow command "auth ls", allow command quorum_status
2020-03-19T23:50:19.909+0000 7f12f2f7a700  0 mon.b@0(leader) e1 handle_command mon_command({"prefix": "quorum_status"} v 0) v1
2020-03-19T23:50:19.909+0000 7f12f2f7a700 20 is_capable service=mon command=quorum_status read addr 172.21.15.29:0/2456649890 on cap allow command "auth ls", allow command quorum_status
2020-03-19T23:50:19.909+0000 7f12f2f7a700 20  allow so far , doing grant allow command quorum_status
2020-03-19T23:50:19.909+0000 7f12f2f7a700  0 log_channel(audit) log [DBG] : from='client.? 172.21.15.29:0/2456649890' entity='client.foo' cmd=[{"prefix": "quorum_status"}]: dispatch
2020-03-19T23:50:19.913+0000 7f12f2f7a700  2 mon.b@0(leader) e1 send_reply 0x559226cbe580 0x559227212340 mon_command_ack([{"prefix": "quorum_status"}]=0  v0) v1
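
Note that the grant is evaluated against the cap string cached in the session, not against the current key server. A rough illustration of how a cap string like the one logged above grants quorum_status (Ceph's real MonCap grammar is much richer; this toy matcher only understands 'allow command <name>' grants):

    # Toy matcher, for illustration only.
    def is_capable(cap_str, command):
        for grant in cap_str.split(","):
            words = grant.split()
            if words[:2] == ["allow", "command"]:
                if " ".join(words[2:]).strip('"') == command:
                    return True
        return False

    caps = 'allow command "auth ls", allow command quorum_status'
    assert is_capable(caps, "quorum_status")  # mon.b dispatches the command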

In short:

  1. mon.b was paused by SIGSTOP.
  2. A new quorum formed without mon.b; while mon.b was out, "auth del client.foo" was committed, leaving mon.b's key server stale.
  3. mon.b was unpaused by SIGCONT.
  4. mon.b was elected leader.
  5. client.foo sent "quorum_status" to mon.b, expecting EACCES (13). The caps of the established session were set from the stale "key_server", which still listed client.foo (see the sketch after this list).
  6. mon.b then synced with the other monitors in the quorum, reaching auth v31, in which client.foo had been deleted.
  7. mon.b replied to client.foo with retcode 0.
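
A minimal, self-contained sketch of the race, under the assumption visible in the logs above that session caps are captured once at authentication time and are not revisited when the key server catches up:

    # Timeline of the race, modeled in Python for illustration.
    class Session:
        def __init__(self, entity, caps):
            self.entity = entity
            self.caps = caps                  # captured once, at auth time

    # 1-2. While mon.b is SIGSTOPped, the remaining quorum commits
    #      "auth del client.foo"; mon.b's key server stays at version 28.
    mon_b_keys = {"client.foo": "allow command quorum_status"}  # stale v28
    quorum_keys = {}                                            # synced v31

    # 3-5. mon.b resumes, wins the election, and authenticates client.foo
    #      against its stale key server, caching the caps in the session.
    session = Session("client.foo", mon_b_keys["client.foo"])

    # 6. mon.b catches up: client.foo is gone from the key server, but
    #    the already-established session keeps its granted caps.
    mon_b_keys = quorum_keys

    # 7. quorum_status is checked against the session's cached caps and
    #    succeeds (retcode 0) instead of failing with EACCES (13).
    assert "quorum_status" in session.caps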

/a/yuriw-2020-03-19_19:01:06-rados-wip-yuri2-testing-2020-03-16-1919-master-distro-basic-smithi/4868795/

History

#1 Updated by Kefu Chai about 4 years ago

  • Description updated (diff)

#2 Updated by Kefu Chai about 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 34075

#3 Updated by Kefu Chai about 4 years ago

  • Backport changed from minic, nautilus to minic, nautilus, octopus

#4 Updated by Neha Ojha about 4 years ago

  • Backport changed from minic, nautilus, octopus to mimic, nautilus, octopus

#5 Updated by Kefu Chai almost 4 years ago

  • Status changed from Fix Under Review to New
  • Assignee deleted (Kefu Chai)
  • Pull request ID deleted (34075)
