Bug #43553 (closed): mon: client mon_status fails

Added by Patrick Donnelly over 4 years ago. Updated about 2 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags: -
Backport: octopus
Regression: No
Severity: 4 - irritation
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): Monitor
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

2020-01-10T07:21:42.355 INFO:teuthology.orchestra.run.smithi027:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mon.b mon_status
...
2020-01-10T07:23:42.370 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-01-10T07:23:42.370 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200109.233534/qa/tasks/mon_thrash.py", line 232, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200109.233534/qa/tasks/mon_thrash.py", line 264, in _do_thrash
    s = self.manager.get_mon_status(m)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200109.233534/qa/tasks/ceph_manager.py", line 2764, in get_mon_status
    out = self.raw_cluster_cmd('tell', 'mon.%s' % mon, 'mon_status')
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200109.233534/qa/tasks/ceph_manager.py", line 1276, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 180, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi027 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mon.b mon_status'
...
2020-01-10T07:23:48.133 INFO:tasks.daemonwatchdog.daemon_watchdog:MonitorThrasher failed
2020-01-10T07:23:48.133 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

From: /ceph/teuthology-archive/pdonnell-2020-01-10_02:39:18-fs-wip-pdonnell-testing-20200109.233534-distro-basic-smithi/4651189/teuthology.log
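
For context, the failing call is just a timeout-wrapped 'ceph tell mon.b mon_status'. A minimal sketch of that invocation outside teuthology (tell_mon_status is a hypothetical helper, not the actual qa/tasks code; only the command line is taken from the log above):

    import subprocess

    def tell_mon_status(mon_id, timeout_s=120):
        # Wrap the tell in the same `timeout 120` the harness uses, so a
        # hang surfaces as exit status 124 rather than blocking forever.
        cmd = ["timeout", str(timeout_s), "ceph", "--cluster", "ceph",
               "tell", "mon.%s" % mon_id, "mon_status"]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 124:
            # GNU timeout exits 124 when it kills the command; this is the
            # "status 124" CommandFailedError seen in the traceback above.
            raise RuntimeError("mon_status for mon.%s timed out after %ds"
                               % (mon_id, timeout_s))
        proc.check_returncode()
        return proc.stdout

    # e.g. print(tell_mon_status("b"))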

The monitor log for mon.b shows:

2020-01-10T07:21:42.464+0000 7f977475a700  1 --1- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] >>  conn(0x561eb0eb9a80 0x561eb3197800 :6789 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=37 legacy v1:172.21.15.168:6789/0 socket_addr v1:172.21.15.168:6789/0 target_addr v1:172.21.15.27:33884/0
2020-01-10T07:21:42.464+0000 7f977475a700 10 mon.b@1(peon) e1 handle_auth_request con 0x561eb0eb9a80 (start) method 0 payload 0
2020-01-10T07:21:42.464+0000 7f977475a700 10 In get_auth_session_handler for protocol 0
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon) e1 ms_handle_accept con 0x561eb0eb9a80 no session
2020-01-10T07:21:42.464+0000 7f977575c700  1 -- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] <== client.? v1:172.21.15.27:0/4192507940 1 ==== auth(proto 0 30 bytes epoch 0) v1 ==== 60+0+0 (unknown 673663173 0 0) 0x561eb1053f80 con 0x561eb0eb9a80
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon) e1 _ms_dispatch new session 0x561eb1054ac0 MonSession(client.? v1:172.21.15.27:0/4192507940 is open , features 0x3f01cfb8ffadffff (luminous)) features 0x3f01cfb8ffadffff
2020-01-10T07:21:42.464+0000 7f977575c700 20 mon.b@1(peon) e1  entity  caps
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon).paxosservice(auth 1..110) dispatch 0x561eb1053f80 auth(proto 0 30 bytes epoch 0) v1 from client.? v1:172.21.15.27:0/4192507940 con 0x561eb0eb9a80
2020-01-10T07:21:42.464+0000 7f977575c700  5 mon.b@1(peon).paxos(paxos active c 3013..3574) is_readable = 1 - now=2020-01-10T07:21:42.468324+0000 lease_expire=2020-01-10T07:21:47.214219+0000 has v0 lc 3574
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon).auth v110 preprocess_query auth(proto 0 30 bytes epoch 0) v1 from client.? v1:172.21.15.27:0/4192507940
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon).auth v110 prep_auth() blob_size=30
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon).auth v110 _assign_global_id 794284 (max 814096)
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx server client.admin: start_session server_challenge e9971bdbd8f321b0
2020-01-10T07:21:42.464+0000 7f977575c700  2 mon.b@1(peon) e1 send_reply 0x561eb00e2a30 0x561eb311bd40 auth_reply(proto 2 0 (0) Success) v1
2020-01-10T07:21:42.464+0000 7f977575c700  1 -- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] --> v1:172.21.15.27:0/4192507940 -- auth_reply(proto 2 0 (0) Success) v1 -- 0x561eb311bd40 con 0x561eb0eb9a80
2020-01-10T07:21:42.464+0000 7f977575c700  1 -- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] <== client.? v1:172.21.15.27:0/4192507940 2 ==== auth(proto 2 36 bytes epoch 0) v1 ==== 66+0+0 (unknown 2018028300 0 0) 0x561eb311bd40 con 0x561eb0eb9a80
2020-01-10T07:21:42.464+0000 7f977575c700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x561eb1054ac0 for client.?
2020-01-10T07:21:42.464+0000 7f977575c700 20 mon.b@1(peon) e1  entity client.admin caps
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon).paxosservice(auth 1..110) dispatch 0x561eb311bd40 auth(proto 2 36 bytes epoch 0) v1 from client.? v1:172.21.15.27:0/4192507940 con 0x561eb0eb9a80
2020-01-10T07:21:42.464+0000 7f977575c700  5 mon.b@1(peon).paxos(paxos active c 3013..3574) is_readable = 1 - now=2020-01-10T07:21:42.468649+0000 lease_expire=2020-01-10T07:21:47.214219+0000 has v0 lc 3574
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon).auth v110 preprocess_query auth(proto 2 36 bytes epoch 0) v1 from client.? v1:172.21.15.27:0/4192507940
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon).auth v110 prep_auth() blob_size=36
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx server client.admin: handle_request get_auth_session_key for client.admin
2020-01-10T07:21:42.464+0000 7f977575c700 20 cephx server client.admin:  checking key: req.key=c3a9158496aea5b5 expected_key=c3a9158496aea5b5
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx server client.admin: handle_request auth ticket global_id 794284
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx: build_service_ticket_reply encoding 1 tickets with secret AQCSExheOoCKDRAAfTqSvgpxnoX8nfiz6eO8aA==
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx: build_service_ticket service auth secret_id 2 ticket_info.ticket.name=client.admin ticket.global_id 794284
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx keyserverdata: get_caps: name=client.admin
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx keyserverdata: get_secret: num of caps=4
2020-01-10T07:21:42.464+0000 7f977475a700  1 -- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] >> v1:172.21.15.27:0/4192507940 conn(0x561eb0eb9a80 legacy=0x561eb3197800 unknown :6789 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 37
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx server client.admin:  adding key for service auth
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx keyserverdata: get_caps: name=client.admin
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx keyserverdata: get_secret: num of caps=4
2020-01-10T07:21:42.464+0000 7f977475a700  1 -- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] >> v1:172.21.15.27:0/4192507940 conn(0x561eb0eb9a80 legacy=0x561eb3197800 unknown :6789 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx: build_service_ticket_reply encoding 1 tickets with secret AQAGJhhe56XvGxAAybRxRpL8wgVWQLJF3g1f8A==
2020-01-10T07:21:42.464+0000 7f977575c700 10 cephx: build_service_ticket service auth secret_id 2 ticket_info.ticket.name=client.admin ticket.global_id 794284
2020-01-10T07:21:42.464+0000 7f977475a700  1 --1- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] >> v1:172.21.15.27:0/4192507940 conn(0x561eb0eb9a80 0x561eb3197800 :6789 s=OPENED pgs=2 cs=1 l=1).handle_message read tag failed
2020-01-10T07:21:42.464+0000 7f977475a700  1 --1- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] >> v1:172.21.15.27:0/4192507940 conn(0x561eb0eb9a80 0x561eb3197800 :6789 s=OPENED pgs=2 cs=1 l=1).fault on lossy channel, failing
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon) e1 ms_handle_authentication session 0x561eb1054ac0 con 0x561eb0eb9a80 addr v1:172.21.15.27:0/4192507940 MonSession(client.? v1:172.21.15.27:0/4192507940 is open , features 0x3f01cfb8ffadffff (luminous))
2020-01-10T07:21:42.464+0000 7f977575c700  2 mon.b@1(peon) e1 send_reply 0x561eb00e3390 0x561eb311bf80 auth_reply(proto 2 0 (0) Success) v1
2020-01-10T07:21:42.464+0000 7f977575c700  1 -- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] --> v1:172.21.15.27:0/4192507940 -- auth_reply(proto 2 0 (0) Success) v1 -- 0x561eb311bf80 con 0x561eb0eb9a80
2020-01-10T07:21:42.464+0000 7f977575c700  1 -- [v2:172.21.15.168:3300/0,v1:172.21.15.168:6789/0] --> v1:172.21.15.27:0/4192507940 -- mon_map magic: 0 v1 -- 0x561eb0f71000 con 0x561eb0eb9a80
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon) e1 ms_handle_reset 0x561eb0eb9a80 v1:172.21.15.27:0/4192507940
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon) e1 reset/close on session client.? v1:172.21.15.27:0/4192507940
2020-01-10T07:21:42.464+0000 7f977575c700 10 mon.b@1(peon) e1 remove_session 0x561eb1054ac0 client.? v1:172.21.15.27:0/4192507940 features 0x3f01cfb8ffadffff

From: /ceph/teuthology-archive/pdonnell-2020-01-10_02:39:18-fs-wip-pdonnell-testing-20200109.233534-distro-basic-smithi/4651189/remote/smithi168/log/ceph-mon.b.log.gz
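
The messenger lines above (read_bulk peer close / fault on lossy channel) show the connection being dropped from the peer (client) side during the auth exchange, while mon.b itself appears to keep servicing the request. One way to separate a monitor-side hang from a client/messenger problem is to ask the same daemon for mon_status over its admin socket. A minimal sketch, assuming the default admin-socket path on the monitor host (adjust asok_dir for teuthology nodes):

    import json
    import subprocess

    def mon_status_via_asok(mon_id, asok_dir="/var/run/ceph"):
        # Query mon_status directly over the admin socket, bypassing the
        # v1 messenger path that the log shows being reset mid-auth.
        asok = "%s/ceph-mon.%s.asok" % (asok_dir, mon_id)
        out = subprocess.run(
            ["ceph", "--admin-daemon", asok, "mon_status"],
            capture_output=True, text=True, check=True,
        ).stdout
        return json.loads(out)

    # e.g. mon_status_via_asok("b")["state"] should report "peon" or "leader"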

Update #1

Updated by Patrick Donnelly over 4 years ago

  • Project changed from CephFS to RADOS
  • Component(RADOS) Monitor added
Update #2

Updated by Patrick Donnelly almost 4 years ago

  • Target version changed from v15.0.0 to v16.0.0
  • Backport set to octopus

/ceph/teuthology-archive/yuriw-2020-06-25_22:31:00-fs-octopus-distro-basic-smithi/5180260/teuthology.log

Update #3

Updated by Neha Ojha about 2 years ago

  • Status changed from New to Can't reproduce