Bug #49143

closed

rados/upgrade/pacific-x/parallel: monclient(hunting): authenticate timed out after 300 after mon upgrade

Added by Neha Ojha about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-02-02T13:28:28.669 INFO:teuthology.orchestra.run.smithi201.stdout:mon.a            smithi201  running (7m)   4s ago     7m   16.1.0-100-g74275226  docker.io/ceph/daemon-base:latest-pacific                                                          056ce08ef7cb  3420041c238a
2021-02-02T13:28:28.670 INFO:teuthology.orchestra.run.smithi201.stdout:mon.b            smithi107  running (5m)   4s ago     5m   16.1.0-100-g74275226  docker.io/ceph/daemon-base:latest-pacific                                                          056ce08ef7cb  83ce0448132d
2021-02-02T13:28:28.670 INFO:teuthology.orchestra.run.smithi201.stdout:mon.c            smithi201  running (6m)   4s ago     6m   16.1.0-100-g74275226  docker.io/ceph/daemon-base:latest-pacific  
.
.
2021-02-02T13:28:48.250 INFO:journalctl@ceph.mon.b.smithi107.stdout:Feb 02 13:28:47 smithi107 systemd[1]: Stopping Ceph mon.b for 63ea491c-6559-11eb-8fb9-001a4aab830c...
2021-02-02T13:28:48.683 INFO:journalctl@ceph.mon.b.smithi107.stdout:Feb 02 13:28:48 smithi107 podman[55365]: 83ce0448132dc84ae7853ce0e208c97fc85c9c326ccde69f84e93842bd00109b
2021-02-02T13:28:48.684 INFO:journalctl@ceph.mon.b.smithi107.stdout:Feb 02 13:28:48 smithi107 systemd[1]: Stopped Ceph mon.b for 63ea491c-6559-11eb-8fb9-001a4aab830c.
2021-02-02T13:28:49.249 INFO:journalctl@ceph.mon.b.smithi107.stdout:Feb 02 13:28:48 smithi107 systemd[1]: Starting Ceph mon.b for 63ea491c-6559-11eb-8fb9-001a4aab830c...
2021-02-02T13:28:49.250 INFO:journalctl@ceph.mon.b.smithi107.stdout:Feb 02 13:28:49 smithi107 bash[55490]: e4b549e1780b328267af827a44263f83b4228e71bb857005754ad77fdb62484b
2021-02-02T13:28:49.250 INFO:journalctl@ceph.mon.b.smithi107.stdout:Feb 02 13:28:49 smithi107 systemd[1]: Started Ceph mon.b for 63ea491c-6559-11eb-8fb9-001a4aab830c.
2021-02-02T13:28:51.419 INFO:journalctl@ceph.mon.a.smithi201.stdout:Feb 02 13:28:51 smithi201 podman[70485]: Error: no container with name or ID ceph-63ea491c-6559-11eb-8fb9-001a4aab830c-mon.a found: no such container
2021-02-02T13:28:51.419 INFO:journalctl@ceph.mon.c.smithi201.stdout:Feb 02 13:28:51 smithi201 podman[70442]: Error: no container with name or ID ceph-63ea491c-6559-11eb-8fb9-001a4aab830c-mon.c found: no such container
2021-02-02T13:33:59.105 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T13:33:59.102+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T13:38:59.105 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T13:38:59.103+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T13:43:59.106 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T13:43:59.104+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T13:48:59.107 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T13:48:59.104+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T13:53:59.108 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T13:53:59.105+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T13:58:59.109 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T13:58:59.105+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:03:59.112 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:03:59.106+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:08:59.112 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:08:59.107+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:13:59.111 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:13:59.107+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:18:59.111 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:18:59.107+0000 7f043d91e700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:18:59.112 INFO:teuthology.orchestra.run.smithi201.stderr:[errno 110] RADOS timed out (error connecting to the cluster)
2021-02-02T14:18:59.692 DEBUG:teuthology.orchestra.run.smithi201:> sudo /home/ubuntu/cephtest/cephadm --image docker.io/ceph/daemon-base:latest-pacific shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 63ea491c-6559-11eb-8fb9-001a4aab830c -e sha1=f6074e7976b2cfdff5312862fb91ee0065bc9d83 -- bash -c 'ceph orch ps'
2021-02-02T14:24:01.305 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:24:01.302+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:29:01.304 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:29:01.303+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:34:01.306 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:34:01.303+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:39:01.305 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:39:01.304+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:44:01.305 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:44:01.305+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:49:01.306 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:49:01.305+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:54:01.307 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:54:01.305+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T14:59:01.307 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T14:59:01.305+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T15:04:01.308 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T15:04:01.307+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T15:09:01.308 INFO:teuthology.orchestra.run.smithi201.stderr:2021-02-02T15:09:01.308+0000 7f3c58bd8700  0 monclient(hunting): authenticate timed out after 300
2021-02-02T15:09:01.309 INFO:teuthology.orchestra.run.smithi201.stderr:[errno 110] RADOS timed out (error connecting to the cluster)
2021-02-02T15:09:01.838 DEBUG:teuthology.orchestra.run:got remote process result: 1
2021-02-02T15:09:01.840 ERROR:teuthology.run_tasks:Saw exception from tasks.

/a/nojha-2021-02-01_21:31:14-rados-wip-39145-distro-basic-smithi/5847037
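
In the log above, each `ceph` client invocation appears to retry every 300 seconds and give up with `[errno 110]` after ten timeouts. A minimal sketch for confirming that pattern in a saved teuthology log follows; it assumes the line layout quoted in this ticket (inner Ceph timestamp ending in `+0000`, then a hex thread id), and the helper name `summarize_timeouts` is hypothetical, not part of teuthology or Ceph.

```python
import re
from collections import defaultdict
from datetime import datetime

# Inner Ceph client timestamp and thread id on a monclient timeout line.
# Assumes the "+0000" suffix and spacing seen in the log lines quoted above.
TIMEOUT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})\+0000 ([0-9a-f]+)\s+0 "
    r"monclient\(hunting\): authenticate timed out after \d+"
)


def summarize_timeouts(lines):
    """Group timeout lines by client thread id.

    Returns {thread_id: (timeout_count, seconds_between_first_and_last)}.
    """
    seen = defaultdict(list)
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            seen[m.group(2)].append(ts)
    return {
        thread: (len(stamps), (stamps[-1] - stamps[0]).total_seconds())
        for thread, stamps in seen.items()
    }
```

Each distinct thread id corresponds to one client process in the excerpts here, so a count of 10 spanning roughly 2700 seconds would match the first invocation shown above.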

Actions #1

Updated by Neha Ojha about 3 years ago

  • Priority changed from Normal to High

2021-02-08T02:39:11.975 INFO:journalctl@ceph.mon.a.smithi033.stdout:Feb 08 02:39:11 smithi033 systemd[1]: Stopping Ceph mon.a for 96db86c8-69b5-11eb-8fde-001a4aab830c...
2021-02-08T02:39:12.716 INFO:journalctl@ceph.mon.a.smithi033.stdout:Feb 08 02:39:12 smithi033 podman[68436]: 84f73e131233fe3064877a7caa4ab8cc95e2f55e532c5635030288e19fe5be0b
2021-02-08T02:39:12.716 INFO:journalctl@ceph.mon.a.smithi033.stdout:Feb 08 02:39:12 smithi033 systemd[1]: ceph-96db86c8-69b5-11eb-8fde-001a4aab830c@mon.a.service: Succeeded.
2021-02-08T02:39:12.716 INFO:journalctl@ceph.mon.a.smithi033.stdout:Feb 08 02:39:12 smithi033 systemd[1]: Stopped Ceph mon.a for 96db86c8-69b5-11eb-8fde-001a4aab830c.
2021-02-08T02:39:13.047 INFO:journalctl@ceph.mon.a.smithi033.stdout:Feb 08 02:39:12 smithi033 systemd[1]: Starting Ceph mon.a for 96db86c8-69b5-11eb-8fde-001a4aab830c...
2021-02-08T02:39:13.547 INFO:journalctl@ceph.mon.a.smithi033.stdout:Feb 08 02:39:13 smithi033 bash[68563]: a8e11484e86e060c29eb5658354a312d84b8d9932e41f0d66d7cd07047de96c6
2021-02-08T02:39:13.547 INFO:journalctl@ceph.mon.a.smithi033.stdout:Feb 08 02:39:13 smithi033 systemd[1]: Started Ceph mon.a for 96db86c8-69b5-11eb-8fde-001a4aab830c.
2021-02-08T02:44:21.622 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T02:44:21.617+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T02:49:21.621 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T02:49:21.617+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T02:54:21.621 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T02:54:21.617+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T02:59:21.621 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T02:59:21.618+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:04:21.624 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T03:04:21.619+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:09:21.622 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T03:09:21.620+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:14:21.623 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T03:14:21.621+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:19:21.623 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T03:19:21.621+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:24:21.623 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T03:24:21.621+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:29:21.624 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T03:29:21.622+0000 7f0975c73700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:29:21.625 INFO:teuthology.orchestra.run.smithi033.stderr:[errno 110] RADOS timed out (error connecting to the cluster)
2021-02-08T03:29:22.325 DEBUG:teuthology.orchestra.run.smithi033:> sudo /home/ubuntu/cephtest/cephadm --image docker.io/ceph/daemon-base:latest-pacific shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 96db86c8-69b5-11eb-8fde-001a4aab830c -e sha1=4dd6b3b79bfa494927e50d8f05ec33bcbc31a5c5 -- bash -c 'ceph orch ps'
2021-02-08T03:34:24.458 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T03:34:24.454+0000 7f02b1034700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:39:24.457 INFO:teuthology.orchestra.run.smithi033.stderr:2021-02-08T03:39:24.455+0000 7f02b1034700  0 monclient(hunting): authenticate timed out after 300
2021-02-08T03:40:08.165 DEBUG:teuthology.exit:Got signal 15; running 1 handler...

/a/kchai-2021-02-08_02:14:21-rados-wip-kefu2-testing-2021-02-08-0023-distro-basic-smithi/5865427
/a/kchai-2021-02-08_02:14:21-rados-wip-kefu2-testing-2021-02-08-0023-distro-basic-smithi/5865429

Actions #2

Updated by Neha Ojha about 3 years ago

  • Subject changed from rados/upgrade/pacific-x/parallel: no container with name mon.a and mon.c to rados/upgrade/pacific-x/parallel: monclient(hunting): authenticate timed out after 300 after mon upgrade

The problem seems to occur when the first mon is restarted after upgrade.

2021-02-09T23:22:42.871 DEBUG:teuthology.orchestra.run.smithi062:> sudo /home/ubuntu/cephtest/cephadm --image docker.io/ceph/daemon-base:latest-pacific shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid ba2881c6-6b2c-11eb-8fe8-001a4aab830c -e sha1=8bf6cf6ec50e9b5f8323a7750b68c287b546028c -- bash -c 'while ceph orch upgrade status | jq '"'"'.in_progress'"'"' | grep true ; do ceph orch ps ; ceph versions ; sleep 30 ; done'
2021-02-09T23:22:43.779 INFO:tasks.rados:starting run 0 out of 1
.
.
2021-02-09T23:24:04.930 INFO:journalctl@ceph.mgr.x.smithi066.stdout:Feb 09 23:24:04 smithi066 systemd[1]: Stopped Ceph mgr.x for ba2881c6-6b2c-11eb-8fe8-001a4aab830c.
2021-02-09T23:24:05.965 INFO:journalctl@ceph.mgr.x.smithi066.stdout:Feb 09 23:24:05 smithi066 systemd[1]: Started Ceph mgr.x for ba2881c6-6b2c-11eb-8fe8-001a4aab830c.
.
.
2021-02-09T23:24:31.428 INFO:journalctl@ceph.mgr.y.smithi062.stdout:Feb 09 23:24:31 smithi062 systemd[1]: Stopped Ceph mgr.y for ba2881c6-6b2c-11eb-8fe8-001a4aab830c.
2021-02-09T23:24:32.447 INFO:journalctl@ceph.mgr.y.smithi062.stdout:Feb 09 23:24:32 smithi062 systemd[1]: Started Ceph mgr.y for ba2881c6-6b2c-11eb-8fe8-001a4aab830c.
.
.
2021-02-09T23:24:40.334 INFO:journalctl@ceph.mon.a.smithi062.stdout:Feb 09 23:24:40 smithi062 systemd[1]: Stopped Ceph mon.a for ba2881c6-6b2c-11eb-8fe8-001a4aab830c.
2021-02-09T23:24:40.948 INFO:journalctl@ceph.mon.a.smithi062.stdout:Feb 09 23:24:40 smithi062 systemd[1]: Started Ceph mon.a for ba2881c6-6b2c-11eb-8fe8-001a4aab830c.
.
.
2021-02-09T23:29:50.851 INFO:teuthology.orchestra.run.smithi062.stderr:2021-02-09T23:29:50.846+0000 7fec34997700  0 monclient(hunting): authenticate timed out after 300
.
.
2021-02-10T01:04:53.094 INFO:teuthology.orchestra.run.smithi062.stderr:2021-02-10T01:04:53.092+0000 7f4536b19700  0 monclient(hunting): authenticate timed out after 300
2021-02-10T01:04:53.095 INFO:teuthology.orchestra.run.smithi062.stderr:[errno 110] RADOS timed out (error connecting to the cluster)
2021-02-10T01:04:53.671 DEBUG:teuthology.orchestra.run:got remote process result: 1
2021-02-10T01:04:53.672 ERROR:teuthology.run_tasks:Saw exception from tasks.

/a/yuriw-2021-02-09_22:48:58-rados-wip-yuri8-testing-2021-02-08-0950-distro-basic-smithi/5872138
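
To check the observation that the failure follows the first mon restart, one option is to list every mon restart that precedes the first authentication timeout in a run's log. The sketch below assumes the `journalctl@ceph.mon.<id>.<host>` prefix and the `systemd[1]: Started Ceph mon.` message seen in the excerpts in this ticket; the function name is hypothetical.

```python
import re

# A mon unit coming back up, as reported by systemd through teuthology's
# journalctl capture. Layout assumed from the log lines quoted in this ticket.
STARTED_RE = re.compile(
    r"^(\S+) INFO:journalctl@ceph\.(mon\.\w+)\.\S+:.*systemd\[1\]: Started Ceph mon\."
)
# First client-side symptom.
TIMEOUT_RE = re.compile(
    r"^(\S+) INFO:.*monclient\(hunting\): authenticate timed out"
)


def mon_restarts_before_first_timeout(lines):
    """Return ([(timestamp, mon_name), ...], first_timeout_timestamp_or_None)."""
    restarts = []
    for line in lines:
        started = STARTED_RE.match(line)
        if started:
            restarts.append((started.group(1), started.group(2)))
            continue
        timeout = TIMEOUT_RE.match(line)
        if timeout:
            return restarts, timeout.group(1)
    return restarts, None
```

Applied to the excerpt above, this would report a single restart (`mon.a` at 23:24:40.948) followed by the first timeout at 23:29:50.851, about 310 seconds later.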

Actions #3

Updated by Neha Ojha about 3 years ago

  • Priority changed from High to Urgent

/a/nojha-2021-02-16_17:14:48-rados-master-distro-basic-smithi/5887669

Actions #4

Updated by Sage Weil about 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 39582

Actions #5

Updated by Sebastian Wagner about 3 years ago

  • Related to Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate.end()) added

Actions #6

Updated by Sebastian Wagner about 3 years ago

  • Related to deleted (Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate.end()))

Actions #7

Updated by Sage Weil about 3 years ago

  • Status changed from Fix Under Review to Resolved