Bug #48469

closed

upgrade:octopus-x-master: "Failed to start Ceph mon.smithi116": Failed with result 'start-limit-hit'.

Added by Yuri Weinstein over 3 years ago. Updated about 3 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: https://pulpito.ceph.com/teuthology-2020-12-04_19:57:53-upgrade:octopus-x-master-distro-basic-smithi/
Job: 5681009, 5681010
Logs: /a/teuthology-2020-12-04_19:57:53-upgrade:octopus-x-master-distro-basic-smithi/5681010/teuthology.log

2020-12-04T20:53:16.470 INFO:teuthology.orchestra.run.smithi116:> sudo systemctl start ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116
2020-12-04T20:53:16.485 INFO:journalctl@ceph.mon.smithi116.smithi116.stdout:-- Logs begin at Sun 2020-11-29 12:19:31 UTC. --
2020-12-04T20:53:16.504 INFO:teuthology.orchestra.run.smithi116.stderr:Job for ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116.service failed.
2020-12-04T20:53:16.505 INFO:teuthology.orchestra.run.smithi116.stderr:See "systemctl status ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116.service" and "journalctl -xe" for details.
2020-12-04T20:53:16.506 DEBUG:teuthology.orchestra.run:got remote process result: 1
2020-12-04T20:53:16.506 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_yuriw_ceph_wip-yuriw-octopus-x-parallel-master/qa/tasks/mon_thrash.py", line 232, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_yuriw_ceph_wip-yuriw-octopus-x-parallel-master/qa/tasks/mon_thrash.py", line 312, in _do_thrash
    self.revive_mon(mon)
  File "/home/teuthworker/src/github.com_yuriw_ceph_wip-yuriw-octopus-x-parallel-master/qa/tasks/mon_thrash.py", line 215, in revive_mon
    self.manager.revive_mon(mon)
  File "/home/teuthworker/src/github.com_yuriw_ceph_wip-yuriw-octopus-x-parallel-master/qa/tasks/ceph_manager.py", line 2834, in revive_mon
    self.ctx.daemons.get_daemon('mon', mon, self.cluster).restart()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/daemon/cephadmunit.py", line 95, in restart
    self.remote.sh(self.start_cmd)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 259, in sh
    proc = self.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 215, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi116 with status 1: 'sudo systemctl start ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116'
2020-12-04T20:53:16.710 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Dec 04 20:53:16 smithi135 bash[24913]: audit 2020-12-04T20:53:15.430282+0000 mon.smithi151 (mon.0) 144 : audit [INF] from='client.154209 172.21.15.151:0/3316206744' entity='client.admin' cmd='[{"prefix": "osd tier set-overlay", "pool": "test-rados-api-smithi151-47528-76", "overlaypool": "test-rados-api-smithi151-47528-87"}]': finished
2020-12-04T20:53:16.711 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Dec 04 20:53:16 smithi135 bash[24913]: cluster 2020-12-04T20:53:15.430347+0000 mon.smithi151 (mon.0) 145 : cluster [DBG] osdmap e1154: 8 total, 8 up, 8 in
2020-12-04T20:53:16.711 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Dec 04 20:53:16 smithi135 bash[24913]: audit 2020-12-04T20:53:15.431189+0000 mon.smithi151 (mon.0) 146 : audit [INF] from='client.154209 172.21.15.151:0/3316206744' entity='client.admin' cmd=[{"prefix": "osd tier cache-mode", "pool": "test-rados-api-smithi151-47528-87", "mode": "writeback"}]: dispatch
2020-12-04T20:53:16.711 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Dec 04 20:53:16 smithi135 bash[24913]: cluster 2020-12-04T20:53:16.427548+0000 mon.smithi151 (mon.0) 147 : cluster [WRN] Health check failed: 1 cache pools are missing hit_sets (CACHE_POOL_NO_HIT_SET)
2020-12-04T20:53:16.751 INFO:journalctl@ceph.mon.smithi151.smithi151.stdout:Dec 04 20:53:16 smithi151 bash[122979]: audit 2020-12-04T20:53:15.430282+0000 mon.smithi151 (mon.0) 144 : audit [INF] from='client.154209 172.21.15.151:0/3316206744' entity='client.admin' cmd='[{"prefix": "osd tier set-overlay", "pool": "test-rados-api-smithi151-47528-76", "overlaypool": "test-rados-api-smithi151-47528-87"}]': finished
2020-12-04T20:53:16.751 INFO:journalctl@ceph.mon.smithi151.smithi151.stdout:Dec 04 20:53:16 smithi151 bash[122979]: cluster 2020-12-04T20:53:15.430347+0000 mon.smithi151 (mon.0) 145 : cluster [DBG] osdmap e1154: 8 total, 8 up, 8 in
2020-12-04T20:53:16.752 INFO:journalctl@ceph.mon.smithi151.smithi151.stdout:Dec 04 20:53:16 smithi151 bash[122979]: audit 2020-12-04T20:53:15.431189+0000 mon.smithi151 (mon.0) 146 : audit [INF] from='client.154209 172.21.15.151:0/3316206744' entity='client.admin' cmd=[{"prefix": "osd tier cache-mode", "pool": "test-rados-api-smithi151-47528-87", "mode": "writeback"}]: dispatch
2020-12-04T20:53:16.752 INFO:journalctl@ceph.mon.smithi151.smithi151.stdout:Dec 04 20:53:16 smithi151 bash[122979]: cluster 2020-12-04T20:53:16.427548+0000 mon.smithi151 (mon.0) 147 : cluster [WRN] Health check failed: 1 cache pools are missing hit_sets (CACHE_POOL_NO_HIT_SET)
2020-12-04T20:53:16.918 INFO:journalctl@ceph.mon.smithi116.smithi116.stdout:Dec 04 20:53:16 smithi116 systemd[1]: ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116.service: Start request repeated too quickly.
2020-12-04T20:53:16.918 INFO:journalctl@ceph.mon.smithi116.smithi116.stdout:Dec 04 20:53:16 smithi116 systemd[1]: ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116.service: Failed with result 'start-limit-hit'.
2020-12-04T20:53:16.919 INFO:journalctl@ceph.mon.smithi116.smithi116.stdout:Dec 04 20:53:16 smithi116 systemd[1]: Failed to start Ceph mon.smithi116 for a0fd247a-366d-11eb-980d-001a4aab830c.

Related issues: 1 (0 open, 1 closed)

Related to Orchestrator - Bug #48754: "failed xx 'sudo systemctl start ceph-None@rgw.client.1'" in upgrade:octopus-x-master (status: Resolved; assignee: Sage Weil)

#1

Updated by Neha Ojha over 3 years ago

  • Project changed from Ceph to Orchestrator
#3

Updated by Sebastian Wagner over 3 years ago

  • Subject changed from "Failed to start Ceph mon.smithi116 for a0fd247a-366d-11eb-980d-001a4aab830c." in upgrade:octopus-x-master to upgrade:octopus-x-master: "Failed to start Ceph mon.smithi116": Failed with result 'start-limit-hit'.

mon thrasher:

2020-12-04T20:47:32.450 INFO:journalctl@ceph.mon.smithi183.smithi183.stdout:Dec 04 20:47:32 smithi183 systemd[1]: ceph-75a14fc2-366d-11eb-980d-001a4aab830c@mon.smithi183.service: Start request repeated too quickly.
2020-12-04T20:47:32.450 INFO:journalctl@ceph.mon.smithi183.smithi183.stdout:Dec 04 20:47:32 smithi183 systemd[1]: ceph-75a14fc2-366d-11eb-980d-001a4aab830c@mon.smithi183.service: Failed with result 'start-limit-hit'.
2020-12-04T20:47:32.451 INFO:journalctl@ceph.mon.smithi183.smithi183.stdout:Dec 04 20:47:32 smithi183 systemd[1]: Failed to start Ceph mon.smithi183 for 75a14fc2-366d-11eb-980d-001a4aab830c.
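For context, 'start-limit-hit' is systemd's per-unit start rate limit. Once a unit has been started more than StartLimitBurst times within StartLimitIntervalSec, systemd refuses further starts until the interval expires or `systemctl reset-failed` clears the counter. One plausible reading, not confirmed in this ticket, is that the mon thrasher's repeated kill/revive cycles exhaust that budget. The class below is a simplified toy model of that fixed-window counter. The name `StartRateLimiter` is hypothetical. The example values (10 s, 5 starts) are systemd's documented defaults. The limits actually set in the cephadm-generated unit are not shown in this report.

```python
class StartRateLimiter:
    """Toy model (hypothetical helper) of systemd's fixed-window start limit."""

    def __init__(self, interval: float, burst: int):
        self.interval = interval      # models StartLimitIntervalSec
        self.burst = burst            # models StartLimitBurst
        self.window_start = None      # time the current window opened
        self.count = 0                # accepted starts in the current window

    def try_start(self, now: float) -> str:
        # A new window opens on the first attempt or once the old one expires.
        if self.window_start is None or now > self.window_start + self.interval:
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return "started"
        # Too many starts inside the window: systemd reports 'start-limit-hit'.
        return "start-limit-hit"

    def reset_failed(self) -> None:
        # Models `systemctl reset-failed`, which also flushes the start counter.
        self.window_start = None
        self.count = 0


limiter = StartRateLimiter(interval=10.0, burst=5)
print([limiter.try_start(float(t)) for t in range(6)])
```

With these numbers, the sixth start within ten seconds is refused. Calling `reset_failed()` first lets the next start succeed.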

#4

Updated by Sebastian Wagner over 3 years ago

maybe try calling

systemctl reset-failed 'ceph-%s@%s.%s' % (self.fsid, self.type_, self.id_),

here?

https://github.com/ceph/teuthology/blob/d7dfe66e0fb7fba26185a6d12d288a49a4423b86/teuthology/orchestra/daemon/cephadmunit.py#L92
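Below is a minimal sketch of the suggestion above: clear the unit's failed state, and with it the start counter, immediately before starting it again. It does not reproduce teuthology's `CephadmUnit.restart`. The functions `unit_name` and `restart_with_reset`, and the injected `run` callable, are hypothetical. They exist only so the command sequence can be shown without a live host. The unit-name pattern is copied from the comment, and `systemctl reset-failed` and `systemctl start` are standard systemctl verbs.

```python
from typing import Callable, List

# Hypothetical runner type: takes an argv list and executes it (or records it).
Runner = Callable[[List[str]], None]


def unit_name(fsid: str, daemon_type: str, daemon_id: str) -> str:
    # Same 'ceph-<fsid>@<type>.<id>' pattern as in the comment above.
    return "ceph-%s@%s.%s" % (fsid, daemon_type, daemon_id)


def restart_with_reset(run: Runner, fsid: str, daemon_type: str, daemon_id: str) -> None:
    """Hypothetical helper: reset systemd's failed state, then start the unit."""
    unit = unit_name(fsid, daemon_type, daemon_id)
    # Flushes the unit's failed state and its start rate-limit counter.
    run(["sudo", "systemctl", "reset-failed", unit])
    run(["sudo", "systemctl", "start", unit])


# Dry run: record the commands instead of executing them.
calls: List[List[str]] = []
restart_with_reset(calls.append, "a0fd247a-366d-11eb-980d-001a4aab830c", "mon", "smithi116")
for cmd in calls:
    print(" ".join(cmd))
```

On a real host, `run` could be `lambda cmd: subprocess.run(cmd, check=True)`. Injecting the runner keeps the command order easy to check without root access.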

#5

Updated by Yuri Weinstein over 3 years ago

Sebastian Wagner wrote:

maybe try calling

[...]

here?

https://github.com/ceph/teuthology/blob/d7dfe66e0fb7fba26185a6d12d288a49a4423b86/teuthology/orchestra/daemon/cephadmunit.py#L92

@Sebastian I.

From IRC:

[13:38:11] <neha> dmick: yuriw: In both the failures, there are 3 mons in the monmap but the cephadm task expects 2 mons. I think that num_mons isn't getting incremented correctly, may have something to do with roleless. -> my last finding

This is a run with the proposed changes on Dan's wip branch:
http://pulpito.ceph.com/teuthology-2020-12-11_17:08:39-upgrade:octopus-x-master-distro-basic-smithi/

(PR details here: https://github.com/ceph/ceph/pull/36592#issuecomment-742756476)
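Neha's finding from IRC amounts to a comparison between the monitors in the monmap and the count the test expects. A minimal sketch of such a check follows. It assumes the monmap is read as JSON from `ceph mon dump --format json`, whose output has a `mons` list with a `name` per monitor. The functions `monmap_names` and `check_mon_count` are hypothetical. This is not how the cephadm task computes `num_mons`. The sample input is an illustrative, trimmed monmap, not data from these runs.

```python
import json
from typing import List


def monmap_names(mon_dump_json: str) -> List[str]:
    """Return monitor names from `ceph mon dump --format json` output."""
    monmap = json.loads(mon_dump_json)
    return [mon["name"] for mon in monmap["mons"]]


def check_mon_count(mon_dump_json: str, expected: int) -> None:
    """Hypothetical check: fail loudly if the monmap size differs from the test's count."""
    names = monmap_names(mon_dump_json)
    if len(names) != expected:
        raise AssertionError(
            "monmap has %d mons %s, test expected %d" % (len(names), names, expected)
        )


# Illustrative, trimmed monmap with three monitors.
sample = json.dumps({"epoch": 3, "mons": [
    {"rank": 0, "name": "a"},
    {"rank": 1, "name": "b"},
    {"rank": 2, "name": "c"},
]})
check_mon_count(sample, 3)
```

Calling `check_mon_count(sample, 2)` would raise, which is the 3-versus-2 mismatch described above.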

#6

Updated by Sebastian Wagner over 3 years ago

  • Related to Bug #48754: "failed xx 'sudo systemctl start ceph-None@rgw.client.1'" in upgrade:octopus-x-master added
#7

Updated by Sebastian Wagner about 3 years ago

  • Status changed from New to Can't reproduce

Lots of upgrade fixes have landed in the meantime.
