Bug #48469
Status: Closed
upgrade:octopus-x-master: "Failed to start Ceph mon.smithi116": Failed with result 'start-limit-hit'.
Description
Run: https://pulpito.ceph.com/teuthology-2020-12-04_19:57:53-upgrade:octopus-x-master-distro-basic-smithi/
Jobs: 5681009, 5681010
Logs: /a/teuthology-2020-12-04_19:57:53-upgrade:octopus-x-master-distro-basic-smithi/5681010/teuthology.log
2020-12-04T20:53:16.470 INFO:teuthology.orchestra.run.smithi116:> sudo systemctl start ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116
2020-12-04T20:53:16.485 INFO:journalctl@ceph.mon.smithi116.smithi116.stdout:-- Logs begin at Sun 2020-11-29 12:19:31 UTC. --
2020-12-04T20:53:16.504 INFO:teuthology.orchestra.run.smithi116.stderr:Job for ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116.service failed.
2020-12-04T20:53:16.505 INFO:teuthology.orchestra.run.smithi116.stderr:See "systemctl status ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116.service" and "journalctl -xe" for details.
2020-12-04T20:53:16.506 DEBUG:teuthology.orchestra.run:got remote process result: 1
2020-12-04T20:53:16.506 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_yuriw_ceph_wip-yuriw-octopus-x-parallel-master/qa/tasks/mon_thrash.py", line 232, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_yuriw_ceph_wip-yuriw-octopus-x-parallel-master/qa/tasks/mon_thrash.py", line 312, in _do_thrash
    self.revive_mon(mon)
  File "/home/teuthworker/src/github.com_yuriw_ceph_wip-yuriw-octopus-x-parallel-master/qa/tasks/mon_thrash.py", line 215, in revive_mon
    self.manager.revive_mon(mon)
  File "/home/teuthworker/src/github.com_yuriw_ceph_wip-yuriw-octopus-x-parallel-master/qa/tasks/ceph_manager.py", line 2834, in revive_mon
    self.ctx.daemons.get_daemon('mon', mon, self.cluster).restart()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/daemon/cephadmunit.py", line 95, in restart
    self.remote.sh(self.start_cmd)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 259, in sh
    proc = self.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 215, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi116 with status 1: 'sudo systemctl start ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116'
2020-12-04T20:53:16.710 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Dec 04 20:53:16 smithi135 bash[24913]: audit 2020-12-04T20:53:15.430282+0000 mon.smithi151 (mon.0) 144 : audit [INF] from='client.154209 172.21.15.151:0/3316206744' entity='client.admin' cmd='[{"prefix": "osd tier set-overlay", "pool": "test-rados-api-smithi151-47528-76", "overlaypool": "test-rados-api-smithi151-47528-87"}]': finished
2020-12-04T20:53:16.711 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Dec 04 20:53:16 smithi135 bash[24913]: cluster 2020-12-04T20:53:15.430347+0000 mon.smithi151 (mon.0) 145 : cluster [DBG] osdmap e1154: 8 total, 8 up, 8 in
2020-12-04T20:53:16.711 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Dec 04 20:53:16 smithi135 bash[24913]: audit 2020-12-04T20:53:15.431189+0000 mon.smithi151 (mon.0) 146 : audit [INF] from='client.154209 172.21.15.151:0/3316206744' entity='client.admin' cmd=[{"prefix": "osd tier cache-mode", "pool": "test-rados-api-smithi151-47528-87", "mode": "writeback"}]: dispatch
2020-12-04T20:53:16.711 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Dec 04 20:53:16 smithi135 bash[24913]: cluster 2020-12-04T20:53:16.427548+0000 mon.smithi151 (mon.0) 147 : cluster [WRN] Health check failed: 1 cache pools are missing hit_sets (CACHE_POOL_NO_HIT_SET)
2020-12-04T20:53:16.751 INFO:journalctl@ceph.mon.smithi151.smithi151.stdout:Dec 04 20:53:16 smithi151 bash[122979]: audit 2020-12-04T20:53:15.430282+0000 mon.smithi151 (mon.0) 144 : audit [INF] from='client.154209 172.21.15.151:0/3316206744' entity='client.admin' cmd='[{"prefix": "osd tier set-overlay", "pool": "test-rados-api-smithi151-47528-76", "overlaypool": "test-rados-api-smithi151-47528-87"}]': finished
2020-12-04T20:53:16.751 INFO:journalctl@ceph.mon.smithi151.smithi151.stdout:Dec 04 20:53:16 smithi151 bash[122979]: cluster 2020-12-04T20:53:15.430347+0000 mon.smithi151 (mon.0) 145 : cluster [DBG] osdmap e1154: 8 total, 8 up, 8 in
2020-12-04T20:53:16.752 INFO:journalctl@ceph.mon.smithi151.smithi151.stdout:Dec 04 20:53:16 smithi151 bash[122979]: audit 2020-12-04T20:53:15.431189+0000 mon.smithi151 (mon.0) 146 : audit [INF] from='client.154209 172.21.15.151:0/3316206744' entity='client.admin' cmd=[{"prefix": "osd tier cache-mode", "pool": "test-rados-api-smithi151-47528-87", "mode": "writeback"}]: dispatch
2020-12-04T20:53:16.752 INFO:journalctl@ceph.mon.smithi151.smithi151.stdout:Dec 04 20:53:16 smithi151 bash[122979]: cluster 2020-12-04T20:53:16.427548+0000 mon.smithi151 (mon.0) 147 : cluster [WRN] Health check failed: 1 cache pools are missing hit_sets (CACHE_POOL_NO_HIT_SET)
2020-12-04T20:53:16.918 INFO:journalctl@ceph.mon.smithi116.smithi116.stdout:Dec 04 20:53:16 smithi116 systemd[1]: ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116.service: Start request repeated too quickly.
2020-12-04T20:53:16.918 INFO:journalctl@ceph.mon.smithi116.smithi116.stdout:Dec 04 20:53:16 smithi116 systemd[1]: ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116.service: Failed with result 'start-limit-hit'.
2020-12-04T20:53:16.919 INFO:journalctl@ceph.mon.smithi116.smithi116.stdout:Dec 04 20:53:16 smithi116 systemd[1]: Failed to start Ceph mon.smithi116 for a0fd247a-366d-11eb-980d-001a4aab830c.
Updated by Yuri Weinstein over 3 years ago
Tests are here: https://github.com/ceph/ceph/pull/36592
Updated by Sebastian Wagner over 3 years ago
- Subject changed from "Failed to start Ceph mon.smithi116 for a0fd247a-366d-11eb-980d-001a4aab830c." in upgrade:octopus-x-master to upgrade:octopus-x-master: "Failed to start Ceph mon.smithi116": Failed with result 'start-limit-hit'.
The same failure hit the mon thrasher:
2020-12-04T20:47:32.450 INFO:journalctl@ceph.mon.smithi183.smithi183.stdout:Dec 04 20:47:32 smithi183 systemd[1]: ceph-75a14fc2-366d-11eb-980d-001a4aab830c@mon.smithi183.service: Start request repeated too quickly.
2020-12-04T20:47:32.450 INFO:journalctl@ceph.mon.smithi183.smithi183.stdout:Dec 04 20:47:32 smithi183 systemd[1]: ceph-75a14fc2-366d-11eb-980d-001a4aab830c@mon.smithi183.service: Failed with result 'start-limit-hit'.
2020-12-04T20:47:32.451 INFO:journalctl@ceph.mon.smithi183.smithi183.stdout:Dec 04 20:47:32 smithi183 systemd[1]: Failed to start Ceph mon.smithi183 for 75a14fc2-366d-11eb-980d-001a4aab830c.
Updated by Sebastian Wagner over 3 years ago
Maybe try calling
systemctl reset-failed on the unit name, i.e. 'ceph-%s@%s.%s' % (self.fsid, self.type_, self.id_),
here?
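A minimal sketch of that suggestion, assuming the self.fsid/self.type_/self.id_ attributes seen in the cephadmunit.py frame of the traceback above; the helper name reset_failed_cmd is hypothetical:

```python
def reset_failed_cmd(fsid, type_, id_):
    # systemd keeps rejecting starts with 'start-limit-hit' until the
    # unit's failure/rate-limit state is cleared; 'systemctl reset-failed'
    # clears that state for the given unit.
    unit = 'ceph-%s@%s.%s' % (fsid, type_, id_)
    return 'sudo systemctl reset-failed %s' % unit

# For the failing mon in this report:
print(reset_failed_cmd('a0fd247a-366d-11eb-980d-001a4aab830c', 'mon', 'smithi116'))
# -> sudo systemctl reset-failed ceph-a0fd247a-366d-11eb-980d-001a4aab830c@mon.smithi116
```

In cephadmunit.py's restart() this command could presumably be run via self.remote.sh(...) just before self.start_cmd, so the revive is not rejected by the start rate limit.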
Updated by Yuri Weinstein over 3 years ago
Sebastian Wagner wrote:
maybe try calling
[...]
here?
From IRC
[13:38:11] <neha> dmick: yuriw: In both the failures, there are 3 mons in the monmap but the cephadm task expects 2 mons. I think that num_mons isn't getting incremented correctly, may have something to do with roleless. -> my last finding
This is a run of the proposed changes on Dan's wip branch:
http://pulpito.ceph.com/teuthology-2020-12-11_17:08:39-upgrade:octopus-x-master-distro-basic-smithi/
(PR details here
https://github.com/ceph/ceph/pull/36592#issuecomment-742756476)
Updated by Sebastian Wagner over 3 years ago
- Related to Bug #48754: "failed xx 'sudo systemctl start ceph-None@rgw.client.1'" in upgrade:octopus-x-master added
Updated by Sebastian Wagner about 3 years ago
- Status changed from New to Can't reproduce
Lots of upgrade fixes landed in the meantime.