Bug #64865
Status: Closed
cephadm: Health check failed: 1 osds down (OSD_DOWN) in cluster log
Description
The following tests in the cephadm suite failed with the warning:
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587301
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587410
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587630
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587912
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587938
The tests above bring down an OSD, so the OSD_DOWN events are expected. The log below
from 7587301 shows that the OSD_DOWN warning is raised only temporarily and is eventually cleared:
2024-03-09T17:13:56.698 INFO:teuthology.orchestra.run.smithi012.stderr:+ ceph orch osd rm status
2024-03-09T17:13:56.699 INFO:teuthology.orchestra.run.smithi012.stderr:+ grep '^1'
...
2024-03-09T17:13:57.039 INFO:teuthology.orchestra.run.smithi012.stdout:1 smithi012 done, waiting for purge 0 False False False
2024-03-09T17:13:57.040 INFO:teuthology.orchestra.run.smithi012.stderr:+ sleep 5
...
2024-03-09T17:13:58.008 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:13:57 smithi012 ceph-mon[25263]: Health check failed: 1 osds down (OSD_DOWN)
2024-03-09T17:13:58.009 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:13:57 smithi012 ceph-mon[25263]: from='mgr.14215 172.21.15.12:0/2551335845' entity='mgr.smithi012.zfjpsz' cmd='[{"prefix": "osd down", "ids": ["1"]}]': finished
2024-03-09T17:13:58.009 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:13:57 smithi012 ceph-mon[25263]: osdmap e45: 8 total, 7 up, 8 in
2024-03-09T17:13:58.009 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:13:57 smithi012 ceph-mon[25263]: osd.1 now down
2024-03-09T17:13:58.009 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:13:57 smithi012 ceph-mon[25263]: Removing daemon osd.1 from smithi012 -- ports []
...
2024-03-09T17:14:02.235 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:14:01 smithi012 ceph-mon[25263]: Removing key for osd.1
2024-03-09T17:14:02.235 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:14:01 smithi012 ceph-mon[25263]: from='mgr.14215 172.21.15.12:0/2551335845' entity='mgr.smithi012.zfjpsz' cmd=[{"prefix": "auth rm", "entity": "osd.1"}]: dispatch
2024-03-09T17:14:02.235 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:14:01 smithi012 ceph-mon[25263]: from='mgr.14215 172.21.15.12:0/2551335845' entity='mgr.smithi012.zfjpsz' cmd='[{"prefix": "auth rm", "entity": "osd.1"}]': finished
2024-03-09T17:14:02.236 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:14:01 smithi012 ceph-mon[25263]: Successfully removed osd.1 on smithi012
2024-03-09T17:14:02.236 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:14:01 smithi012 ceph-mon[25263]: from='mgr.14215 172.21.15.12:0/2551335845' entity='mgr.smithi012.zfjpsz' cmd=[{"prefix": "osd purge-actual", "id": 1, "yes_i_really_mean_it": true}]: dispatch
2024-03-09T17:14:02.236 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:14:01 smithi012 ceph-mon[25263]: Health check cleared: OSD_DOWN (was: 1 osds down)
2024-03-09T17:14:02.236 INFO:journalctl@ceph.mon.smithi012.smithi012.stdout:Mar 09 17:14:01 smithi012 ceph-mon[25263]: Cluster is now healthy
Therefore, these tests should add the warning to their log ignorelist (see the sketch below).
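For illustration only, a minimal sketch of what such an ignorelist entry could look like in a teuthology suite yaml, assuming the affected jobs use the standard overrides/ceph/log-ignorelist mechanism; the exact file, placement, and regex form in the cephadm suite may differ from this:

    overrides:
      ceph:
        log-ignorelist:
          # Ignore the expected, transient OSD_DOWN health warning raised
          # while the test intentionally removes/replaces an OSD.
          - \(OSD_DOWN\)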
Updated by Sridhar Seshasayee about 2 months ago
- Tags set to test-failure
- Tags deleted (test-failure)
Updated by Aishwarya Mathuria about 1 month ago
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609832
Updated by Nitzan Mordechai about 1 month ago
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620805
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620914
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620938
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7621014
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7621050
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7621076
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7621103
Updated by Nitzan Mordechai about 1 month ago
- Status changed from New to Fix Under Review
- Pull request ID set to 56613
Updated by Backport Bot 24 days ago
- Copied to Backport #65414: squid: cephadm: Health check failed: 1 osds down (OSD_DOWN) in cluster log added