Bug #65627
Status: Open
Centos 9 stream ceph container iscsi test failure
Description
Missing k8sevents module
While waiting for the mgr to start, we get this traceback message:
teuthology.log
2024-04-21T15:40:34.547 INFO:teuthology.orchestra.run.smithi017.stdout:mgr not available, waiting (1/15)...
2024-04-21T15:40:34.853 INFO:journalctl@ceph.mgr.x.smithi017.stdout:Apr 21 22:40:34 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-mgr-x[23720]: 2024-04-21T22:40:34.429+0000 7f6f44397040 -1 mgr[py] Module not found: 'k8sevents'
2024-04-21T15:40:34.853 INFO:journalctl@ceph.mgr.x.smithi017.stdout:Apr 21 22:40:34 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-mgr-x[23720]: 2024-04-21T22:40:34.429+0000 7f6f44397040 -1 mgr[py] Traceback (most recent call last):
2024-04-21T15:40:34.853 INFO:journalctl@ceph.mgr.x.smithi017.stdout:Apr 21 22:40:34 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-mgr-x[23720]: File "/usr/share/ceph/mgr/k8sevents/__init__.py", line 1, in <module>
2024-04-21T15:40:34.854 INFO:journalctl@ceph.mgr.x.smithi017.stdout:Apr 21 22:40:34 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-mgr-x[23720]: from .module import Module
2024-04-21T15:40:34.854 INFO:journalctl@ceph.mgr.x.smithi017.stdout:Apr 21 22:40:34 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-mgr-x[23720]: File "/usr/share/ceph/mgr/k8sevents/module.py", line 70, in <module>
2024-04-21T15:40:34.854 INFO:journalctl@ceph.mgr.x.smithi017.stdout:Apr 21 22:40:34 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-mgr-x[23720]: from kubernetes.client.models.v1_event import V1Event
2024-04-21T15:40:34.854 INFO:journalctl@ceph.mgr.x.smithi017.stdout:Apr 21 22:40:34 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-mgr-x[23720]: ModuleNotFoundError: No module named 'kubernetes.client.models.v1_event'
ceph-mgr.x.log:
2024-04-21T22:40:34.094+0000 7f6f44397040 4 mgr[py] Module influx reported that it cannot run: influxdb python module not found
2024-04-21T22:40:34.094+0000 7f6f44397040 1 mgr[py] Loading python module 'k8sevents'
2024-04-21T22:40:34.111+0000 7f6f44397040 10 mgr[py] Computed sys.path '/usr/share/ceph/mgr:/usr/local/lib64/python3.9/site-packages:/usr/local/lib/python3.9/site-packages:/lib64/python3.9/site-packages:/lib/python3.9/site-packages:lib64/python39.zip:/lib64/python3.9:/lib64/python3.9/lib-dynload'
2024-04-21T22:40:34.429+0000 7f6f44397040 -1 mgr[py] Module not found: 'k8sevents'
2024-04-21T22:40:34.429+0000 7f6f44397040 -1 mgr[py] Traceback (most recent call last):
  File "/usr/share/ceph/mgr/k8sevents/__init__.py", line 1, in <module>
    from .module import Module
  File "/usr/share/ceph/mgr/k8sevents/module.py", line 70, in <module>
    from kubernetes.client.models.v1_event import V1Event
ModuleNotFoundError: No module named 'kubernetes.client.models.v1_event'
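The traceback points at a hard import of `kubernetes.client.models.v1_event`, a path that newer kubernetes Python clients appear to have dropped (`V1Event` was renamed to `CoreV1Event`). A hedged sketch of a version-tolerant import guard, the kind of fix a module like this could use; the helper name and candidate list are illustrative, not the actual Ceph patch:

```python
import importlib


def import_attr(candidates):
    """Try (module_name, attribute) pairs in order; return the first one
    that imports cleanly. Raises ImportError if none resolve.

    This illustrates tolerating a rename across library versions, e.g.
    the kubernetes client's V1Event -> CoreV1Event move.
    """
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("no candidate importable: " + "; ".join(errors))


# Hypothetical use in the mgr module (kubernetes package assumed present):
# V1Event = import_attr([
#     ("kubernetes.client.models.v1_event", "V1Event"),
#     ("kubernetes.client.models.core_v1_event", "CoreV1Event"),
# ])
```

With a guard like this, an older or newer client would resolve whichever symbol exists instead of failing module load outright.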
There is also a traceback in the mon log a bit after this, which indicates an SSL connection issue with mgr.x's HTTPS endpoint:
ceph-mon.a.log
2024-04-21T22:40:56.696+0000 7febf911f640 10 mon.a@0(leader).log v24 logging 2024-04-21T22:40:56.057598+0000 mgr.x (mgr.14118) 13 : cephadm [INF] [21/Apr/2024:22:40:56] ENGINE Serving on https://172.21.15.17:7150
2024-04-21T22:40:56.696+0000 7febf911f640 10 mon.a@0(leader).log v24 logging 2024-04-21T22:40:56.057736+0000 mgr.x (mgr.14118) 14 : cephadm [INF] [21/Apr/2024:22:40:56] ENGINE Bus STARTED
2024-04-21T22:40:56.696+0000 7febf911f640 10 mon.a@0(leader).log v24 logging 2024-04-21T22:40:56.059970+0000 mgr.x (mgr.14118) 15 : cephadm [ERR] [21/Apr/2024:22:40:56] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
  File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
    self._run(expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1129)
The mgr log keeps going for a while after this traceback, so this error does not seem to kill the manager. The teuthology test ends when the rest_api_delete.t test is run and passes. This appears to be the last workload, since teardown begins immediately afterward and all daemons are stopped.
2024-04-21T15:54:46.234 INFO:tasks.cram.client.2.smithi159.stdout:/home/ubuntu/cephtest/archive/cram.client.2/rest_api_delete.t: passed
2024-04-21T15:54:46.234 INFO:tasks.cram.client.2.smithi159.stdout:# Ran 1 tests, 0 skipped, 0 failed.
2024-04-21T15:54:46.353 INFO:journalctl@ceph.iscsi.iscsi.a.smithi017.stdout:Apr 21 22:54:45 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-iscsi-iscsi-a[47907]: debug (Gateway.load_config) successfully loaded existing target definition
2024-04-21T15:54:46.353 INFO:journalctl@ceph.iscsi.iscsi.a.smithi017.stdout:Apr 21 22:54:45 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-iscsi-iscsi-a[47907]: debug Removing target configuration
2024-04-21T15:54:46.353 INFO:journalctl@ceph.iscsi.iscsi.a.smithi017.stdout:Apr 21 22:54:46 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-iscsi-iscsi-a[47907]: debug ::ffff:172.21.15.159 - - [21/Apr/2024 22:54:46] "DELETE /api/_target/iqn.2003-01.com.redhat.iscsi-gw:ceph-gw HTTP/1.1" 200 -
2024-04-21T15:54:46.353 INFO:journalctl@ceph.iscsi.iscsi.a.smithi017.stdout:Apr 21 22:54:46 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-iscsi-iscsi-a[47907]: ::ffff:172.21.15.159 - - [21/Apr/2024 22:54:46] "DELETE /api/_target/iqn.2003-01.com.redhat.iscsi-gw:ceph-gw HTTP/1.1" 200 -
2024-04-21T15:54:46.660 INFO:journalctl@ceph.iscsi.iscsi.b.smithi159.stdout:Apr 21 22:54:46 smithi159 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-iscsi-iscsi-b[47144]: debug _disk update on smithi159, successful
2024-04-21T15:54:46.660 INFO:journalctl@ceph.iscsi.iscsi.b.smithi159.stdout:Apr 21 22:54:46 smithi159 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-iscsi-iscsi-b[47144]: debug ::ffff:127.0.0.1 - - [21/Apr/2024 22:54:46] "DELETE /api/disk/datapool/block1 HTTP/1.1" 200 -
2024-04-21T15:54:46.660 INFO:journalctl@ceph.iscsi.iscsi.b.smithi159.stdout:Apr 21 22:54:46 smithi159 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-iscsi-iscsi-b[47144]: ::ffff:127.0.0.1 - - [21/Apr/2024 22:54:46] "DELETE /api/disk/datapool/block1 HTTP/1.1" 200 -
2024-04-21T15:54:46.853 INFO:journalctl@ceph.iscsi.iscsi.a.smithi017.stdout:Apr 21 22:54:46 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-iscsi-iscsi-a[47907]: debug Change detected - internal 24 / xattr 26 refreshing
2024-04-21T15:54:47.159 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:54:46 smithi159 ceph-mon[26763]: pgmap v466: 4 pgs: 4 active+clean; 580 KiB data, 235 MiB used, 715 GiB / 715 GiB avail; 34 KiB/s rd, 2.3 KiB/s wr, 37 op/s
2024-04-21T15:54:47.181 INFO:journalctl@ceph.mon.b.smithi118.stdout:Apr 21 22:54:46 smithi118 ceph-mon[26396]: pgmap v466: 4 pgs: 4 active+clean; 580 KiB data, 235 MiB used, 715 GiB / 715 GiB avail; 34 KiB/s rd, 2.3 KiB/s wr, 37 op/s
2024-04-21T15:54:47.243 INFO:teuthology.task.install.util:Removing shipped files: /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits /usr/bin/stdin-killer...
2024-04-21T15:54:47.268 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:54:46 smithi017 ceph-mon[23490]: pgmap v466: 4 pgs: 4 active+clean; 580 KiB data, 235 MiB used, 715 GiB / 715 GiB avail; 34 KiB/s rd, 2.3 KiB/s wr, 37 op/s
2024-04-21T15:54:47.348 INFO:tasks.cephadm:Teardown begin
2024-04-21T15:54:47.424 INFO:tasks.cephadm:Cleaning up testdir ceph.* files...
2024-04-21T15:54:47.478 INFO:tasks.cephadm:Stopping all daemons...
Why did the test say it failed due to a MON_DOWN warning?
We start deploying mon.b (mon.a and mon.c are already deployed). At this point, mon.b is not in the monmap yet:
2024-04-21T15:42:02.105 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:01 smithi017 ceph-mon[23490]: Deploying daemon mon.b on smithi118
2024-04-21T15:42:02.155 INFO:teuthology.orchestra.run.smithi159.stdout:
2024-04-21T15:42:02.155 INFO:teuthology.orchestra.run.smithi159.stdout:{"epoch":2,"fsid":"fc21275e-002f-11ef-9c9f-ac1f6bc659f6","modified":"2024-04-21T22:41:56.715872Z","created":"2024-04-21T22:40:23.832411Z","min_mon_release":19,"min_mon_release_name":"squid","election_strategy":1,"disallowed_leaders: ":"","stretch_mode":false,"tiebreaker_mon":"","removed_ranks: ":"","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging","quincy","reef","squid"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.17:3300","nonce":0},{"type":"v1","addr":"172.21.15.17:6789","nonce":0}]},"addr":"172.21.15.17:6789/0","public_addr":"172.21.15.17:6789/0","priority":0,"weight":0,"crush_location":"{}"},{"rank":1,"name":"c","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.159:3300","nonce":0},{"type":"v1","addr":"172.21.15.159:6789","nonce":0}]},"addr":"172.21.15.159:6789/0","public_addr":"172.21.15.159:6789/0","priority":0,"weight":0,"crush_location":"{}"}],"quorum":[0,1]}
2024-04-21T15:42:02.155 INFO:teuthology.orchestra.run.smithi159.stderr:dumped monmap epoch 2
mon.a and mon.c call for election:
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: mon.a calling monitor election
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: from='mgr.14150 172.21.15.17:0/3986839991' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "c"}]: dispatch
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: pgmap v16: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: from='mgr.14150 172.21.15.17:0/3986839991' entity='mgr.x' cmd=[{"prefix": "mon metadata", "id": "c"}]: dispatch
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: mon.c calling monitor election
mon.a becomes the leader and both mons are in quorum (mon.b is still deploying):
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: mon.a is new leader, mons a,c in quorum (ranks 0,1)
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: monmap epoch 2
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: fsid fc21275e-002f-11ef-9c9f-ac1f6bc659f6
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: last_changed 2024-04-21T22:41:56.715872+0000
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: created 2024-04-21T22:40:23.832411+0000
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: min_mon_release 19 (squid)
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: election_strategy: 1
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: 0: [v2:172.21.15.17:3300/0,v1:172.21.15.17:6789/0] mon.a
2024-04-21T15:42:02.156 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: 1: [v2:172.21.15.159:3300/0,v1:172.21.15.159:6789/0] mon.c
2024-04-21T15:42:02.157 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: fsmap
2024-04-21T15:42:02.157 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: osdmap e4: 0 total, 0 up, 0 in
2024-04-21T15:42:02.157 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: mgrmap e13: x(active, since 48s)
2024-04-21T15:42:02.157 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:01 smithi159 ceph-mon[26763]: overall HEALTH_OK
Then we wait for 3 mons in the monmap. Now, mon.b is present in the monmap:
2024-04-21T15:42:03.717 INFO:tasks.cephadm:Waiting for 3 mons in monmap...
2024-04-21T15:42:03.950 INFO:teuthology.orchestra.run.smithi159.stderr:Inferring config /var/lib/ceph/fc21275e-002f-11ef-9c9f-ac1f6bc659f6/mon.c/config
2024-04-21T15:42:04.072 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:03 smithi159 ceph-mon[26763]: pgmap v19: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2024-04-21T15:42:04.103 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:03 smithi017 ceph-mon[23490]: pgmap v19: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2024-04-21T15:42:05.682 INFO:journalctl@ceph.mon.b.smithi118.stdout:Apr 21 22:42:05 smithi118 systemd[1]: Started Ceph mon.b for fc21275e-002f-11ef-9c9f-ac1f6bc659f6.
2024-04-21T15:42:10.540 INFO:teuthology.orchestra.run.smithi159.stdout:
2024-04-21T15:42:10.540 INFO:teuthology.orchestra.run.smithi159.stdout:{"epoch":3,"fsid":"fc21275e-002f-11ef-9c9f-ac1f6bc659f6","modified":"2024-04-21T22:42:05.526437Z","created":"2024-04-21T22:40:23.832411Z","min_mon_release":19,"min_mon_release_name":"squid","election_strategy":1,"disallowed_leaders: ":"","stretch_mode":false,"tiebreaker_mon":"","removed_ranks: ":"","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging","quincy","reef","squid"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.17:3300","nonce":0},{"type":"v1","addr":"172.21.15.17:6789","nonce":0}]},"addr":"172.21.15.17:6789/0","public_addr":"172.21.15.17:6789/0","priority":0,"weight":0,"crush_location":"{}"},{"rank":1,"name":"c","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.159:3300","nonce":0},{"type":"v1","addr":"172.21.15.159:6789","nonce":0}]},"addr":"172.21.15.159:6789/0","public_addr":"172.21.15.159:6789/0","priority":0,"weight":0,"crush_location":"{}"},{"rank":2,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.118:3300","nonce":0},{"type":"v1","addr":"172.21.15.118:6789","nonce":0}]},"addr":"172.21.15.118:6789/0","public_addr":"172.21.15.118:6789/0","priority":0,"weight":0,"crush_location":"{}"}],"quorum":[0,1]}
2024-04-21T15:42:10.540 INFO:teuthology.orchestra.run.smithi159.stderr:dumped monmap epoch 3
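The monmap dumps above are JSON: the "wait for 3 mons" step only needs the length of the `mons` array, while quorum membership is the separate `quorum` list of ranks. A minimal sketch of reading both (field names taken from the dump above; the helper name is mine):

```python
import json


def monmap_status(monmap_json):
    """Return (mon_count, quorum_names) from a JSON monmap dump."""
    m = json.loads(monmap_json)
    rank_to_name = {mon["rank"]: mon["name"] for mon in m["mons"]}
    quorum_names = [rank_to_name[r] for r in m["quorum"]]
    return len(m["mons"]), quorum_names


# Epoch-3 shape from the log: three mons, but quorum is still [0, 1] (a and c).
sample = json.dumps({
    "epoch": 3,
    "mons": [{"rank": 0, "name": "a"}, {"rank": 1, "name": "c"},
             {"rank": 2, "name": "b"}],
    "quorum": [0, 1],
})
print(monmap_status(sample))  # (3, ['a', 'c'])
```

This is exactly the state the log shows at epoch 3: three mons satisfy the "3 mons in monmap" wait even though mon.b has not yet joined quorum.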
mon.a and mon.c call for election:
2024-04-21T15:42:10.845 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: mon.a calling monitor election
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: mon.c calling monitor election
mon.a ends up as the leader, but only mon.a and mon.c are in quorum:
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: mon.a is new leader, mons a,c in quorum (ranks 0,1)
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: monmap epoch 3
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: fsid fc21275e-002f-11ef-9c9f-ac1f6bc659f6
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: last_changed 2024-04-21T22:42:05.526437+0000
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: created 2024-04-21T22:40:23.832411+0000
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: min_mon_release 19 (squid)
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: election_strategy: 1
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: 0: [v2:172.21.15.17:3300/0,v1:172.21.15.17:6789/0] mon.a
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: 1: [v2:172.21.15.159:3300/0,v1:172.21.15.159:6789/0] mon.c
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: 2: [v2:172.21.15.118:3300/0,v1:172.21.15.118:6789/0] mon.b
2024-04-21T15:42:10.846 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: fsmap
2024-04-21T15:42:10.847 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: osdmap e4: 0 total, 0 up, 0 in
2024-04-21T15:42:10.847 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: mgrmap e13: x(active, since 57s)
2024-04-21T15:42:10.847 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
2024-04-21T15:42:10.847 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
2024-04-21T15:42:10.847 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: [WRN] MON_DOWN: 1/3 mons down, quorum a,c
2024-04-21T15:42:10.847 INFO:journalctl@ceph.mon.c.smithi159.stdout:Apr 21 22:42:10 smithi159 ceph-mon[26763]: mon.b (rank 2) addr [v2:172.21.15.118:3300/0,v1:172.21.15.118:6789/0] is down (out of quorum)
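The warning above follows mechanically from the monmap and quorum: the epoch-3 map has 3 mons but only ranks 0 and 1 (a, c) in quorum, so 1/3 is down. An illustrative reconstruction of how that health string is formed (not the actual mon code, just the arithmetic):

```python
def mon_down_check(total_mons, quorum_names):
    """Reproduce the shape of the MON_DOWN health message (illustrative)."""
    down = total_mons - len(quorum_names)
    if down == 0:
        return "HEALTH_OK"
    return (f"HEALTH_WARN {down}/{total_mons} mons down, "
            f"quorum {','.join(quorum_names)} (MON_DOWN)")


print(mon_down_check(3, ["a", "c"]))
# HEALTH_WARN 1/3 mons down, quorum a,c (MON_DOWN)
```

Once mon.b wins its election and joins, the quorum list covers all three mons and the check clears, matching the "Cluster is now healthy" entry later in the log.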
Next, mon.b calls for an election. This time ceph.conf is updated, all three monitors reach quorum, and the earlier health warning is cleared:
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: mon.b calling monitor election
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Updating smithi017:/etc/ceph/ceph.conf
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Updating smithi118:/etc/ceph/ceph.conf
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Updating smithi159:/etc/ceph/ceph.conf
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Updating smithi118:/var/lib/ceph/fc21275e-002f-11ef-9c9f-ac1f6bc659f6/config/ceph.conf
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Updating smithi017:/var/lib/ceph/fc21275e-002f-11ef-9c9f-ac1f6bc659f6/config/ceph.conf
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Updating smithi159:/var/lib/ceph/fc21275e-002f-11ef-9c9f-ac1f6bc659f6/config/ceph.conf
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Reconfiguring mon.a (unknown last config time)...
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Reconfiguring daemon mon.a on smithi017
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: mon.b calling monitor election
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: mon.c calling monitor election
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: mon.a calling monitor election
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: mon.a is new leader, mons a,c,b in quorum (ranks 0,1,2)
2024-04-21T15:42:13.654 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: monmap epoch 3
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: fsid fc21275e-002f-11ef-9c9f-ac1f6bc659f6
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: last_changed 2024-04-21T22:42:05.526437+0000
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: created 2024-04-21T22:40:23.832411+0000
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: min_mon_release 19 (squid)
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: election_strategy: 1
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: 0: [v2:172.21.15.17:3300/0,v1:172.21.15.17:6789/0] mon.a
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: 1: [v2:172.21.15.159:3300/0,v1:172.21.15.159:6789/0] mon.c
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: 2: [v2:172.21.15.118:3300/0,v1:172.21.15.118:6789/0] mon.b
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: fsmap
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: osdmap e4: 0 total, 0 up, 0 in
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: mgrmap e13: x(active, since 59s)
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Health check cleared: MON_DOWN (was: 1/3 mons down, quorum a,c)
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: Cluster is now healthy
2024-04-21T15:42:13.655 INFO:journalctl@ceph.mon.a.smithi017.stdout:Apr 21 22:42:13 smithi017 ceph-mon[23490]: overall HEALTH_OK
The next mentions of MON_DOWN come later, when the cluster log is checked for badness. This happens shortly after the final rest_api_delete.t test (from the first section) runs and teardown begins. The check does detect the MON_DOWN warning from earlier:
2024-04-21T15:56:24.613 INFO:tasks.cephadm:Checking cluster log for badness...
2024-04-21T15:56:24.643 INFO:teuthology.orchestra.run.smithi017.stdout:2024-04-21T22:42:10.535211+0000 mon.a (mon.0) 206 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
2024-04-21T15:56:24.643 WARNING:tasks.cephadm:Found errors (ERR|WRN|SEC) in cluster log
2024-04-21T15:56:24.850 INFO:teuthology.orchestra.run.smithi017.stdout:2024-04-21T22:42:10.535211+0000 mon.a (mon.0) 206 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
This is why the test failure reason is: "Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log
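The "checking cluster log for badness" step behaves like a grep for ERR/WRN/SEC cluster-log entries, failing the job unless every match is covered by the job's ignorelist. A simplified sketch of that filter (function name and regex handling are mine, mirroring the behavior described above, not cephadm's actual code):

```python
import re


def find_badness(cluster_log_lines, ignorelist):
    """Return [ERR]/[WRN]/[SEC] cluster-log lines not matched by any
    ignorelist regex; a non-empty result fails the run."""
    level_re = re.compile(r"\[(ERR|WRN|SEC)\]")
    ignore_res = [re.compile(pat) for pat in ignorelist]
    return [line for line in cluster_log_lines
            if level_re.search(line)
            and not any(r.search(line) for r in ignore_res)]


log = [
    "2024-04-21T22:42:10 mon.a (mon.0) 206 : cluster [WRN] Health check "
    "failed: 1/3 mons down, quorum a,c (MON_DOWN)",
    "2024-04-21T22:42:12 mon.a (mon.0) 252 : cluster [INF] Cluster is now healthy",
]
print(find_badness(log, []))            # the MON_DOWN line -> job fails
print(find_badness(log, ["MON_DOWN"]))  # [] -> job would pass
```

This is why the transient warning, even though it cleared seconds later, is enough to fail the run: the check scans history, not current health.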
The cluster log shows the same series of events as the teuthology log, just with timestamps a few minutes off. (Are cluster log entries delayed?)
2024-04-21T22:42:10.535211+0000 mon.a (mon.0) 206 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
2024-04-21T22:42:10.538197+0000 mon.a (mon.0) 207 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
2024-04-21T22:42:10.538216+0000 mon.a (mon.0) 208 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum a,c
2024-04-21T22:42:10.538222+0000 mon.a (mon.0) 209 : cluster [WRN] mon.b (rank 2) addr [v2:172.21.15.118:3300/0,v1:172.21.15.118:6789/0] is down (out of quorum)
2024-04-21T22:42:07.527675+0000 mon.b (mon.2) 1 : cluster [INF] mon.b calling monitor election
2024-04-21T22:42:12.534479+0000 mon.b (mon.2) 2 : cluster [INF] mon.b calling monitor election
2024-04-21T22:42:12.534787+0000 mon.c (mon.1) 3 : cluster [INF] mon.c calling monitor election
2024-04-21T22:42:12.535024+0000 mon.a (mon.0) 237 : cluster [INF] mon.a calling monitor election
2024-04-21T22:42:12.536672+0000 mon.a (mon.0) 238 : cluster [INF] mon.a is new leader, mons a,c,b in quorum (ranks 0,1,2)
2024-04-21T22:42:12.540048+0000 mon.a (mon.0) 239 : cluster [DBG] monmap epoch 3
2024-04-21T22:42:12.540070+0000 mon.a (mon.0) 240 : cluster [DBG] fsid fc21275e-002f-11ef-9c9f-ac1f6bc659f6
2024-04-21T22:42:12.540086+0000 mon.a (mon.0) 241 : cluster [DBG] last_changed 2024-04-21T22:42:05.526437+0000
2024-04-21T22:42:12.540099+0000 mon.a (mon.0) 242 : cluster [DBG] created 2024-04-21T22:40:23.832411+0000
2024-04-21T22:42:12.540111+0000 mon.a (mon.0) 243 : cluster [DBG] min_mon_release 19 (squid)
2024-04-21T22:42:12.540124+0000 mon.a (mon.0) 244 : cluster [DBG] election_strategy: 1
2024-04-21T22:42:12.540137+0000 mon.a (mon.0) 245 : cluster [DBG] 0: [v2:172.21.15.17:3300/0,v1:172.21.15.17:6789/0] mon.a
2024-04-21T22:42:12.540150+0000 mon.a (mon.0) 246 : cluster [DBG] 1: [v2:172.21.15.159:3300/0,v1:172.21.15.159:6789/0] mon.c
2024-04-21T22:42:12.540163+0000 mon.a (mon.0) 247 : cluster [DBG] 2: [v2:172.21.15.118:3300/0,v1:172.21.15.118:6789/0] mon.b
2024-04-21T22:42:12.540858+0000 mon.a (mon.0) 248 : cluster [DBG] fsmap
2024-04-21T22:42:12.540892+0000 mon.a (mon.0) 249 : cluster [DBG] osdmap e4: 0 total, 0 up, 0 in
2024-04-21T22:42:12.541111+0000 mon.a (mon.0) 250 : cluster [DBG] mgrmap e13: x(active, since 59s)
2024-04-21T22:42:12.541400+0000 mon.a (mon.0) 251 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum a,c)
2024-04-21T22:42:12.541425+0000 mon.a (mon.0) 252 : cluster [INF] Cluster is now healthy
2024-04-21T22:42:12.544885+0000 mon.a (mon.0) 253 : cluster [INF] overall HEALTH_OK
Conclusion
There was a clear module-load failure in the mgr (k8sevents failed to import a symbol from the kubernetes Python package), but the error was not fatal to the mgr. The test summary said the test failed on a MON_DOWN warning, but that warning was only seen briefly because mon.b had not yet been added to ceph.conf while it was being deployed.
It is hard to say how the teuthology test would have failed if MON_DOWN had been in the ignorelist, but it might have been something like what was logged in the journal syslog:
journalctl-b0.gz
Apr 21 22:40:38 smithi017 ceph-fc21275e-002f-11ef-9c9f-ac1f6bc659f6-mgr-x[23720]: 2024-04-21T22:40:38.636+0000 7f6f44397040 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: k8sevents
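For reference, teuthology job YAML can suppress a transient warning like this via `log-ignorelist`. A sketch of the fragment (the exact suite file for this job is not shown in this ticket; the regex entry is an assumption about how the warning would be matched):

```yaml
# teuthology job fragment (sketch): tolerate the transient election-window warning
log-ignorelist:
  - MON_DOWN
  - \(MON_DOWN\)
```

With this in place, the "checking cluster log for badness" step would skip the matching [WRN] line, and any remaining failure would have to come from something else, such as the mgr module error above.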
Updated by Laura Flores 3 days ago
- Related to Bug #64981: MGR non-fatal crash during startup - k8sevents missing module - Centos 9 / 18.2.2 added
Updated by Laura Flores 3 days ago
Logs are in ~dmick/c9.iscsi.archive.tgz on the teuthology node.