Hi Xiubo,
The logs for the job linked in the description do not match the log snippet you provided.
I see the job failed with the following traceback:
2023-10-17T00:24:07.651 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: 2023-10-17T00:24:07.359+0000 7f3f16afe700 -1 log_channel(cephadm) log [ERR] : Can't communicate with remote host `172.21.15.70`, possibly because the host is not reachable or python3 is not installed on the host. [Errno 113] Connect call failed ('172.21.15.70', 22)
2023-10-17T00:24:07.651 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: Traceback (most recent call last):
2023-10-17T00:24:07.651 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 122, in redirect_log
2023-10-17T00:24:07.651 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: yield
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 101, in _remote_connection
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: preferred_auth=['publickey'], options=ssh_options)
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 6804, in connect
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: 'Opening SSH connection to')
2023-10-17T00:24:07.652 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 299, in _connect
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: local_addr=local_addr)
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: File "/lib64/python3.6/asyncio/base_events.py", line 794, in create_connection
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: raise exceptions[0]
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: File "/lib64/python3.6/asyncio/base_events.py", line 781, in create_connection
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: yield from self.sock_connect(sock, address)
2023-10-17T00:24:07.653 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: File "/lib64/python3.6/asyncio/selector_events.py", line 439, in sock_connect
2023-10-17T00:24:07.654 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: return (yield from fut)
2023-10-17T00:24:07.654 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: File "/lib64/python3.6/asyncio/selector_events.py", line 469, in _sock_connect_cb
2023-10-17T00:24:07.654 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: raise OSError(err, 'Connect call failed %s' % (address,))
2023-10-17T00:24:07.654 INFO:journalctl@ceph.mgr.z.smithi177.stdout:Oct 17 00:24:07 smithi177 ceph-b278b73a-6c81-11ee-8db6-212e2dc638e7-mgr-z[105683]: OSError: [Errno 113] Connect call failed ('172.21.15.70', 22)
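The failure is the mgr's asyncssh connection attempt getting `EHOSTUNREACH` (errno 113) when dialing port 22. As a minimal sketch (a hypothetical helper, not cephadm code), the same kind of TCP check could be run from the active mgr host to confirm whether the target is reachable at all:

```python
import errno
import socket

def check_ssh_reachable(host: str, port: int = 22, timeout: float = 5.0):
    """Try a plain TCP connect to host:port; return (ok, errno_name).

    Getting EHOSTUNREACH (errno 113) here would match the cephadm
    error in the traceback above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as e:
        return False, errno.errorcode.get(e.errno, str(e))
```

For example, `check_ssh_reachable("172.21.15.70")` returning `(False, 'EHOSTUNREACH')` from the mgr host would reproduce the `[Errno 113]` seen above, while `'ECONNREFUSED'` would instead point at sshd not running on the target.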
And ~26% of the objects are degraded:
2023-10-17T00:23:52.324 INFO:journalctl@ceph.mon.b.smithi079.stdout:Oct 17 00:23:52 smithi079 ceph-mon[103382]: pgmap v4: 129 pgs: 3 down, 32 active+clean, 40 active+undersized, 31 undersized+peered, 1 unknown, 16 active+undersized+degraded, 6 undersized+degraded+peered; 20 MiB data, 356 MiB used, 715 GiB / 715 GiB avail; 59/227 objects degraded (25.991%)
2023-10-17T00:23:52.324 INFO:journalctl@ceph.mon.b.smithi079.stdout:Oct 17 00:23:52 smithi079 ceph-mon[103382]: Health check failed: Reduced data availability: 12 pgs inactive, 3 pgs down (PG_AVAILABILITY)
2023-10-17T00:23:52.325 INFO:journalctl@ceph.mon.b.smithi079.stdout:Oct 17 00:23:52 smithi079 ceph-mon[103382]: Health check failed: Degraded data redundancy: 59/227 objects degraded (25.991%), 22 pgs degraded (PG_DEGRADED)
2023-10-17T00:23:52.325 INFO:journalctl@ceph.mon.b.smithi079.stdout:Oct 17 00:23:52 smithi079 ceph-mon[103382]: mgrmap e29: z(active, since 2s), standbys: y
2023-10-17T00:23:52.401 INFO:journalctl@ceph.mon.c.smithi177.stdout:Oct 17 00:23:52 smithi177 ceph-mon[103807]: pgmap v4: 129 pgs: 3 down, 32 active+clean, 40 active+undersized, 31 undersized+peered, 1 unknown, 16 active+undersized+degraded, 6 undersized+degraded+peered; 20 MiB data, 356 MiB used, 715 GiB / 715 GiB avail; 59/227 objects degraded (25.991%)
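On the numbers: the 25.991% in the pgmap lines is the fraction of degraded objects (59 of 227), not of PGs; per the PG_DEGRADED health check it is 22 of the 129 PGs that are degraded. A quick sanity check of the figure:

```python
# Degraded-object ratio reported in the pgmap line above: 59/227 objects.
degraded_objects, total_objects = 59, 227
pct = degraded_objects / total_objects * 100
print(f"{pct:.3f}%")  # 25.991%, matching the mon log
```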
I also see the following kernel messages on `smithi070`:
2023-10-17T00:19:19.766315+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered blocking state
2023-10-17T00:19:19.766380+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered disabled state
2023-10-17T00:19:19.766415+00:00 smithi070 kernel: device brx.0 entered promiscuous mode
2023-10-17T00:19:19.776687+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered blocking state
2023-10-17T00:19:19.776728+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered forwarding state
2023-10-17T00:20:43.463574+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered disabled state
2023-10-17T00:20:43.474848+00:00 smithi070 kernel: device brx.0 left promiscuous mode
2023-10-17T00:20:43.474898+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered disabled state
2023-10-17T00:20:46.599641+00:00 smithi070 kernel: IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
2023-10-17T00:20:46.733074+00:00 smithi070 kernel: IPv6: ADDRCONF(NETDEV_UP): brx.0: link is not ready
2023-10-17T00:20:46.733124+00:00 smithi070 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): brx.0: link becomes ready
2023-10-17T00:20:46.733145+00:00 smithi070 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
2023-10-17T00:20:46.770505+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered blocking state
2023-10-17T00:20:46.770583+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered disabled state
2023-10-17T00:20:46.770609+00:00 smithi070 kernel: device brx.0 entered promiscuous mode
2023-10-17T00:20:46.782244+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered blocking state
2023-10-17T00:20:46.782290+00:00 smithi070 kernel: ceph-brx: port 1(brx.0) entered forwarding state
2023-10-17T00:20:46.992612+00:00 smithi070 kernel: Key type dns_resolver registered
2023-10-17T00:20:47.020613+00:00 smithi070 kernel: Key type ceph registered
2023-10-17T00:20:47.027618+00:00 smithi070 kernel: libceph: loaded (mon/osd proto 15/24)
2023-10-17T00:20:47.069320+00:00 smithi070 kernel: ceph: loaded (mds proto 32)
2023-10-17T00:20:47.092579+00:00 smithi070 kernel: ceph: device name is missing path (no : separator in 0@b278b73a-6c81-11ee-8db6-212e2dc638e7.cephfs=/volumes/_nogroup/sv_1/01bcc01b-872b-44d5-ae90-55cda190fd63)
2023-10-17T00:20:47.101591+00:00 smithi070 kernel: libceph: mon1 (1)172.21.15.79:6789 session established
2023-10-17T00:20:47.109585+00:00 smithi070 kernel: libceph: client25127 fsid b278b73a-6c81-11ee-8db6-212e2dc638e7
2023-10-17T00:20:47.117605+00:00 smithi070 kernel: ceph: mds1 session blocklisted
2023-10-17T00:20:47.172627+00:00 smithi070 kernel: ceph: mds0 session blocklisted
Could you please double-check?