Bug #52866
closed
removal of iscsi causes mgr module to fail
Description
this does not happen all the time, maybe 10% of the time. it could potentially affect any daemon type that has post actions
health: HEALTH_ERR
Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'ceph-pnataraj-7ypsv7-node3' does not exist retval: -2
sequence of events: https://pastebin.com/EA5peQVt
traceback: https://pastebin.com/hfjvK5m7
what happens is that 3 iscsi daemons are deployed and the iscsi type is added to self.mgr.requires_post_actions https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L1128
then _check_daemons is run and gets the list of daemons from the cache https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L920
it is possible that the cache does not yet contain all 3 new iscsi daemons
so when the post daemon actions are run, they are only passed 2 DaemonDescriptions instead of 3 https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L1006
the dashboard iscsi-gateway list then only contains 2 of the 3 daemons https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/services/iscsi.py#L120
eventually, when the service is removed, it tries to remove 3 entries from the dashboard iscsi-gateway list, but only 2 exist, so the removal fails and the module crashes https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/services/iscsi.py#L162
so the cause of the crash is the dashboard iscsi-gateway list not being set up correctly when the daemons are originally deployed, because they are not all in the cache when the post actions are run
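the sequence above can be sketched as a small standalone simulation. this is not the cephadm code: FakeDashboard, GatewayNotFound, run_post_actions and remove_service are hypothetical stand-ins, and the node names are made up. it only shows how a cache that is one daemon behind at post-action time leads to a failed removal later.

```python
class GatewayNotFound(Exception):
    """Hypothetical stand-in for the dashboard command failing with retval -2."""


class FakeDashboard:
    """Hypothetical stand-in for the dashboard iscsi-gateway list."""

    def __init__(self):
        self.gateways = set()

    def gateway_add(self, name):
        self.gateways.add(name)

    def gateway_rm(self, name):
        # Removing an entry that was never registered is treated as an error.
        if name not in self.gateways:
            raise GatewayNotFound(f"iSCSI gateway '{name}' does not exist")
        self.gateways.remove(name)


def run_post_actions(dashboard, cached_daemons):
    # Post actions only register the daemons that were in the cache at that time.
    for name in cached_daemons:
        dashboard.gateway_add(name)


def remove_service(dashboard, deployed_daemons):
    # Removal walks every deployed daemon, registered in the dashboard or not.
    for name in deployed_daemons:
        dashboard.gateway_rm(name)


deployed = ["node1", "node2", "node3"]
stale_cache = deployed[:2]  # the cache has not caught up with the third daemon

dashboard = FakeDashboard()
run_post_actions(dashboard, stale_cache)

error = None
try:
    remove_service(dashboard, deployed)
except GatewayNotFound as exc:
    error = str(exc)

print(error)  # iSCSI gateway 'node3' does not exist
```

with a complete cache (all 3 names passed to run_post_actions) the removal goes through, which matches the bug only showing up some of the time.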