Bug #52866
closed
removal of iscsi causes mgr module to fail
Description
this does not happen all the time, maybe 10% of the time. it could potentially affect any daemon type that has post actions
health: HEALTH_ERR
Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'ceph-pnataraj-7ypsv7-node3' does not exist retval: -2
sequence of events: https://pastebin.com/EA5peQVt
traceback: https://pastebin.com/hfjvK5m7
what happens is 3 iscsi daemons are made and the iscsi type is added to self.mgr.requires_post_actions https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L1128
then _check_daemons is run and gets the list of daemons from the cache https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L920
it is possible that the cache does not contain all 3 new iscsi daemons
so when the post daemon actions are run they are only passed 2 DaemonDescriptions instead of 3 https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/serve.py#L1006
the dashboard iscsi-gateway list then only contains 2 of the 3 daemons https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/services/iscsi.py#L120
and eventually when the service is removed it will try to remove 3 entries from the dashboard iscsi-gateway list, but only 2 exist, and it crashes https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/services/iscsi.py#L162
so the cause of the crash is the dashboard iscsi-gateway list not being set up correctly when the daemons are originally deployed, because they are not all in the cache when the post actions are run
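the sequence above can be reproduced in isolation with a small stand-alone sketch. this is not cephadm code: FakeDashboard, GatewayNotFound and simulate are hypothetical names made up for illustration, and the short node names are placeholders. it only models the two steps that matter, post actions registering whatever the cache returned, and service removal iterating over every deployed daemon.

```python
import errno


class GatewayNotFound(Exception):
    """Hypothetical stand-in for the dashboard's 'does not exist' failure."""

    def __init__(self, name):
        super().__init__(f"iSCSI gateway '{name}' does not exist")
        self.retval = -errno.ENOENT  # -2, the retval shown in the health error


class FakeDashboard:
    """Hypothetical in-memory model of the dashboard iscsi-gateway list."""

    def __init__(self):
        self.gateways = set()

    def gateway_add(self, name):
        self.gateways.add(name)

    def gateway_rm(self, name):
        if name not in self.gateways:
            raise GatewayNotFound(name)
        self.gateways.remove(name)


def simulate(deployed, cached):
    """Register only the daemons seen in the cache, then remove all deployed ones."""
    dashboard = FakeDashboard()
    # post actions: only the daemons present in the (possibly stale) cache
    # are added to the gateway list
    for name in cached:
        dashboard.gateway_add(name)
    # service removal: every deployed daemon is removed from the gateway list
    for name in deployed:
        dashboard.gateway_rm(name)
    return dashboard


if __name__ == "__main__":
    deployed = ["node1", "node2", "node3"]
    # cache is complete: removal succeeds and the list ends up empty
    print(simulate(deployed, deployed).gateways)
    # cache is missing node3: removal of the third entry fails
    try:
        simulate(deployed, deployed[:2])
    except GatewayNotFound as exc:
        print(exc, "retval:", exc.retval)
```

with a complete cache the removal loop finishes cleanly. with the third daemon missing from the cache, the third removal raises, which matches the shape of the health error above.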
Updated by Daniel Pivonka over 2 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 43454
Updated by Sebastian Wagner over 2 years ago
- Status changed from Fix Under Review to Resolved