Bug #57897
openceph mgr restart causes restart of all iscsi daemons in a loop
0%
Description
We have observed that since v17.2.4, a restart of the active ceph mgr appears to cause all iSCSI daemons to restart and then continue to restart in a loop causing a complete loss of iSCSI service.
We are still working out the steps to reproduce in a test environment, but given the severity of impact I thought i'd create this issue. It feels likely to be a regression from https://tracker.ceph.com/issues/57527
Current steps to reproduce:- 3 node cephadm adopted cluster
- Deploy 3 iSCSI daemons
- Run `ceph orch upgrade start --ceph-version 17.2.5`
- (This may also happen on any active manager failover)
These logs indicate the cosntant reconfiguration:
2022-10-19 06:37:42 log_channel(cephadm) log [INF] : Reconfiguring iscsi.iscsi.node0.kwbtan (dependencies changed)... 2022-10-19 06:37:42 log_channel(cephadm) log [INF] : Reconfiguring daemon iscsi.iscsi.node0.kwbtan on node0.ma1.tnp.infra 2022-10-19 06:37:56 log_channel(cephadm) log [INF] : Reconfiguring iscsi.iscsi.node1.adffbc (dependencies changed)... 2022-10-19 06:37:56 log_channel(cephadm) log [INF] : Reconfiguring daemon iscsi.iscsi.node1.adffbc on node1.ma1.tnp.infra 2022-10-19 06:38:11 log_channel(cephadm) log [INF] : Reconfiguring iscsi.iscsi.node2.rogtfr (dependencies changed)... 2022-10-19 06:38:11 log_channel(cephadm) log [INF] : Reconfiguring daemon iscsi.iscsi.node2.rogtfr on node2.ma1.tnp.infra 2022-10-19 06:39:29 log_channel(cephadm) log [INF] : Reconfiguring iscsi.iscsi.node0.kwbtan (dependencies changed)... 2022-10-19 06:39:29 log_channel(cephadm) log [INF] : Reconfiguring daemon iscsi.iscsi.node0.kwbtan on node0.ma1.tnp.infra 2022-10-19 06:39:33 log_channel(cephadm) log [INF] : Reconfiguring iscsi.iscsi.node1.adffbc (dependencies changed)... 2022-10-19 06:39:34 log_channel(cephadm) log [INF] : Reconfiguring daemon iscsi.iscsi.node1.adffbc on node1.ma1.tnp.infra 2022-10-19 06:39:39 log_channel(cephadm) log [INF] : Reconfiguring iscsi.iscsi.node2.rogtfr (dependencies changed)... 2022-10-19 06:39:39 log_channel(cephadm) log [INF] : Reconfiguring daemon iscsi.iscsi.node2.rogtfr on node2.ma1.tnp.infra 2022-10-19 06:40:01 log_channel(cephadm) log [INF] : Reconfiguring iscsi.iscsi.node0.kwbtan (dependencies changed)... [etc]
Updated by David Heap over 1 year ago
We currently have this situation again after an mgr failover for node maintenance/reboots and managed enabled debug logging
It looks like the orchestrator keeps reconfiguring the service due to thinking the mgr ip has changed when it hasn't - based on the code in serve.py here https://github.com/ceph/ceph/blob/v17.2.5/src/pybind/mgr/cephadm/serve.py#L910 it looks like the ip address it's getting as `last_deps` isn't updating, so it keeps trying to reconfigure the ip each time:
Nov 01 18:35:51 node1 ceph-mon[3901]: iscsi.iscsi.node0.ldfwch deps ['172.16.1.4'] -> ['172.16.1.6']
Nov 01 18:35:51 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node0.ldfwch (dependencies changed)...
Nov 01 18:36:03 node1 ceph-mon[3901]: iscsi.iscsi.node2.lofqpp deps ['172.16.1.5'] -> ['172.16.1.6']
Nov 01 18:36:03 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node2.lofqpp (dependencies changed)...
Nov 01 18:36:06 node1 ceph-mon[3901]: iscsi.iscsi.node0.ldfwch deps ['172.16.1.4'] -> ['172.16.1.6']
Nov 01 18:36:06 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node0.ldfwch (dependencies changed)...
Nov 01 18:37:10 node1 ceph-mon[3901]: iscsi.iscsi.node2.lofqpp deps ['172.16.1.5'] -> ['172.16.1.6']
Nov 01 18:37:10 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node2.lofqpp (dependencies changed)...
Nov 01 18:37:15 node1 ceph-mon[3901]: iscsi.iscsi.node0.ldfwch deps ['172.16.1.4'] -> ['172.16.1.6']
Nov 01 18:37:15 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node0.ldfwch (dependencies changed)...
Nov 01 18:38:19 node1 ceph-mon[3901]: iscsi.iscsi.node2.lofqpp deps ['172.16.1.5'] -> ['172.16.1.6']
Nov 01 18:38:19 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node2.lofqpp (dependencies changed)...
Nov 01 18:38:25 node1 ceph-mon[3901]: iscsi.iscsi.node0.ldfwch deps ['172.16.1.4'] -> ['172.16.1.6']
Nov 01 18:38:25 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node0.ldfwch (dependencies changed)...
Nov 01 18:39:31 node1 ceph-mon[3901]: iscsi.iscsi.node2.lofqpp deps ['172.16.1.5'] -> ['172.16.1.6']
Nov 01 18:39:31 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node2.lofqpp (dependencies changed)...
Nov 01 18:39:35 node1 ceph-mon[3901]: iscsi.iscsi.node0.ldfwch deps ['172.16.1.4'] -> ['172.16.1.6']
Nov 01 18:39:35 node1 ceph-mon[3901]: Reconfiguring iscsi.iscsi.node0.ldfwch (dependencies changed)...
When we pause orchestration and check inside the container that's created, the trusted ip list looks to be appended with the correct mgr ip as expected in quincy (which i believe is the reason the reconfiguration takes place since https://github.com/ceph/ceph/commit/cda82c98a32f51cb392fc51ba854bcae409567f8 )
trusted_ip_list = 172.16.1.4,172.16.1.5,172.16.1.6,172.16.1.7,172.16.1.8,172.16.1.6
Updated by David Heap over 1 year ago
Attempts to stop or redeploy the daemon don't work as they seem to invoke a dependency check, which then restarts the service again
Nov 02 12:18:04 dh-ceph01-test ceph-mon[2054]: Schedule stop daemon iscsi.iscsi.dh-ceph01-test.jrdndc
Nov 02 12:18:05 dh-ceph01-test ceph-mon[2054]: Reconfiguring iscsi.iscsi.dh-ceph01-test.jrdndc (dependencies changed)...
Nov 02 12:18:05 dh-ceph01-test ceph-mon[2054]: Reconfiguring daemon iscsi.iscsi.dh-ceph01-test.jrdndc on dh-ceph01-test
Nov 02 12:18:23 dh-ceph01-test ceph-mon[2054]: Schedule redeploy daemon iscsi.iscsi.dh-ceph01-test.jrdndc
Nov 02 12:18:23 dh-ceph01-test ceph-mon[2054]: Reconfiguring iscsi.iscsi.dh-ceph01-test.jrdndc (dependencies changed)...
Nov 02 12:18:23 dh-ceph01-test ceph-mon[2054]: Reconfiguring daemon iscsi.iscsi.dh-ceph01-test.jrdndc on dh-ceph01-test
Updated by Redouane Kachach Elhichou over 1 year ago
In this case I think it would be helpful to see what's the actual content of the deps. To get this information, from a cephadm shell enable the debug level and grep for 'deps'. Something like:
ceph config set mgr mgr/cephadm/log_to_cluster_level debug; ceph -W cephadm --watch-debug | grep -e iscsi -e deps
Updated by Adam King over 1 year ago
this is a painful one. @David at least until we have a fix for this, I will mention that setting the iscsi spec to unmanaged (https://docs.ceph.com/en/quincy/cephadm/services/#disabling-automatic-management-of-daemons) will stop cephadm from doing these dependency checks and will at least stop this from happening. The downside is it won't update the trusted ip list when it actually should (that's what it's trying to do here since the active mgr ip needs to be in that list), but theoretically just a "ceph orch redeploy <iscsi-service-name>" after the mgr failover happens should resolve it.
Updated by David Heap over 1 year ago
Adam King wrote:
this is a painful one. @David at least until we have a fix for this, I will mention that setting the iscsi spec to unmanaged (https://docs.ceph.com/en/quincy/cephadm/services/#disabling-automatic-management-of-daemons) will stop cephadm from doing these dependency checks and will at least stop this from happening. The downside is it won't update the trusted ip list when it actually should (that's what it's trying to do here since the active mgr ip needs to be in that list), but theoretically just a "ceph orch redeploy <iscsi-service-name>" after the mgr failover happens should resolve it.
Thanks, we will look into setting it to unmanaged for now. We've currently got orchestration paused at a point where all the containers are up, which i guess is doing a similar thing in a more global way.
Our mgr IPs are in the initial trusted IP lists we configured when deploying the iscsi services, so the updates aren't really required in our clusters anyway.
Redouane Kachach Elhichou wrote:
In this case I think it would be helpful to see what's the actual content of the deps. To get this information, from a cephadm shell enable the debug level and grep for 'deps'.
It's the same IP address issue as above when attempting to stop or redeploy the service
2022-11-08T11:59:11.048572+0000 mgr.dh-ceph00-test [INF] Schedule stop daemon iscsi.iscsi.dh-ceph00-test.asunqg
2022-11-08T11:59:11.279916+0000 mgr.dh-ceph00-test [DBG] iscsi.iscsi.dh-ceph00-test.asunqg deps ['10.10.15.183'] -> ['10.10.15.182']
2022-11-08T11:59:11.280093+0000 mgr.dh-ceph00-test [INF] Reconfiguring iscsi.iscsi.dh-ceph00-test.asunqg (dependencies changed)...
2022-11-08T11:59:21.609811+0000 mgr.dh-ceph00-test [INF] Schedule redeploy daemon iscsi.iscsi.dh-ceph00-test.asunqg
2022-11-08T11:59:21.822075+0000 mgr.dh-ceph00-test [DBG] iscsi.iscsi.dh-ceph00-test.asunqg deps ['10.10.15.183'] -> ['10.10.15.182']
2022-11-08T11:59:21.822242+0000 mgr.dh-ceph00-test [INF] Reconfiguring iscsi.iscsi.dh-ceph00-test.asunqg (dependencies changed)...
Updated by Redouane Kachach Elhichou over 1 year ago
I tried to reproduce the issue with the same setup but wasn't successful so far. I started from a cluster running v17.2.4 with iscsi devices and then upgraded to v17.2.5. Upgrade was smooth and I didn't see any issues.
Updated by Mykola Golub over 1 year ago
David mentioned [1] as a potential cause when the issue was introduced. But actually I think it is [2]. And apart that there seem to be a bug that causes an issue in David's environment, I am not sure I like the approach very much. I.e. now on every mgr failover we are reconfiguring and restarting iscsi gateways because of the trusted_ip_list change. I think it could be improved if the list includes not only the active mgr ip, but all mgr ips.
[1] https://github.com/ceph/ceph/commit/cda82c98a32f51cb392fc51ba854bcae409567f8
[2] https://github.com/ceph/ceph/pull/48076/commits/7ad668158c76c498c08ca584a27d288de0ac1e3b
Updated by David Heap over 1 year ago
Hi Mykola and Redouane
Thanks for looking into this - we initially thought the same regarding the commit in the [2] link (see Dan's opening entry above), but it doesn't look like that commit is in the code tagged in 17.2.5:
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/module.py#L2480
https://github.com/ceph/ceph/blob/v17.2.5/src/pybind/mgr/cephadm/module.py#L2362
nor is the new code in the containers running 17.2.5:
heapd@dh-ceph00-test:~$ sudo ceph version
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
heapd@dh-ceph00-test:~$ sudo podman ps | grep mgr
c85b4d1c7e76 quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -n mgr.dh-ceph00-... 9 days ago Up 9 days ago ...
heapd@dh-ceph00-test:~$ sudo podman exec c85b4d1c7e76 grep iscsi /usr/share/ceph/mgr/cephadm/module.py
from .services.iscsi import IscsiService
elif daemon_type == 'iscsi':
'iscsi': PlacementSpec(count=1),
def apply_iscsi(self, spec: ServiceSpec) -> str:
Agree that it probably shouldn't be reconfiguring and restarting on every mgr failover if the pool of mgr IPs is staying the same
Updated by Mykola Golub about 1 year ago
Hi David,
Although I was not able to reproduce your issue, I think the commit [1] I mentioned in my previous comment may fix your issue, as it changes how deps is calculated. This change will be in 17.2.6.
Also I created a ticket [2] and PR [3] to change the mgr cephadm module not to add the mgr ip in to the trusted_ip_list if it's already there. In your case (when you already have the mgr ips in the list) it should avoid restarting the iscsi gateway on mgr failover.
[1] https://github.com/ceph/ceph/pull/48076/commits/7ad668158c76c498c08ca584a27d288de0ac1e3b
[2] https://tracker.ceph.com/issues/58792
[3] https://github.com/ceph/ceph/pull/50167
Updated by David Heap about 1 year ago
Hi Mykola
Unfortunately the change in deps calculation in 17.2.6 didn't resolve the issue as the deps still change, but hopefully your deduplication PR will help once released - thanks for submitting that!
2023-04-12T16:15:29.180866+0100 mgr.dh-ceph01-test [DBG] Applying service iscsi.iscsi spec 2023-04-12T16:15:29.181770+0100 mgr.dh-ceph01-test [DBG] Combine hosts with existing daemons [<DaemonDescription>(iscsi.iscsi.dh-ceph00-test.asunqg)] + new hosts [] 2023-04-12T16:15:29.326384+0100 mgr.dh-ceph01-test [DBG] iscsi.iscsi.dh-ceph00-test.asunqg deps ['10.10.15.181,10.10.15.182,10.10.15.183,10.10.15.184,10.10.15.185,10.10.15.182'] -> ['10.10.15.181,10.10.15.182,10.10.15.183,10.10.15.184,10.10.15.185,10.10.15.183'] 2023-04-12T16:15:29.326585+0100 mgr.dh-ceph01-test [INF] Reconfiguring iscsi.iscsi.dh-ceph00-test.asunqg (dependencies changed)... 2023-04-12T16:15:29.336105+0100 mgr.dh-ceph01-test [INF] Reconfiguring daemon iscsi.iscsi.dh-ceph00-test.asunqg on dh-ceph00-test