Bug #59472
cephadm: ingress service is not redeployed from an offline host
% Done: 0%
Tags: backport_processed
Backport: reef, quincy, pacific
Regression: No
Severity: 3 - minor
Description
The following placement spec defines an nfs HA cluster using a daemon count of 1:
    admin:~ # ceph orch ls ingress --export
    service_type: ingress
    service_id: nfs.nfs-ha
    service_name: ingress.nfs.nfs-ha
    placement:
      count: 1
      hosts:
      - node2
      - node3
    spec:
      backend_service: nfs.nfs-ha
      frontend_port: 2049
      monitor_port: 9049
      virtual_ip: 192.168.121.7/24

    admin:~ # ceph orch ls nfs --export
    service_type: nfs
    service_id: nfs-ha
    service_name: nfs.nfs-ha
    placement:
      count: 1
      hosts:
      - node2
      - node3
    spec:
      port: 12049
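To make the intent of `count: 1` with two candidate hosts concrete, here is a toy model of the placement behavior this report expects. It is not cephadm's scheduler: `pick_hosts` is a hypothetical helper written only for illustration, and it assumes candidates are tried in the order they are listed.

```python
def pick_hosts(candidates, count, offline):
    """Toy model: choose up to `count` hosts, skipping offline ones.

    Hypothetical helper, NOT cephadm code. It only illustrates the
    outcome expected for `count: 1` with hosts node2 and node3.
    """
    online = [h for h in candidates if h not in offline]
    return online[:count]


hosts = ["node2", "node3"]

# All hosts online: the single daemon lands on the first candidate.
print(pick_hosts(hosts, 1, offline=set()))      # ['node2']

# node2 offline: the daemon is expected to move to node3.
print(pick_hosts(hosts, 1, offline={"node2"}))  # ['node3']
```

Under this model, every service that shares the placement above would end up on `node3` once `node2` is offline.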
When deployed, the ingress and nfs services both run on `node2`:
    admin:~ # ceph orch ps | egrep 'nfs|haproxy'
    haproxy.nfs.nfs-ha.node2.fnjqfm     node2  *:2049,9049  running (34m)  3m ago  6d   7876k  -  2.0.14  63d60b027ee7  1a365e33b9ff
    keepalived.nfs.nfs-ha.node2.xotdlc  node2               running (6d)   3m ago  6d   14.1M  -  2.0.19  e127a1458705  32f6b0865307
    nfs.nfs-ha.0.19.node2.pwwgal        node2  *:12049      running (36m)  3m ago  36m  77.0M  -  3.3     45534bad0fd1  facacf17da8c
When `node2` goes offline, the nfs daemon is redeployed to `node3`, but haproxy/keepalived are never redeployed:
    admin:~ # ceph orch ps | egrep 'nfs|haproxy'
    haproxy.nfs.nfs-ha.node2.fnjqfm     node2  *:2049,9049  host is offline  8m ago   6d   7876k  -  2.0.14  63d60b027ee7  1a365e33b9ff
    keepalived.nfs.nfs-ha.node2.xotdlc  node2               host is offline  8m ago   6d   14.1M  -  2.0.19  e127a1458705  32f6b0865307
    nfs.nfs-ha.0.19.node2.pwwgal        node2  *:12049      host is offline  8m ago   41m  77.0M  -  3.3     45534bad0fd1  facacf17da8c
    nfs.nfs-ha.0.22.node3.fhwfvf        node3  *:12049      running (79s)    77s ago  79s  27.1M  -  3.3     45534bad0fd1  b8be0d90d139
This prevents clients from performing recovery during failover. A similar situation is possible with a larger daemon count if more than one node goes offline.
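The condition can be spotted mechanically from the `ceph orch ps` rows quoted in this report. The sketch below parses those rows as plain text and lists daemon types that have no running instance. It assumes the column layout seen in this report (daemon name first, host second, a status of either `running (...)` or `host is offline`); the function names are hypothetical, and the text format is not a stable interface.

```python
import re
from collections import defaultdict

# Rows copied from the `ceph orch ps` output captured after node2 went offline.
SAMPLE = """\
haproxy.nfs.nfs-ha.node2.fnjqfm node2 *:2049,9049 host is offline 8m ago 6d 7876k - 2.0.14 63d60b027ee7 1a365e33b9ff
keepalived.nfs.nfs-ha.node2.xotdlc node2 host is offline 8m ago 6d 14.1M - 2.0.19 e127a1458705 32f6b0865307
nfs.nfs-ha.0.19.node2.pwwgal node2 *:12049 host is offline 8m ago 41m 77.0M - 3.3 45534bad0fd1 facacf17da8c
nfs.nfs-ha.0.22.node3.fhwfvf node3 *:12049 running (79s) 77s ago 79s 27.1M - 3.3 45534bad0fd1 b8be0d90d139
"""


def parse_line(line):
    """Return (daemon_type, host, running) for one row of output."""
    fields = line.split()
    name, host = fields[0], fields[1]
    daemon_type = name.split(".", 1)[0]  # e.g. "haproxy", "keepalived", "nfs"
    running = re.search(r"\brunning \(", line) is not None
    return daemon_type, host, running


def daemon_types_without_running(text):
    """List daemon types for which no row reports a running daemon."""
    has_running = defaultdict(bool)
    for line in text.splitlines():
        if not line.strip():
            continue
        daemon_type, _host, running = parse_line(line)
        has_running[daemon_type] = has_running[daemon_type] or running
    return sorted(t for t, ok in has_running.items() if not ok)


print(daemon_types_without_running(SAMPLE))  # ['haproxy', 'keepalived']
```

For the rows above, only `nfs` has a running daemon (on `node3`), which matches the behavior described: the backend was redeployed while both ingress daemons stayed on the offline host.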
History
#1 Updated by Michael Fritch 12 months ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 51120
#4 Updated by Backport Bot 11 months ago
- Copied to Backport #59544: pacific: cephadm: ingress service is not redeployed from an offline host added
#5 Updated by Backport Bot 11 months ago
- Copied to Backport #59545: quincy: cephadm: ingress service is not redeployed from an offline host added
#6 Updated by Backport Bot 11 months ago
- Copied to Backport #59546: reef: cephadm: ingress service is not redeployed from an offline host added
#7 Updated by Backport Bot 11 months ago
- Tags set to backport_processed