
Bug #59472

cephadm: ingress service is not redeployed from an offline host

Added by Michael Fritch 12 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: reef, quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following placement specs define an NFS HA cluster with a daemon count of 1:

admin:~ # ceph orch ls ingress --export
service_type: ingress
service_id: nfs.nfs-ha
service_name: ingress.nfs.nfs-ha
placement:
  count: 1
  hosts:
  - node2
  - node3
spec:
  backend_service: nfs.nfs-ha
  frontend_port: 2049
  monitor_port: 9049
  virtual_ip: 192.168.121.7/24

admin:~ # ceph orch ls nfs --export
service_type: nfs
service_id: nfs-ha
service_name: nfs.nfs-ha
placement:
  count: 1
  hosts:
  - node2
  - node3 
spec:
  port: 12049
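For reference, the behaviour this report expects from a `count: 1` placement with an explicit host list can be modelled roughly as below. This is a simplified, hypothetical sketch and not cephadm's actual scheduler; the function name `choose_hosts` and its arguments are illustrative assumptions. The real fix was submitted as pull request 51120 and is not reproduced here.

```python
from typing import List, Set


def choose_hosts(candidates: List[str], count: int,
                 current: Set[str], offline: Set[str]) -> List[str]:
    """Hypothetical placement model: pick up to `count` hosts from `candidates`.

    Hosts that already run a daemon are preferred so healthy daemons are
    not moved; offline hosts are never selected.
    """
    online = [h for h in candidates if h not in offline]
    # Keep existing placements on hosts that are still online.
    keep = [h for h in online if h in current]
    # Fill any remaining slots from the other online candidates, in spec order.
    extra = [h for h in online if h not in current]
    return (keep + extra)[:count]


# Placement from the specs above: count: 1, hosts: [node2, node3]
hosts = ["node2", "node3"]
print(choose_hosts(hosts, 1, current={"node2"}, offline=set()))      # ['node2']
print(choose_hosts(hosts, 1, current={"node2"}, offline={"node2"}))  # ['node3']
```

Under this model both the nfs and the ingress service would move to `node3` once `node2` is offline; the output below shows that only the nfs daemon actually does.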

When deployed, the ingress and nfs services both run on `node2`:

admin:~ # ceph orch ps | egrep 'nfs|haproxy' 
haproxy.nfs.nfs-ha.node2.fnjqfm     node2   *:2049,9049  running (34m)     3m ago   6d    7876k        -  2.0.14                63d60b027ee7  1a365e33b9ff
keepalived.nfs.nfs-ha.node2.xotdlc  node2                running (6d)      3m ago   6d    14.1M        -  2.0.19                e127a1458705  32f6b0865307
nfs.nfs-ha.0.19.node2.pwwgal        node2   *:12049      running (36m)     3m ago  36m    77.0M        -  3.3                   45534bad0fd1  facacf17da8c

When `node2` goes offline, the nfs daemon is redeployed to `node3`, but haproxy/keepalived are never redeployed:

admin:~ # ceph orch ps | egrep 'nfs|haproxy'
haproxy.nfs.nfs-ha.node2.fnjqfm     node2   *:2049,9049  host is offline     8m ago   6d    7876k        -  2.0.14                63d60b027ee7  1a365e33b9ff
keepalived.nfs.nfs-ha.node2.xotdlc  node2                host is offline     8m ago   6d    14.1M        -  2.0.19                e127a1458705  32f6b0865307
nfs.nfs-ha.0.19.node2.pwwgal        node2   *:12049      host is offline     8m ago  41m    77.0M        -  3.3                   45534bad0fd1  facacf17da8c
nfs.nfs-ha.0.22.node3.fhwfvf        node3   *:12049      running (79s)      77s ago  79s    27.1M        -  3.3                   45534bad0fd1  b8be0d90d139
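One way to spot this state is to count running daemons per daemon type in the `ceph orch ps` output. The helper below is a hypothetical sketch that works on plain-text lines like the ones above (trimmed to the first columns). It assumes the daemon name starts with its type and that a healthy daemon's status contains the word `running`. For scripting against a live cluster, `ceph orch ps --format json` is a sturdier input than parsing the plain-text table.

```python
from typing import Dict

# Excerpt of the `ceph orch ps | egrep 'nfs|haproxy'` output shown above,
# trimmed to the name, host, ports and status columns.
PS_OUTPUT = """\
haproxy.nfs.nfs-ha.node2.fnjqfm     node2   *:2049,9049  host is offline     8m ago
keepalived.nfs.nfs-ha.node2.xotdlc  node2                host is offline     8m ago
nfs.nfs-ha.0.19.node2.pwwgal        node2   *:12049      host is offline     8m ago
nfs.nfs-ha.0.22.node3.fhwfvf        node3   *:12049      running (79s)      77s ago
"""


def running_by_type(ps_text: str) -> Dict[str, int]:
    """Count running daemons per daemon type (haproxy, keepalived, nfs, ...)."""
    counts: Dict[str, int] = {}
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip blank lines and the (assumed) "NAME HOST ..." header line.
        if not fields or fields[0] == "NAME":
            continue
        daemon_type = fields[0].split(".", 1)[0]
        counts.setdefault(daemon_type, 0)
        # The status column is free text ("running (79s)", "host is offline")
        # and the ports column may be empty, so match on the token instead
        # of relying on a fixed column position.
        if "running" in fields:
            counts[daemon_type] += 1
    return counts


print(running_by_type(PS_OUTPUT))  # {'haproxy': 0, 'keepalived': 0, 'nfs': 1}
```

A count of zero for `haproxy` and `keepalived` while `nfs` has a running daemon is exactly the condition described in this report.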

This prevents clients from performing recovery during failover. A similar situation can also occur with a larger daemon count if more than one node goes offline.


Related issues

Copied to Orchestrator - Backport #59544: pacific: cephadm: ingress service is not redeployed from an offline host Resolved
Copied to Orchestrator - Backport #59545: quincy: cephadm: ingress service is not redeployed from an offline host Resolved
Copied to Orchestrator - Backport #59546: reef: cephadm: ingress service is not redeployed from an offline host Resolved

History

#1 Updated by Michael Fritch 12 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 51120

#2 Updated by Adam King 11 months ago

  • Backport set to reef, quincy, pacific

#3 Updated by Adam King 11 months ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Backport Bot 11 months ago

  • Copied to Backport #59544: pacific: cephadm: ingress service is not redeployed from an offline host added

#5 Updated by Backport Bot 11 months ago

  • Copied to Backport #59545: quincy: cephadm: ingress service is not redeployed from an offline host added

#6 Updated by Backport Bot 11 months ago

  • Copied to Backport #59546: reef: cephadm: ingress service is not redeployed from an offline host added

#7 Updated by Backport Bot 11 months ago

  • Tags set to backport_processed

#8 Updated by Adam King 10 months ago

  • Status changed from Pending Backport to Resolved
