Bug #57614

closed

"ceph nfs cluster create ..." always show process bound to 2049: unable to deploy ingress

Added by Francesco Pantano over 1 year ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee: Adam King
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags: backport_processed
Backport: reef, quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 53008
Crash signature (v1):
Crash signature (v2):

Description

Here is an example of the issue described in $subject:

root@devstack:/# ceph orch ls
NAME   PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash         1/1      4m ago     2h   *
mgr           2/2      4m ago     2h   count:2
mon           1/5      4m ago     2h   count:5
osd           1        4m ago     -    <unmanaged>
root@devstack:/# ceph nfs cluster create cephfs --placement=devstack.localdomain --ingress  --virtual-ip 192.168.24.75/24 --port 2049
NFS Cluster Created Successfully
root@devstack:/# ceph orch ls
NAME                PORTS                    RUNNING  REFRESHED  AGE  PLACEMENT
crash                                        1/1      3s ago     2h   *
ingress.nfs.cephfs  192.168.24.75:2049,9049  0/4      -          10s  count:2
mgr                                          2/2      3s ago     2h   count:2
mon                                          1/5      3s ago     2h   count:5
nfs.cephfs          ?:12049                  1/1      3s ago     10s  devstack.localdomain
osd                                          1        3s ago     -    <unmanaged>
root@devstack:/var/log/ceph# ceph orch ps
NAME                             HOST                  PORTS    STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
crash.devstack                   devstack.localdomain           running (2h)  4m ago     2h   7201k    -        17.2.3   0912465dcea5  b79b84f7c463
mgr.devstack.jykurw              devstack.localdomain           running (2h)  4m ago     2h   432M     -        17.2.3   0912465dcea5  e70c1bd70f43
mgr.devstack.localdomain.yworig  devstack.localdomain  *:9283   running (2h)  4m ago     2h   539M     -        17.2.3   0912465dcea5  4b0232569f76
mon.devstack.localdomain         devstack.localdomain           running (2h)  4m ago     2h   440M     2048M    17.2.3   0912465dcea5  1e53d1de366a
nfs.cephfs.0.0.devstack.kfvcel   devstack.localdomain  *:12049  running (4m)  4m ago     4m   9353k    -        4.0      0912465dcea5  dab8ed3fd5bb
osd.0                            devstack.localdomain           running (2h)  4m ago     2h   101M     4096M    17.2.3   0912465dcea5  88a226cee3e7
root@devstack:/# ceph -W cephadm --watch-debug
  cluster:
    id:     15b994ed-4341-4522-94e9-56e75279659a
    health: HEALTH_WARN
            Failed to place 2 daemon(s)

  services:
    mon: 1 daemons, quorum devstack.localdomain (age 26h)
    mgr: devstack.localdomain.yworig(active, since 26h), standbys: devstack.jykurw
    osd: 1 osds: 1 up (since 26h), 1 in (since 26h)

  data:
    pools:   2 pools, 9 pgs
    objects: 5 objects, 449 KiB
    usage:   21 MiB used, 10 GiB / 10 GiB avail
    pgs:     9 active+clean

  io:
    client:   767 B/s rd, 511 B/s wr, 0 op/s rd, 0 op/s wr

But the ingress daemon fails with the following:

2022-09-20 07:42:53,430 7faa31849740 INFO Deploy daemon haproxy.nfs.cephfs.devstack.ultjer ...
2022-09-20 07:42:53,751 7faa31849740 DEBUG stat: 0 0
2022-09-20 07:42:53,906 7faa31849740 INFO Verifying port 2049 ...
2022-09-20 07:42:53,907 7faa31849740 WARNING Cannot bind to IP 0.0.0.0 port 2049: [Errno 98] Address already in use
2022-09-20 07:42:53,907 7faa31849740 INFO Verifying port 9049 ...
2022-09-20 07:42:53,907 7faa31849740 ERROR ERROR: TCP Port(s) '2049,9049' required for haproxy already in use

netstat shows the following:

LISTEN 0 64  0.0.0.0:2049 0.0.0.0:*
LISTEN 0 128 *:12049      *:*       users:(("ganesha.nfsd",pid=611540,fd=35))
LISTEN 0 64  [::]:2049    [::]:*

I see several problems here:

1. A process is bound to 2049, and it is not haproxy.
2. ganesha, which is bound to $port + 10000 (12049 here) [1], listens on '*', which is a limitation of the "ceph nfs cluster" CLI.
3. Even when using a spec, the result is the same:

root@devstack:/# cat nfs
service_type: nfs
service_id: cephfs
placement:
  hosts:
    - devstack.localdomain
spec:
  port: 12345
root@devstack:/# ceph orch ps
NAME                             HOST                  PORTS    STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
crash.devstack                   devstack.localdomain           running (2h)   23s ago    2h   7201k    -        17.2.3   0912465dcea5  b79b84f7c463
mgr.devstack.jykurw              devstack.localdomain           running (2h)   23s ago    2h   432M     -        17.2.3   0912465dcea5  e70c1bd70f43
mgr.devstack.localdomain.yworig  devstack.localdomain  *:9283   running (2h)   23s ago    2h   540M     -        17.2.3   0912465dcea5  4b0232569f76
mon.devstack.localdomain         devstack.localdomain           running (2h)   23s ago    2h   444M     2048M    17.2.3   0912465dcea5  1e53d1de366a
nfs.cephfs.0.0.devstack.dhqbzc   devstack.localdomain  *:12345  running (26s)  23s ago    26s  9365k    -        4.0      0912465dcea5  2b4d379d2282
osd.0                            devstack.localdomain           running (2h)   23s ago    2h   101M     4096M    17.2.3   0912465dcea5  88a226cee3e7
stack@devstack:~$ sudo ss -antop | grep 2049
LISTEN 0 64 0.0.0.0:2049 0.0.0.0:*
LISTEN 0 64 [::]:2049 [::]:*
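A side note on the check above: `grep 2049` also matches any line containing 12049, so an exact-port filter is less ambiguous. The following is a minimal sketch, not part of any Ceph tooling; `listeners_on_port` is a hypothetical helper, and the sample lines are taken from the outputs in this report, with the ganesha line written using the `*:12049 *:*` columns that ss normally prints.

```python
import re

# Matches a local "address:port" column such as 0.0.0.0:2049, [::]:2049 or *:12049.
_ADDR_PORT = re.compile(r"^(?P<addr>\S*):(?P<port>\d+)$")


def listeners_on_port(ss_lines, port):
    """Return the local addresses of LISTEN sockets bound to exactly `port`."""
    found = []
    for line in ss_lines:
        fields = line.split()
        # Assumed layout: State Recv-Q Send-Q Local:Port Peer:Port [Process]
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        match = _ADDR_PORT.match(fields[3])
        if match and int(match.group("port")) == port:
            found.append(match.group("addr"))
    return found


sample = [
    "LISTEN 0 64 0.0.0.0:2049 0.0.0.0:*",
    'LISTEN 0 128 *:12049 *:* users:(("ganesha.nfsd",pid=611540,fd=35))',
    "LISTEN 0 64 [::]:2049 [::]:*",
]
print(listeners_on_port(sample, 2049))
```

With the sample above, only the two wildcard listeners on 2049 are reported; the ganesha socket on 12049 is not, although a plain substring match would have included it.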

A process is still listening on 2049 even though this configuration has no ingress, and applying the following ingress spec fails with the error described above:

service_type: ingress
service_id: nfs.cephfs
placement:
  count: 1
spec:
  backend_service: nfs.cephfs
  frontend_port: 2049
  monitor_port: 8000
  virtual_ip: 192.168.24.75/24

[1] https://github.com/ceph/ceph/blob/beabb1fa114ea75151746817195176ddcf035aa0/src/pybind/mgr/nfs/cluster.py#L70


Related issues 2 (0 open, 2 closed)

Copied to Orchestrator - Backport #62532: reef: "ceph nfs cluster create ..." always show process bound to 2049: unable to deploy ingress (Resolved, Adam King)
Copied to Orchestrator - Backport #62533: quincy: "ceph nfs cluster create ..." always show process bound to 2049: unable to deploy ingress (Rejected, Adam King)
Actions #1

Updated by Ilya Dryomov over 1 year ago

  • Target version deleted (v17.2.4)
Actions #2

Updated by Adam King about 1 year ago

It was later found that this issue only appears when the conflict is between the frontend port haproxy is trying to use and the port the backend service is using. In this case, it should actually work, because haproxy only binds to the VIP we set up while the backend service binds to the host IP. Fixing this will require making the port check in the binary, which runs when we deploy daemons, aware of which IP is being bound to (currently it just checks whether the port is bound on any IP).
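The difference between a check on any IP and an address-aware check can be illustrated with plain sockets. This is a minimal sketch of the idea only, not the cephadm implementation and not the change made in the fix; `port_in_use` and `demo` are hypothetical names, and the 127.0.0.2 probe assumes a Linux host, where the whole 127.0.0.0/8 range is local.

```python
import errno
import socket


def port_in_use(ip, port):
    """Return True if a TCP socket cannot be bound to (ip, port)."""
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((ip, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False


def demo():
    # Stand-in for the backend service: listen on one specific address.
    backend = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    backend.bind(("127.0.0.1", 0))  # port 0 lets the kernel pick a free port
    backend.listen(1)
    port = backend.getsockname()[1]
    try:
        any_ip = port_in_use("0.0.0.0", port)     # check against the wildcard address
        same_ip = port_in_use("127.0.0.1", port)  # address-aware check, same IP
        try:
            other_ip = port_in_use("127.0.0.2", port)  # address-aware check, other IP
        except OSError:
            other_ip = None  # 127.0.0.2 is not a local address on this platform
    finally:
        backend.close()
    return any_ip, same_ip, other_ip


if __name__ == "__main__":
    print(demo())
```

On a Linux host, the wildcard probe is expected to report the port as in use even though only one address holds it, while the probe against a different local address is expected to succeed. That mirrors the situation described above, where haproxy binds only to the VIP and the backend binds to the host IP.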

Actions #3

Updated by Adam King about 1 year ago

  • Project changed from 31 to Orchestrator
  • Assignee set to Adam King
  • Severity changed from 2 - major to 3 - minor
Actions #4

Updated by Adam King 8 months ago

  • Backport set to reef
  • Pull request ID set to 53008
Actions #5

Updated by Adam King 8 months ago

  • Status changed from New to Pending Backport
  • Backport changed from reef to reef, quincy
Actions #6

Updated by Backport Bot 8 months ago

  • Copied to Backport #62532: reef: "ceph nfs cluster create ..." always show process bound to 2049: unable to deploy ingress added
Actions #7

Updated by Backport Bot 8 months ago

  • Copied to Backport #62533: quincy: "ceph nfs cluster create ..." always show process bound to 2049: unable to deploy ingress added
Actions #8

Updated by Backport Bot 8 months ago

  • Tags set to backport_processed
Actions #9

Updated by Adam King about 1 month ago

  • Status changed from Pending Backport to Resolved