Bug #57614

"ceph nfs cluster create ..." always shows a process bound to 2049: unable to deploy ingress

Added by Francesco Pantano 3 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Here is an example of the issue described in $subject:

root@devstack:/# ceph orch ls
NAME   PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash             1/1  4m ago     2h   *
mgr               2/2  4m ago     2h   count:2
mon               1/5  4m ago     2h   count:5
osd                 1  4m ago     -    <unmanaged>
root@devstack:/# ceph nfs cluster create cephfs --placement=devstack.localdomain --ingress  --virtual-ip 192.168.24.75/24 --port 2049
NFS Cluster Created Successfully
root@devstack:/# ceph orch ls
NAME                PORTS                    RUNNING  REFRESHED  AGE  PLACEMENT
crash                                            1/1  3s ago     2h   *
ingress.nfs.cephfs  192.168.24.75:2049,9049      0/4  -          10s  count:2
mgr                                              2/2  3s ago     2h   count:2
mon                                              1/5  3s ago     2h   count:5
nfs.cephfs          ?:12049                      1/1  3s ago     10s  devstack.localdomain
osd                                                1  3s ago     -    <unmanaged>
root@devstack:/var/log/ceph# ceph orch ps
NAME                             HOST                  PORTS    STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
crash.devstack                   devstack.localdomain           running (2h)  4m ago     2h   7201k    -        17.2.3   0912465dcea5  b79b84f7c463
mgr.devstack.jykurw              devstack.localdomain           running (2h)  4m ago     2h   432M     -        17.2.3   0912465dcea5  e70c1bd70f43
mgr.devstack.localdomain.yworig  devstack.localdomain  *:9283   running (2h)  4m ago     2h   539M     -        17.2.3   0912465dcea5  4b0232569f76
mon.devstack.localdomain         devstack.localdomain           running (2h)  4m ago     2h   440M     2048M    17.2.3   0912465dcea5  1e53d1de366a
nfs.cephfs.0.0.devstack.kfvcel   devstack.localdomain  *:12049  running (4m)  4m ago     4m   9353k    -        4.0      0912465dcea5  dab8ed3fd5bb
osd.0                            devstack.localdomain           running (2h)  4m ago     2h   101M     4096M    17.2.3   0912465dcea5  88a226cee3e7
root@devstack:/# ceph -W cephadm --watch-debug
  cluster:
    id:     15b994ed-4341-4522-94e9-56e75279659a
    health: HEALTH_WARN
            Failed to place 2 daemon(s)
  services:
    mon: 1 daemons, quorum devstack.localdomain (age 26h)
    mgr: devstack.localdomain.yworig(active, since 26h), standbys: devstack.jykurw
    osd: 1 osds: 1 up (since 26h), 1 in (since 26h)
  data:
    pools:   2 pools, 9 pgs
    objects: 5 objects, 449 KiB
    usage:   21 MiB used, 10 GiB / 10 GiB avail
    pgs:     9 active+clean
  io:
    client:   767 B/s rd, 511 B/s wr, 0 op/s rd, 0 op/s wr

But the ingress daemon fails with the following:

2022-09-20 07:42:53,430 7faa31849740 INFO Deploy daemon haproxy.nfs.cephfs.devstack.ultjer ...
2022-09-20 07:42:53,751 7faa31849740 DEBUG stat: 0 0
2022-09-20 07:42:53,906 7faa31849740 INFO Verifying port 2049 ...
2022-09-20 07:42:53,907 7faa31849740 WARNING Cannot bind to IP 0.0.0.0 port 2049: [Errno 98] Address already in use
2022-09-20 07:42:53,907 7faa31849740 INFO Verifying port 9049 ...
2022-09-20 07:42:53,907 7faa31849740 ERROR ERROR: TCP Port(s) '2049,9049' required for haproxy already in use
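
The "Verifying port" step in the log above amounts to trying to bind the port and treating "Address already in use" (errno 98 on Linux) as a failure. A minimal Python sketch of that kind of check, useful for reproducing the failure outside cephadm, could look like this. The helper name port_in_use is hypothetical and this is not cephadm's actual implementation.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP bind to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration only; not taken from cephadm.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Anything else (e.g. permission denied) is a different problem.
            raise
    return False


if __name__ == "__main__":
    # Binding ports below 1024 needs root; 2049 and 9049 do not.
    for candidate in (2049, 9049):
        state = "already in use" if port_in_use(candidate) else "free"
        print(f"port {candidate}: {state}")
```

On the host from this report, such a check would be expected to flag 2049 and not 9049, which matches the WARNING line being emitted only for 2049.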

netstat shows the following:

LISTEN 0 64  0.0.0.0:2049 0.0.0.0:*
LISTEN 0 128 *:12049      *:*       users:(("ganesha.nfsd",pid=611540,fd=35))
LISTEN 0 64  [::]:2049    [::]:*
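
To make the point of this output explicit, here is a small Python sketch that parses lines in this format and reports which listeners on a given port have no owning process printed. The names Listener, parse_ss_line and unowned_listeners are hypothetical, and the sample assumes the ganesha line reads *:12049 as its local address, matching the PORTS column of "ceph orch ps".

```python
import re
from typing import List, NamedTuple, Optional


class Listener(NamedTuple):
    address: str
    port: int
    process: Optional[str]  # None when no users:(...) column is printed


_USERS = re.compile(r'users:\(\("([^"]+)",pid=(\d+)')


def parse_ss_line(line: str) -> Listener:
    """Parse one 'State Recv-Q Send-Q Local Peer [users]' line."""
    fields = line.split()
    # rpartition keeps IPv6 forms such as [::]:2049 intact.
    address, _, port = fields[3].rpartition(":")
    match = _USERS.search(line)
    process = f"{match.group(1)} (pid {match.group(2)})" if match else None
    return Listener(address, int(port), process)


def unowned_listeners(lines: List[str], port: int) -> List[Listener]:
    """Listeners on `port` for which no owning process was printed."""
    parsed = (parse_ss_line(line) for line in lines if line.strip())
    return [entry for entry in parsed if entry.port == port and entry.process is None]


SAMPLE = [
    "LISTEN 0 64 0.0.0.0:2049 0.0.0.0:*",
    'LISTEN 0 128 *:12049 *:* users:(("ganesha.nfsd",pid=611540,fd=35))',
    "LISTEN 0 64 [::]:2049 [::]:*",
]

if __name__ == "__main__":
    for entry in unowned_listeners(SAMPLE, 2049):
        print(f"{entry.address}:{entry.port} has no owning process listed")
```

With the sample above, both sockets on 2049 are reported without an owner, while the only socket with a process attached is ganesha on 12049.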

I see many problems here:

1. a process is bound on 2049, and it's not haproxy
2. ganesha, which is bound to $port plus an offset [1] (12049 here), listens on '*', which is a limitation of the "ceph nfs cluster" CLI
3. even using a spec, the result is still the same

root@devstack:/# cat nfs
service_type: nfs
service_id: cephfs
placement:
  hosts:
    - devstack.localdomain
spec:
  port: 12345
root@devstack:/# ceph orch ps
NAME                             HOST                  PORTS    STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
crash.devstack                   devstack.localdomain           running (2h)   23s ago    2h   7201k    -        17.2.3   0912465dcea5  b79b84f7c463
mgr.devstack.jykurw              devstack.localdomain           running (2h)   23s ago    2h   432M     -        17.2.3   0912465dcea5  e70c1bd70f43
mgr.devstack.localdomain.yworig  devstack.localdomain  *:9283   running (2h)   23s ago    2h   540M     -        17.2.3   0912465dcea5  4b0232569f76
mon.devstack.localdomain         devstack.localdomain           running (2h)   23s ago    2h   444M     2048M    17.2.3   0912465dcea5  1e53d1de366a
nfs.cephfs.0.0.devstack.dhqbzc   devstack.localdomain  *:12345  running (26s)  23s ago    26s  9365k    -        4.0      0912465dcea5  2b4d379d2282
osd.0                            devstack.localdomain           running (2h)   23s ago    2h   101M     4096M    17.2.3   0912465dcea5  88a226cee3e7
stack@devstack:~$ sudo ss -antop | grep 2049
LISTEN 0 64 0.0.0.0:2049 0.0.0.0:*
LISTEN 0 64 [::]:2049 [::]:*

A process is still listening on 2049 even though this config has no ingress, and applying the following ingress spec fails with the error described above:

service_type: ingress
service_id: nfs.cephfs
placement:
  count: 1
spec:
  backend_service: nfs.cephfs
  frontend_port: 2049
  monitor_port: 8000
  virtual_ip: 192.168.24.75/24

[1] https://github.com/ceph/ceph/blob/beabb1fa114ea75151746817195176ddcf035aa0/src/pybind/mgr/nfs/cluster.py#L70
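
For readers who do not want to follow [1], the port layout can be sketched from the output in this report alone: with "--ingress --port 2049", "ceph orch ls" shows the ingress on 2049 and 9049 and ganesha on 12049. The Python below is only an illustration of that arithmetic. The offsets 7000 and 10000 are assumptions inferred from those numbers, and planned_ports is a hypothetical helper, not a function from the nfs mgr module.

```python
from typing import Dict

# Assumptions inferred from the report's output (2049 -> 9049 and 12049),
# not copied from the Ceph source.
MONITOR_PORT_OFFSET = 7000
GANESHA_PORT_OFFSET = 10000


def planned_ports(port: int, ingress: bool) -> Dict[str, int]:
    """Return the ports each daemon would be expected to bind."""
    if not ingress:
        # Without ingress, ganesha binds the requested port itself,
        # as in the spec example with port 12345.
        return {"nfs": port}
    return {
        "ingress_frontend": port,
        "ingress_monitor": port + MONITOR_PORT_OFFSET,
        "nfs": port + GANESHA_PORT_OFFSET,
    }


if __name__ == "__main__":
    print(planned_ports(2049, ingress=True))
    print(planned_ports(12345, ingress=False))
```

Under these assumptions nothing deployed by cephadm other than haproxy should need 2049 in the ingress case, which is why the existing listener on that port is unexpected.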

History

#1 Updated by Ilya Dryomov 2 months ago

  • Target version deleted (v17.2.4)
