Bug #57614: "ceph nfs cluster create ..." always show process bound to 2049: unable to deploy ingress - Orchestrator - Ceph

closed

Added by Francesco Pantano over 1 year ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee: Adam King
Category:
Target version:
% Done:
Source: Community (dev)
Tags: backport_processed
Backport: reef, quincy
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 53008
Crash signature (v1):
Crash signature (v2):

Description

Here is an example of the issue described in the subject:

root@devstack:/# ceph orch ls
    NAME   PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
    crash             1/1  4m ago     2h   *
    mgr               2/2  4m ago     2h   count:2
    mon               1/5  4m ago     2h   count:5
    osd                 1  4m ago     -    <unmanaged>

root@devstack:/# ceph nfs cluster create cephfs --placement=devstack.localdomain --ingress  --virtual-ip 192.168.24.75/24 --port 2049
    NFS Cluster Created Successfully

root@devstack:/# ceph orch ls
    NAME                PORTS                    RUNNING  REFRESHED  AGE  PLACEMENT
    crash                                            1/1  3s ago     2h   *
    ingress.nfs.cephfs  192.168.24.75:2049,9049      0/4  -          10s  count:2
    mgr                                              2/2  3s ago     2h   count:2
    mon                                              1/5  3s ago     2h   count:5
    nfs.cephfs          ?:12049                      1/1  3s ago     10s  devstack.localdomain
    osd                                                1  3s ago     -    <unmanaged>
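
The ports in this listing are consistent with the backend and monitor ports being derived from the requested frontend port: 2049 + 10000 = 12049 for ganesha and 2049 + 7000 = 9049 for the haproxy monitor. The sketch below only restates that arithmetic. The offsets are inferred from the output in this report, and the name `derive_ports` is hypothetical, not part of Ceph.

```python
# Offsets inferred from the output in this report (an assumption, not a Ceph API).
GANESHA_OFFSET = 10000
MONITOR_OFFSET = 7000


def derive_ports(frontend_port: int) -> dict:
    """Return the ports implied by a requested ingress frontend port."""
    return {
        "frontend_port": frontend_port,
        "ganesha_port": frontend_port + GANESHA_OFFSET,
        "monitor_port": frontend_port + MONITOR_OFFSET,
    }


print(derive_ports(2049))
```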

root@devstack:/var/log/ceph# ceph orch ps
    NAME                             HOST                  PORTS    STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
    crash.devstack                   devstack.localdomain           running (2h)     4m ago   2h    7201k        -  17.2.3   0912465dcea5  b79b84f7c463
    mgr.devstack.jykurw              devstack.localdomain           running (2h)     4m ago   2h     432M        -  17.2.3   0912465dcea5  e70c1bd70f43
    mgr.devstack.localdomain.yworig  devstack.localdomain  *:9283   running (2h)     4m ago   2h     539M        -  17.2.3   0912465dcea5  4b0232569f76
    mon.devstack.localdomain         devstack.localdomain           running (2h)     4m ago   2h     440M    2048M  17.2.3   0912465dcea5  1e53d1de366a
    nfs.cephfs.0.0.devstack.kfvcel   devstack.localdomain  *:12049  running (4m)     4m ago   4m    9353k        -  4.0      0912465dcea5  dab8ed3fd5bb
    osd.0                            devstack.localdomain           running (2h)     4m ago   2h     101M    4096M  17.2.3   0912465dcea5  88a226cee3e7

root@devstack:/# ceph -W cephadm --watch-debug
  cluster:
    id:     15b994ed-4341-4522-94e9-56e75279659a
    health: HEALTH_WARN
            Failed to place 2 daemon(s)

  services:
    mon: 1 daemons, quorum devstack.localdomain (age 26h)
    mgr: devstack.localdomain.yworig(active, since 26h), standbys: devstack.jykurw
    osd: 1 osds: 1 up (since 26h), 1 in (since 26h)

  data:
    pools:   2 pools, 9 pgs
    objects: 5 objects, 449 KiB
    usage:   21 MiB used, 10 GiB / 10 GiB avail
    pgs:     9 active+clean

  io:
    client:   767 B/s rd, 511 B/s wr, 0 op/s rd, 0 op/s wr

But the ingress daemon fails with the following:

2022-09-20 07:42:53,430 7faa31849740 INFO Deploy daemon haproxy.nfs.cephfs.devstack.ultjer ...
  2022-09-20 07:42:53,751 7faa31849740 DEBUG stat: 0 0
  2022-09-20 07:42:53,906 7faa31849740 INFO Verifying port 2049 ...
  2022-09-20 07:42:53,907 7faa31849740 WARNING Cannot bind to IP 0.0.0.0 port 2049: [Errno 98] Address already in use
  2022-09-20 07:42:53,907 7faa31849740 INFO Verifying port 9049 ...
  2022-09-20 07:42:53,907 7faa31849740 ERROR ERROR: TCP Port(s) '2049,9049' required for haproxy already in use
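
The "Cannot bind to IP 0.0.0.0 port 2049" warning can be reproduced outside cephadm with a plain bind attempt. The following is a minimal sketch of such a probe, assuming an IPv4 TCP socket. The function name `port_in_use` is chosen for illustration and this is not claimed to be cephadm's own implementation.

```python
import errno
import socket


def port_in_use(ip: str, port: int) -> bool:
    """Return True if binding a TCP socket to (ip, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((ip, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other errors (e.g. EACCES on privileged ports) are not "in use"
        return False


if __name__ == "__main__":
    for port in (2049, 9049):
        try:
            state = "in use" if port_in_use("0.0.0.0", port) else "free"
        except OSError as exc:
            state = f"could not check ({exc})"
        print(f"port {port}: {state}")
```

Running it as root on the affected host should show whether 2049 is already taken before any ingress daemon is deployed.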

netstat shows the following:

LISTEN     0       64                  0.0.0.0:2049               0.0.0.0:*
  LISTEN     0       128                       *:12049                    *:*      users:(("ganesha.nfsd",pid=611540,fd=35))
  LISTEN     0       64                     [::]:2049                  [::]:*
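
Only the ganesha listener carries a `users:(...)` column here. A small parser makes that easier to see when there are many sockets. This is a sketch that assumes the whitespace-separated `ss` layout shown above, with the ganesha wildcard address written as `*:12049` to match the `ceph orch ps` output. A listener with no reported process may belong to the kernel or to a process the caller cannot inspect; the report does not say which.

```python
import re

# Sample input modelled on the listing in this report.
SS_OUTPUT = """\
LISTEN 0 64  0.0.0.0:2049 0.0.0.0:*
LISTEN 0 128 *:12049 *:* users:(("ganesha.nfsd",pid=611540,fd=35))
LISTEN 0 64  [::]:2049 [::]:*
"""


def listeners(ss_text):
    """Yield (local_address, port, process_name_or_None) for each LISTEN line."""
    for line in ss_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        # The local address is the fourth column; the port follows the last colon.
        addr, _, port = fields[3].rpartition(":")
        match = re.search(r'users:\(\("([^"]+)"', line)
        yield addr, int(port), match.group(1) if match else None


for addr, port, proc in listeners(SS_OUTPUT):
    print(f"{addr}:{port} -> {proc or 'no owning process reported'}")
```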

I see several problems here:

1. A process is bound to port 2049, and it is not haproxy.
2. ganesha, which listens on $port + 10000 (12049 here) [1], is bound to '*', which is a limitation of the "ceph nfs cluster" CLI.
3. Even when using a spec, the result is the same:

root@devstack:/# cat nfs
  service_type: nfs
  service_id: cephfs
  placement:
    hosts:
      - devstack.localdomain
  spec:
    port: 12345

root@devstack:/# ceph orch ps
  NAME                             HOST                  PORTS    STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
  crash.devstack                   devstack.localdomain           running (2h)     23s ago   2h    7201k        -  17.2.3   0912465dcea5  b79b84f7c463
  mgr.devstack.jykurw              devstack.localdomain           running (2h)     23s ago   2h     432M        -  17.2.3   0912465dcea5  e70c1bd70f43
  mgr.devstack.localdomain.yworig  devstack.localdomain  *:9283   running (2h)     23s ago   2h     540M        -  17.2.3   0912465dcea5  4b0232569f76
  mon.devstack.localdomain         devstack.localdomain           running (2h)     23s ago   2h     444M    2048M  17.2.3   0912465dcea5  1e53d1de366a
  nfs.cephfs.0.0.devstack.dhqbzc   devstack.localdomain  *:12345  running (26s)    23s ago  26s    9365k        -  4.0      0912465dcea5  2b4d379d2282
  osd.0                            devstack.localdomain           running (2h)     23s ago   2h     101M    4096M  17.2.3   0912465dcea5  88a226cee3e7

stack@devstack:~$ sudo ss -antop | grep 2049
  LISTEN     0       64                  0.0.0.0:2049               0.0.0.0:*
  LISTEN     0       64                     [::]:2049                  [::]:*

A process is still listening on 2049 even though this config has no ingress, and applying the following ingress spec fails with the error described above:

service_type: ingress
service_id: nfs.cephfs
placement:
  count: 1
spec:
  backend_service: nfs.cephfs
  frontend_port: 2049
  monitor_port: 8000
  virtual_ip: 192.168.24.75/24
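
Because the existing listener on 2049 is bound to the wildcard address, it would also block a bind to the virtual IP alone, at least with default socket options on Linux. The sketch below models that rule for IPv4 listeners only. The function name `conflicts` and the listener list are illustrative assumptions built from the values in this report.

```python
# Addresses treated as "all IPv4 interfaces" in this simplified model.
WILDCARDS = {"0.0.0.0", "*"}


def conflicts(existing, requested):
    """Return the existing IPv4 (addr, port) listeners that would block `requested`."""
    req_addr, req_port = requested
    return [
        (addr, port)
        for addr, port in existing
        if port == req_port
        and (addr == req_addr or addr in WILDCARDS or req_addr in WILDCARDS)
    ]


# Listeners taken from the ss and "ceph orch ps" output in this report.
existing = [("0.0.0.0", 2049), ("*", 12345)]

print(conflicts(existing, ("192.168.24.75", 2049)))  # frontend_port
print(conflicts(existing, ("192.168.24.75", 8000)))  # monitor_port
```

In this model only the frontend port is blocked, which matches the report: the unidentified listener on 2049 is the obstacle, not ganesha on 12345.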

[1] https://github.com/ceph/ceph/blob/beabb1fa114ea75151746817195176ddcf035aa0/src/pybind/mgr/nfs/cluster.py#L70

Related issues 2 (0 open — 2 closed)


Updated by Ilya Dryomov over 1 year ago

Updated by Adam King about 1 year ago

Updated by Adam King about 1 year ago

Updated by Adam King 9 months ago

Updated by Adam King 9 months ago

Updated by Backport Bot 9 months ago

Updated by Backport Bot 9 months ago

Updated by Backport Bot 9 months ago

Updated by Adam King about 2 months ago