Bug #51291
Adoption fails for Ceph MDS servers (status: Closed)
Description
I'm migrating my Ceph cluster from `ceph-ansible` to `cephadm` by following the guide here: https://docs.ceph.com/en/octopus/cephadm/adoption/
I've made it to step 10 where one runs the command:
# ceph orch apply mds <fs-name> [--placement=<placement>]
After running this, nothing changes. I know it did something, as `ceph orch` now lists MDS servers, but none are deployed:
```
# ceph orch ls
NAME        RUNNING  REFRESHED  AGE  PLACEMENT                     IMAGE NAME                    IMAGE ID
mds.cephfs  0/3      -          -    athos2;athos3;athos4;count:3  <unknown>                     <unknown>
mgr         5/0      16m ago    -    <unmanaged>                   docker.io/ceph/ceph:v15.2.13  2cf504fded39
mon         5/0      16m ago    -    <unmanaged>                   docker.io/ceph/ceph:v15.2.13  2cf504fded39
```
The target FS is called `cephfs`
```
# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
```
If I do a `cephadm ls` on the node, it only returns the legacy MDS server. I've tried disabling the legacy service on the target machine, but with no success so far.
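For reference, "disabling the legacy service" here meant roughly the following sketch. The `ceph-mds@<instance>` unit name is my assumption about how `ceph-ansible` registers the daemon on each host; the actual instance name may differ on your systems (check with `systemctl list-units 'ceph-mds@*'`):

```shell
# Stop and disable the legacy (pre-cephadm) MDS systemd unit on each target
# host, so it no longer conflicts with the cephadm-managed daemon.
# Instance name "ceph-mds@<hostname>" is an assumption; verify locally first.
for host in athos2 athos3 athos4; do
  ssh "$host" sudo systemctl stop    "ceph-mds@${host}"
  ssh "$host" sudo systemctl disable "ceph-mds@${host}"
done
```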
Digging deeper, I found the following from `ceph orch`:
```
# ceph orch ls --service_name=mds.cephfs --format yaml
service_type: mds
service_id: cephfs
service_name: mds.cephfs
placement:
  count: 3
  hosts:
  - athos2
  - athos3
  - athos4
status:
  running: 0
  size: 3
events:
- '2021-06-19T23:32:01.844902Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos4.wqwvix on athos4: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos4.wqwvix --config-json -"'
- '2021-06-19T23:32:01.949145Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos2.vemowm on athos2: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.vemowm --config-json -"'
- '2021-06-19T23:32:41.577409Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos3.iubqwa on athos3: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos3.iubqwa --config-json -"'
- '2021-06-19T23:32:43.647630Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos4.amlogw on athos4: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos4.amlogw --config-json -"'
- '2021-06-19T23:32:49.889821Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos2.ebrxnm on athos2: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.ebrxnm --config-json -"'
```
I've been stuck here since. Running the failing command manually just hangs with no further output. I had hoped that meant it was running in the foreground, but `cephadm ls` on the node still showed no new services.
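One thing I noticed while poking at this: the trailing `-` after `--config-json` appears to make `cephadm` read the daemon's config JSON from stdin, which would explain why the manual run just sits there; it is waiting for input. Below is a rough sketch of feeding it by hand. The daemon/auth name is copied from the error log above, and the `config`/`keyring` key names are my assumption about the JSON shape `cephadm` expects; the auth entity only exists if the orchestrator already created its key:

```shell
# Assumption: cephadm deploy with "--config-json -" reads a JSON object
# (with "config" and "keyring" fields) from stdin.
cfg=$(ceph config generate-minimal-conf)
key=$(sudo ceph auth get mds.cephfs.athos4.wqwvix)  # daemon name taken from the log

# Pack both strings into one JSON document and pipe it to the deploy command,
# so the manual invocation no longer blocks on an empty stdin.
python3 -c 'import json,sys; print(json.dumps({"config": sys.argv[1], "keyring": sys.argv[2]}))' \
    "$cfg" "$key" |
  sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy \
    --fsid 85361255-4989-4e27-bdb3-e017b9081911 \
    --name mds.cephfs.athos4.wqwvix \
    --config-json -
```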