Bug #51291

closed

Adoption fails for Ceph MDS servers

Added by Jesse Roland almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm migrating my Ceph cluster from `ceph-ansible` to `cephadm` by following the guide here: https://docs.ceph.com/en/octopus/cephadm/adoption/

I've made it to step 10 where one runs the command:

# ceph orch apply mds <fs-name> [--placement=<placement>]

After running this, nothing appears to change. I know the command did something, because `ceph orch ls` now lists the MDS service, but no daemons are deployed:

# ceph orch ls
NAME        RUNNING  REFRESHED  AGE  PLACEMENT                     IMAGE NAME                    IMAGE ID      
mds.cephfs      0/3  -          -    athos2;athos3;athos4;count:3  <unknown>                     <unknown>     
mgr             5/0  16m ago    -    <unmanaged>                   docker.io/ceph/ceph:v15.2.13  2cf504fded39  
mon             5/0  16m ago    -    <unmanaged>                   docker.io/ceph/ceph:v15.2.13  2cf504fded39 
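
For triage, the listing above can be checked mechanically. The following is a minimal sketch, assuming the plain-text table layout shown in this report (service name in the first column, `running/expected` in the second). The function name `undeployed_services` and the sample constant are hypothetical and not part of any Ceph tooling.

```python
from typing import List

# Sample copied from the `ceph orch ls` output in this report.
SAMPLE_ORCH_LS = """\
NAME        RUNNING  REFRESHED  AGE  PLACEMENT                     IMAGE NAME                    IMAGE ID
mds.cephfs      0/3  -          -    athos2;athos3;athos4;count:3  <unknown>                     <unknown>
mgr             5/0  16m ago    -    <unmanaged>                   docker.io/ceph/ceph:v15.2.13  2cf504fded39
mon             5/0  16m ago    -    <unmanaged>                   docker.io/ceph/ceph:v15.2.13  2cf504fded39
"""


def undeployed_services(orch_ls_output: str) -> List[str]:
    """Return names of services that expect daemons but have none running."""
    names = []
    for line in orch_ls_output.splitlines():
        fields = line.split()
        # Skip the header row and anything without a "running/expected" column.
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        running, expected = fields[1].split("/", 1)
        if not (running.isdigit() and expected.isdigit()):
            continue
        if int(running) == 0 and int(expected) > 0:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    print(undeployed_services(SAMPLE_ORCH_LS))
```

On the sample above, only `mds.cephfs` is reported. The unmanaged `mgr` and `mon` entries show `5/0` and are not flagged.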

The target FS is called `cephfs`

# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

If I run `cephadm ls` on the node, it returns only the legacy MDS server. I've tried disabling the legacy service on the target machine, but with no success so far.

Digging deeper, I found the following from `ceph orch`:

# ceph orch ls --service_name=mds.cephfs --format yaml
service_type: mds
service_id: cephfs
service_name: mds.cephfs
placement:
  count: 3
  hosts:
  - athos2
  - athos3
  - athos4
status:
  running: 0
  size: 3
events:
- '2021-06-19T23:32:01.844902Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos4.wqwvix on
  athos4: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15
  --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos4.wqwvix
  --config-json -"'
- '2021-06-19T23:32:01.949145Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos2.vemowm on
  athos2: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15
  --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.vemowm
  --config-json -"'
- '2021-06-19T23:32:41.577409Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos3.iubqwa on
  athos3: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15
  --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos3.iubqwa
  --config-json -"'
- '2021-06-19T23:32:43.647630Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos4.amlogw on
  athos4: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15
  --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos4.amlogw
  --config-json -"'
- '2021-06-19T23:32:49.889821Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos2.ebrxnm on
  athos2: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15
  --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.ebrxnm
  --config-json -"'
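
The repeated events above can be summarised with a short script. This is a sketch under stated assumptions: each event string starts with a timestamp, then `service:<name>`, then a bracketed level, and the failing command line contains a `--name <daemon>` argument, as in the output pasted here. The sample uses the first event with its wrapped lines rejoined. `failed_placements` and `EVENT_RE` are hypothetical names, not Ceph APIs.

```python
import re
from typing import List, Tuple

# First event from this report, with the YAML line wraps rejoined.
SAMPLE_EVENT = (
    '2021-06-19T23:32:01.844902Z service:mds.cephfs [ERROR] '
    '"Failed while placing mds.cephfs.athos4.wqwvix on athos4: '
    'Failed to execute command: sudo /usr/bin/cephadm '
    '--image docker.io/ceph/ceph:v15 --no-container-init deploy '
    '--fsid 85361255-4989-4e27-bdb3-e017b9081911 '
    '--name mds.cephfs.athos4.wqwvix --config-json -"'
)

# Timestamp, service name, level, then the daemon passed to --name.
EVENT_RE = re.compile(
    r"^(?P<ts>\S+)\s+service:(?P<service>\S+)\s+\[(?P<level>[A-Z]+)\]"
    r".*?--name\s+(?P<daemon>\S+)",
    re.DOTALL,
)


def failed_placements(events: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, daemon name) for each ERROR event with a --name argument."""
    failures = []
    for event in events:
        match = EVENT_RE.search(event)
        # Events that do not match the assumed layout are skipped, not guessed at.
        if match and match.group("level") == "ERROR":
            failures.append((match.group("ts"), match.group("daemon")))
    return failures


if __name__ == "__main__":
    for timestamp, daemon in failed_placements([SAMPLE_EVENT]):
        print(timestamp, daemon)
```

Reading the daemon name from `--name` rather than from the "Failed while placing" phrase keeps the parse independent of how the message text was wrapped.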

I've been stuck here. Running the failing command manually hangs without any further output. I had hoped that meant it was running in the foreground, but running `cephadm ls` on the node returned no active services.
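
One possible reading of the hang, offered as an assumption and not as a confirmed diagnosis: the failing command line ends in `--config-json -`, and a lone `-` conventionally means "read from standard input". A tool invoked that way from a terminal would wait for input and appear to hang. The sketch below illustrates the convention with a stand-in child process. It does not invoke cephadm, and the payload keys are placeholders, not a documented schema.

```python
import json
import subprocess
import sys

# Stand-in for a tool called with "--config-json -": it reads one JSON
# document from stdin and prints the sorted key names. This is NOT cephadm.
CHILD = "import json, sys; print(sorted(json.load(sys.stdin)))"


def run_with_stdin_json(payload: dict) -> str:
    """Run the stand-in child, supplying `payload` as JSON on its stdin."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=json.dumps(payload),  # without input, the child would block on stdin
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical payload for illustration only.
    print(run_with_stdin_json({"config": "...", "keyring": "..."}))
```

If this reading is right, a manual run would need the JSON piped in on stdin. The report does not show what that JSON should contain.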
