Bug #51291
Adoption fails for Ceph MDS servers (closed)
Description
I'm migrating my Ceph cluster from `ceph-ansible` to `cephadm` by following the guide here: https://docs.ceph.com/en/octopus/cephadm/adoption/
I've made it to step 10 where one runs the command:
# ceph orch apply mds <fs-name> [--placement=<placement>]
After running this, nothing appears to change. I know it did something, because `ceph orch ls` now lists the MDS service, but no daemons are deployed:
# ceph orch ls
NAME        RUNNING  REFRESHED  AGE  PLACEMENT                     IMAGE NAME                    IMAGE ID
mds.cephfs  0/3      -          -    athos2;athos3;athos4;count:3  <unknown>                     <unknown>
mgr         5/0      16m ago    -    <unmanaged>                   docker.io/ceph/ceph:v15.2.13  2cf504fded39
mon         5/0      16m ago    -    <unmanaged>                   docker.io/ceph/ceph:v15.2.13  2cf504fded39
The target FS is called `cephfs`
# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
If I run `cephadm ls` on the node, it only returns the legacy MDS server. I've tried disabling the legacy service on the target machine, but with no success so far.
Digging deeper, I found the following from ceph orch
# ceph orch ls --service_name=mds.cephfs --format yaml
service_type: mds
service_id: cephfs
service_name: mds.cephfs
placement:
  count: 3
  hosts:
  - athos2
  - athos3
  - athos4
status:
  running: 0
  size: 3
events:
- '2021-06-19T23:32:01.844902Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos4.wqwvix on athos4: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos4.wqwvix --config-json -"'
- '2021-06-19T23:32:01.949145Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos2.vemowm on athos2: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.vemowm --config-json -"'
- '2021-06-19T23:32:41.577409Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos3.iubqwa on athos3: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos3.iubqwa --config-json -"'
- '2021-06-19T23:32:43.647630Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos4.amlogw on athos4: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos4.amlogw --config-json -"'
- '2021-06-19T23:32:49.889821Z service:mds.cephfs [ERROR] "Failed while placing mds.cephfs.athos2.ebrxnm on athos2: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.ebrxnm --config-json -"'
I've been stuck here. Running the command manually hangs without any further output. I had hoped that meant it'd be running in the foreground, but running `cephadm ls` on the node returned no active services.
Updated by Jesse Roland almost 3 years ago
Posting an update with additional details. I was able to get some more verbose output from running `ceph log last cephadm`
RuntimeError: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos4.nsevry --config-json -
2021-06-28T12:19:36.468436+0000 mgr.athos2 (mgr.5211678) 1623980 : cephadm [INF] Deploying daemon mds.cephfs.athos2.adupvw on athos2
2021-06-28T12:19:36.504448+0000 mgr.athos2 (mgr.5211678) 1623985 : cephadm [ERR] Traceback (most recent call last):
2021-06-28T12:19:36.504677+0000 mgr.athos2 (mgr.5211678) 1623986 : cephadm [ERR]   File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
2021-06-28T12:19:36.504883+0000 mgr.athos2 (mgr.5211678) 1623987 : cephadm [ERR]     response = result.receive(timeout)
2021-06-28T12:19:36.505087+0000 mgr.athos2 (mgr.5211678) 1623988 : cephadm [ERR]   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
2021-06-28T12:19:36.505290+0000 mgr.athos2 (mgr.5211678) 1623989 : cephadm [ERR]     raise self._getremoteerror() or EOFError()
2021-06-28T12:19:36.505525+0000 mgr.athos2 (mgr.5211678) 1623990 : cephadm [ERR] execnet.gateway_base.RemoteError: Traceback (most recent call last):
2021-06-28T12:19:36.505741+0000 mgr.athos2 (mgr.5211678) 1623991 : cephadm [ERR]   File "<string>", line 1088, in executetask
2021-06-28T12:19:36.505951+0000 mgr.athos2 (mgr.5211678) 1623992 : cephadm [ERR]   File "/lib/python3.6/site-packages/remoto/process.py", line 151, in _remote_check
2021-06-28T12:19:36.506161+0000 mgr.athos2 (mgr.5211678) 1623993 : cephadm [ERR]   File "/usr/lib/python3.6/subprocess.py", line 863, in communicate
2021-06-28T12:19:36.506370+0000 mgr.athos2 (mgr.5211678) 1623994 : cephadm [ERR]     stdout, stderr = self._communicate(input, endtime, timeout)
2021-06-28T12:19:36.506579+0000 mgr.athos2 (mgr.5211678) 1623995 : cephadm [ERR]   File "/usr/lib/python3.6/subprocess.py", line 1519, in _communicate
2021-06-28T12:19:36.506785+0000 mgr.athos2 (mgr.5211678) 1623996 : cephadm [ERR]     input_view = memoryview(self._input)
2021-06-28T12:19:36.506993+0000 mgr.athos2 (mgr.5211678) 1623997 : cephadm [ERR] TypeError: memoryview: a bytes-like object is required, not 'str'
2021-06-28T12:19:36.507202+0000 mgr.athos2 (mgr.5211678) 1623998 : cephadm [ERR]
2021-06-28T12:19:36.507409+0000 mgr.athos2 (mgr.5211678) 1623999 : cephadm [ERR]
2021-06-28T12:19:36.508516+0000 mgr.athos2 (mgr.5211678) 1624001 : cephadm [ERR] Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.adupvw --config-json -
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
    response = result.receive(timeout)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
    raise self._getremoteerror() or EOFError()
execnet.gateway_base.RemoteError: Traceback (most recent call last):
  File "<string>", line 1088, in executetask
  File "/lib/python3.6/site-packages/remoto/process.py", line 151, in _remote_check
  File "/usr/lib/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.6/subprocess.py", line 1519, in _communicate
    input_view = memoryview(self._input)
TypeError: memoryview: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1021, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1157, in _run_cephadm
    stdin=stdin)
  File "/lib/python3.6/site-packages/remoto/process.py", line 209, in check
    'Failed to execute command: %s' % ' '.join(command)
RuntimeError: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mds.cephfs.athos2.adupvw --config-json -
This appears to be a Python-related error. There were a few tickets about this in the past, but none were MDS-related.
Updated by Jesse Roland almost 3 years ago
I had put this task on the shelf for a while to work on other things, since the cluster was still in a functional state. Coming back and inspecting, I realize this Python error is occurring on all of my containers, including monitors, managers, and OSDs:
2021-07-12T20:35:52.738777+0000 mgr.athos2 (mgr.5211678) 3363601 : cephadm [ERR] Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mgr.athos6 --reconfig --config-json -
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
    response = result.receive(timeout)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
    raise self._getremoteerror() or EOFError()
execnet.gateway_base.RemoteError: Traceback (most recent call last):
  File "<string>", line 1088, in executetask
  File "/lib/python3.6/site-packages/remoto/process.py", line 151, in _remote_check
  File "/usr/lib/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.6/subprocess.py", line 1519, in _communicate
    input_view = memoryview(self._input)
TypeError: memoryview: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1021, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1157, in _run_cephadm
    stdin=stdin)
  File "/lib/python3.6/site-packages/remoto/process.py", line 209, in check
    'Failed to execute command: %s' % ' '.join(command)
RuntimeError: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mgr.athos6 --reconfig --config-json -
2021-07-12T20:35:52.740158+0000 mgr.athos2 (mgr.5211678) 3363602 : cephadm [INF] Reconfiguring mon.athos6 (unknown last config time)...
2021-07-12T20:35:52.747412+0000 mgr.athos2 (mgr.5211678) 3363605 : cephadm [INF] Deploying daemon mon.athos6 on athos6
2021-07-12T20:35:54.859395+0000 mgr.athos2 (mgr.5211678) 3363611 : cephadm [ERR] Traceback (most recent call last):
2021-07-12T20:35:54.859597+0000 mgr.athos2 (mgr.5211678) 3363612 : cephadm [ERR]   File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
2021-07-12T20:35:54.859796+0000 mgr.athos2 (mgr.5211678) 3363613 : cephadm [ERR]     response = result.receive(timeout)
2021-07-12T20:35:54.860031+0000 mgr.athos2 (mgr.5211678) 3363614 : cephadm [ERR]   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
2021-07-12T20:35:54.860260+0000 mgr.athos2 (mgr.5211678) 3363615 : cephadm [ERR]     raise self._getremoteerror() or EOFError()
2021-07-12T20:35:54.860484+0000 mgr.athos2 (mgr.5211678) 3363616 : cephadm [ERR] execnet.gateway_base.RemoteError: Traceback (most recent call last):
2021-07-12T20:35:54.860719+0000 mgr.athos2 (mgr.5211678) 3363617 : cephadm [ERR]   File "<string>", line 1088, in executetask
2021-07-12T20:35:54.860939+0000 mgr.athos2 (mgr.5211678) 3363618 : cephadm [ERR]   File "/lib/python3.6/site-packages/remoto/process.py", line 151, in _remote_check
2021-07-12T20:35:54.861150+0000 mgr.athos2 (mgr.5211678) 3363619 : cephadm [ERR]   File "/usr/lib/python3.6/subprocess.py", line 863, in communicate
2021-07-12T20:35:54.861459+0000 mgr.athos2 (mgr.5211678) 3363620 : cephadm [ERR]     stdout, stderr = self._communicate(input, endtime, timeout)
2021-07-12T20:35:54.861677+0000 mgr.athos2 (mgr.5211678) 3363621 : cephadm [ERR]   File "/usr/lib/python3.6/subprocess.py", line 1519, in _communicate
2021-07-12T20:35:54.861887+0000 mgr.athos2 (mgr.5211678) 3363622 : cephadm [ERR]     input_view = memoryview(self._input)
2021-07-12T20:35:54.862103+0000 mgr.athos2 (mgr.5211678) 3363623 : cephadm [ERR] TypeError: memoryview: a bytes-like object is required, not 'str'
2021-07-12T20:35:54.862310+0000 mgr.athos2 (mgr.5211678) 3363624 : cephadm [ERR]
2021-07-12T20:35:54.862519+0000 mgr.athos2 (mgr.5211678) 3363625 : cephadm [ERR]
2021-07-12T20:35:54.863743+0000 mgr.athos2 (mgr.5211678) 3363627 : cephadm [ERR] Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mon.athos6 --config-json -
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/remoto/process.py", line 188, in check
    response = result.receive(timeout)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 749, in receive
    raise self._getremoteerror() or EOFError()
execnet.gateway_base.RemoteError: Traceback (most recent call last):
  File "<string>", line 1088, in executetask
  File "/lib/python3.6/site-packages/remoto/process.py", line 151, in _remote_check
  File "/usr/lib/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.6/subprocess.py", line 1519, in _communicate
    input_view = memoryview(self._input)
TypeError: memoryview: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1021, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1157, in _run_cephadm
    stdin=stdin)
  File "/lib/python3.6/site-packages/remoto/process.py", line 209, in check
    'Failed to execute command: %s' % ' '.join(command)
RuntimeError: Failed to execute command: sudo /usr/bin/cephadm --image docker.io/ceph/ceph:v15 --no-container-init deploy --fsid 85361255-4989-4e27-bdb3-e017b9081911 --name mon.athos6 --config-json -
Updated by Jesse Roland almost 3 years ago
I've tracked down the issue. More details with fix here: https://github.com/alfredodeza/remoto/issues/65
The problem stems from the remoto library, which is not properly encoding the `stdin` variable. I've filed a PR to address this here: https://github.com/alfredodeza/remoto/pull/66/
There may be a better solution, but as of now patching remoto/process.py in the container has fixed the issue for me.
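For anyone hitting the same traceback, here is a minimal standalone sketch of the class of failure. It is not the actual remoto patch, just an illustration using only the standard library. When a child process is opened with binary pipes, `Popen.communicate()` needs `bytes`; handing it a `str` raises a `TypeError` like the one in the logs. The names `ECHO_CMD`, `communicate_raw`, and `communicate_encoded` are hypothetical, and UTF-8 is an assumed encoding for the stdin payload.

```python
import subprocess
import sys

# Hypothetical stand-in for a command that reads its config from stdin
# (similar in spirit to `--config-json -`): it echoes stdin back to stdout.
ECHO_CMD = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def communicate_raw(cmd, stdin_data):
    """Pass stdin_data to the child unchanged (fails if it is a str)."""
    with subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    ) as proc:
        try:
            out, err = proc.communicate(stdin_data)
        except TypeError:
            # Do not leave the child waiting on stdin after the failure.
            proc.kill()
            raise
    return proc.returncode, out, err


def communicate_encoded(cmd, stdin_data, encoding="utf-8"):
    """Encode str input to bytes before handing it to the binary pipe."""
    if isinstance(stdin_data, str):
        stdin_data = stdin_data.encode(encoding)  # assumed encoding
    return communicate_raw(cmd, stdin_data)


if __name__ == "__main__":
    payload = '{"config": "example"}'
    try:
        communicate_raw(ECHO_CMD, payload)
    except TypeError as exc:
        print("unencoded str input failed:", exc)
    print("encoded input result:", communicate_encoded(ECHO_CMD, payload))
```

Encoding at the call boundary keeps the pipes in binary mode, which is the less invasive change. Opening the process in text mode would also accept a `str`, but it changes the type of the captured output as well.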
Updated by Sebastian Wagner over 2 years ago
- Project changed from Ceph to Orchestrator
Updated by Sebastian Wagner over 2 years ago
- Status changed from New to Resolved
awesome. Thank you!
resolved upstream