Ceph : Issues (https://tracker.ceph.com/, 2022-01-28T15:55:57Z)
Orchestrator - Documentation #54056 (New): https://docs.ceph.com/en/latest/cephadm/install/#deplo...
https://tracker.ceph.com/issues/54056 (2022-01-28T15:55:57Z, Sebastian Wagner)
<p><a class="external" href="https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment">https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment</a></p>
<ul>
<li>e.g. the monitoring stack and ingress are completely missing</li>
<li>link to the registry-login functionality</li>
</ul>

Orchestrator - Documentation #54051 (New): cephadm: advise users how and when to change OSD specs
https://tracker.ceph.com/issues/54051 (2022-01-28T12:24:42Z, Sebastian Wagner)
<p>In general this works fine; there is just one caveat: we're not going to touch any already-deployed OSDs.</p>
<p>I'd recommend sticking to a particular OSD layout, but adjusting the placement; for example,<br />testing the deployment on 1 host and then rolling it out to all the others.</p>
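The workflow above could be sketched like this (the top-level spec fields follow cephadm's documented OSD service spec; the helper function itself is purely illustrative):

```python
import copy


def widen_placement(spec: dict, hosts: list) -> dict:
    """Return a copy of an OSD spec with only the placement changed.

    The drive layout under ``spec`` stays fixed; only where the spec
    applies is widened.  (Hypothetical helper, not part of cephadm.)
    """
    out = copy.deepcopy(spec)
    out["placement"] = {"hosts": hosts}
    return out


# Test the layout on a single host first ...
test_spec = {
    "service_type": "osd",
    "service_id": "default_drive_group",
    "placement": {"hosts": ["host1"]},
    "spec": {"data_devices": {"all": True}},
}

# ... then roll the *same* layout out to all the others.
rollout = widen_placement(test_spec, ["host1", "host2", "host3"])
assert rollout["spec"] == test_spec["spec"]  # layout untouched
```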
<p>If a user changes other properties of the OSD spec, they have to understand that existing OSDs are not redeployed. E.g. changing the encryption flag of an OSD spec doesn't magically encrypt any OSDs. Only new OSDs will then be encrypted.</p>

Orchestrator - Cleanup #54042 (New): move pybind/ceph_argparse.py and pybind/ceph_daemon.py into ...
https://tracker.ceph.com/issues/54042 (2022-01-27T15:28:18Z, Sebastian Wagner)
<p>Why? Because there is no reason we need those as extra RPM packages. We can simply remove those RPM packages and make use of python-common instead.</p>
<p>This is basically just a <strong>git mv src/pybind/ceph_argparse.py src/python_common/ceph/argparse.py</strong> plus some import fixes plus some ceph.spec cleanup.</p>
<p>The benefit is clear: as those modules are huge, we'd finally have the ability to properly refactor them.</p>

Orchestrator - Tasks #54011 (New): Improve osdspec previews
https://tracker.ceph.com/issues/54011 (2022-01-25T11:32:04Z, Sebastian Wagner)
<p>There are three major things that need to be improved for osd spec previews:</p>
<ul>
<li>Can we make it such that we don't actually need to ssh to remote hosts? And instead rely solely on the data cephadm already has available? This is needed in order to make use of it from the dashboard</li>
<li>can we get some decent testing coverage? I don't know if this is still properly working</li>
<li>Can we prettify the output it generates?</li>
</ul>

Orchestrator - Cleanup #54002 (New): orchestrator interface: service type should be a Python Enum
https://tracker.ceph.com/issues/54002 (2022-01-24T16:36:01Z, Sebastian Wagner)
<pre><code class="python syntaxhl"><span class="CodeRay"><span class="keyword">class</span> <span class="class">ServiceType</span>(enum.Enum):
    mon = <span class="string"><span class="delimiter">'</span><span class="content">mon</span><span class="delimiter">'</span></span>
    mgr = <span class="string"><span class="delimiter">'</span><span class="content">mgr</span><span class="delimiter">'</span></span>
    rbd_mirror = <span class="string"><span class="delimiter">'</span><span class="content">rbd-mirror</span><span class="delimiter">'</span></span>
    cephfs_mirror = <span class="string"><span class="delimiter">'</span><span class="content">cephfs-mirror</span><span class="delimiter">'</span></span>
    crash = <span class="string"><span class="delimiter">'</span><span class="content">crash</span><span class="delimiter">'</span></span>
    alertmanager = <span class="string"><span class="delimiter">'</span><span class="content">alertmanager</span><span class="delimiter">'</span></span>
    grafana = <span class="string"><span class="delimiter">'</span><span class="content">grafana</span><span class="delimiter">'</span></span>
    node_exporter = <span class="string"><span class="delimiter">'</span><span class="content">node-exporter</span><span class="delimiter">'</span></span>
    prometheus = <span class="string"><span class="delimiter">'</span><span class="content">prometheus</span><span class="delimiter">'</span></span>
    mds = <span class="string"><span class="delimiter">'</span><span class="content">mds</span><span class="delimiter">'</span></span>
    rgw = <span class="string"><span class="delimiter">'</span><span class="content">rgw</span><span class="delimiter">'</span></span>
    nfs = <span class="string"><span class="delimiter">'</span><span class="content">nfs</span><span class="delimiter">'</span></span>
    iscsi = <span class="string"><span class="delimiter">'</span><span class="content">iscsi</span><span class="delimiter">'</span></span>
    snmp_gateway = <span class="string"><span class="delimiter">'</span><span class="content">snmp-gateway</span><span class="delimiter">'</span></span>
</span></code></pre>
<p>should be used throughout the whole cephadm code base. This would make things a bit clearer.</p>
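As a small illustration of the benefit (using only a subset of the proposed enum), parsing a service-type string becomes a checked operation instead of an unvalidated string:

```python
import enum

# Minimal subset of the proposed ServiceType enum, for illustration only.
class ServiceType(enum.Enum):
    mon = 'mon'
    rbd_mirror = 'rbd-mirror'
    snmp_gateway = 'snmp-gateway'

# Looking up a service-type string is validated by the enum:
assert ServiceType('rbd-mirror') is ServiceType.rbd_mirror

# Typos fail loudly instead of flowing through as plain strings:
try:
    ServiceType('rbd_mirror')  # wrong separator
except ValueError:
    print('rejected')
```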
<p>Is it worth it? I don't know.</p>

Orchestrator - Cleanup #54001 (New): type safe Python mon_command API clients
https://tracker.ceph.com/issues/54001 (2022-01-24T16:13:41Z, Sebastian Wagner)
<p>Like for the last 6 years, I always wondered why we don't have a type-safe way<br />to use the mon command API. Turns out I never had the time to<br />actually work on it. But my idea looks more or less like so:</p>
<p><a class="external" href="https://pad.ceph.com/p/mon_command_python_api">https://pad.ceph.com/p/mon_command_python_api</a></p>
<p>What do you think? Is there any use for something like this?</p>
<pre><code class="python syntaxhl"># mon_command_api.py
from typing import NamedTuple

class CommandResult(NamedTuple):
    ret: int
    stdout: str
    stderr: str

class Call(NamedTuple):
    api: 'MonCommandApi'
    prefix: str

    def __call__(self, **kwargs):
        kwargs['prefix'] = self.prefix
        return self.api._mon_command(kwargs)

class MonCommandApi:
    def __init__(self, rados):
        self.rados = rados

    def _mon_command(self, cmd: dict) -> CommandResult:
        return CommandResult(*self.rados.mon_command(cmd))

    def __getattr__(self, aname):
        return Call(self, aname.replace('_', ' '))

# mon_command_api.pyi. Autogenerated by a script, exactly like
# https://docs.ceph.com/en/latest/api/mon_command_api/
class MonCommandApi:
    def __init__(self, rados) -> None:
        ...
    def _mon_command(self, cmd: dict) -> CommandResult:
        ...
    def orch_rm_daemon(self, name: str) -> CommandResult:
        ...
    def osd_create_pool(self, pool: str, pg_num: int, size: int) -> CommandResult:
        ...

# example use:
def orch_rm_daemon(daemon_name):
    api = MonCommandApi(...)
    return api.orch_rm_daemon(name=daemon_name)

def other_function(daemon_name):
    api = MonCommandApi(...)
    return api.osd_create_pool(pool='my_pool', pg_num=42, size=3)
</code></pre>

Orchestrator - Cleanup #54000 (New): cephadm: upgrade commands should return yaml
https://tracker.ceph.com/issues/54000 (2022-01-24T15:39:53Z, Sebastian Wagner)
<p>Right now, commands like</p>
<ul>
<li>ceph orch upgrade ls</li>
<li>ceph orch upgrade status</li>
</ul>
<p>are returning JSON. YAML is much more readable. Let's return YAML instead!</p>
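For illustration, here is roughly what the readability difference looks like. The status fields below are made up, and the hand-rolled emitter merely stands in for PyYAML's `yaml.safe_dump` to stay dependency-free:

```python
import json

# Invented example payload; not the actual `ceph orch upgrade status` schema.
status = {
    "target_image": "quay.io/ceph/ceph:v16.2.7",
    "in_progress": True,
    "services_complete": ["mgr", "mon"],
}

# What the commands return today:
print(json.dumps(status))

def to_yaml(d: dict) -> str:
    """Naive YAML rendering for flat dicts (PyYAML would be used in practice)."""
    lines = []
    for key, value in d.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        elif isinstance(value, bool):
            lines.append(f"{key}: {str(value).lower()}")
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)

# The same data, rendered as YAML:
print(to_yaml(status))
```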
<p><strong>Note</strong> that this now needs a <strong>--format=...</strong> argument.</p>

Orchestrator - Cleanup #53999 (New): orch interface: cephadm contains a lot of special apply methods
https://tracker.ceph.com/issues/53999 (2022-01-24T15:28:49Z, Sebastian Wagner)
<pre><code class="python syntaxhl"><span class="CodeRay"> <span class="decorator">@handle_orch_error</span>
<span class="keyword">def</span> <span class="function">apply_rgw</span>(<span class="predefined-constant">self</span>, spec: ServiceSpec) -> <span class="predefined">str</span>:
<span class="keyword">return</span> <span class="predefined-constant">self</span>._apply(spec)
</span></code></pre>
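A minimal sketch of what the collapsed interface could look like; the class and method bodies are simplified stand-ins for the real orchestrator interface, not its actual implementation:

```python
class ServiceSpec:
    """Simplified stand-in for the orchestrator's ServiceSpec."""
    def __init__(self, service_type: str) -> None:
        self.service_type = service_type


class Orchestrator:
    """One generic entry point instead of apply_rgw, apply_mds, ..."""
    def apply(self, spec: ServiceSpec) -> str:
        # The spec already carries its service type, so no per-service
        # wrapper is needed to dispatch it.
        return f"Scheduled {spec.service_type} update"


o = Orchestrator()
print(o.apply(ServiceSpec('rgw')))
print(o.apply(ServiceSpec('mds')))
```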
<p>Let's remove them! This is getting out of hand by now. Just use <strong>apply</strong> for everything.</p>

Orchestrator - Cleanup #53998 (New): Drop support for old-style service spec
https://tracker.ceph.com/issues/53998 (2022-01-24T15:25:56Z, Sebastian Wagner)
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">service_type</span>: <span class="string"><span class="content">nfs</span></span>
<span class="key">service_id</span>: <span class="string"><span class="content">foo</span></span>
<span class="key">pool</span>: <span class="string"><span class="content">mypool</span></span>
<span class="key">namespace</span>: <span class="string"><span class="content">myns</span></span>
</span></code></pre>
<p>I.e. without any indentation. This is irritating for more complicated specs, is already deprecated, and is no longer documented. Let's drop it accordingly!</p>

Orchestrator - Documentation #53997 (Resolved): cephadm spec: document "config" key
https://tracker.ceph.com/issues/53997 (2022-01-24T15:19:53Z, Sebastian Wagner)
<p>Users can specify config options for a particular service:</p>
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">service_type</span>: <span class="string"><span class="content">mds</span></span>
<span class="key">service_id</span>: <span class="string"><span class="content">fsname</span></span>
<span class="key">placement</span>:
<span class="key">count</span>: <span class="string"><span class="content">2</span></span>
<span class="key">config</span>:
<span class="key">mds_cache_memory_limit</span>: <span class="string"><span class="content">8Gi</span></span>
</span></code></pre>
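Presumably such entries end up as per-service config options; a hypothetical sketch of that translation is below. The command tuple layout and the helper are assumptions for illustration, not cephadm's actual code:

```python
def config_set_commands(spec: dict) -> list:
    """Turn a spec's "config" entries into `ceph config set` argument
    tuples scoped to the service (hypothetical helper)."""
    who = f"{spec['service_type']}.{spec['service_id']}"
    return [
        ("config", "set", who, option, str(value))
        for option, value in spec.get("config", {}).items()
    ]


# The mds spec from the example above, as a dict:
spec = {
    "service_type": "mds",
    "service_id": "fsname",
    "config": {"mds_cache_memory_limit": "8Gi"},
}
print(config_set_commands(spec))
```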
<p>This is also going to be cleaned up properly. Let's document it!</p>

Orchestrator - Feature #53967 (New): support minor downgrades
https://tracker.ceph.com/issues/53967 (2022-01-21T16:47:54Z, Sebastian Wagner)
<p>There is a general demand (which needs to be properly supported by all components!) to support minor downgrades. For that we have to solve issues like:</p>
<pre>
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.359+0000 7f1657d84700 0 log_channel(cephadm) log [DBG] : mgr option log_to_file = False
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.359+0000 7f1657d84700 0 log_channel(cephadm) log [DBG] : mgr option log_to_cluster_level = debug
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.374+0000 7f1657d84700 -1 mgr load Failed to construct class in 'cephadm'
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.374+0000 7f1657d84700 -1 mgr load Traceback (most recent call last):
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/module.py", line 405, in __init__
Jan 21 15:09:08 standalone.localdomain conmon[286544]: self.upgrade = CephadmUpgrade(self)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 76, in __init__
Jan 21 15:09:08 standalone.localdomain conmon[286544]: self.upgrade_state: Optional[UpgradeState] = UpgradeState.from_json(json.loads(t))
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 58, in from_json
Jan 21 15:09:08 standalone.localdomain conmon[286544]: return cls(**c)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: TypeError: __init__() got an unexpected keyword argument 'fs_original_allow_standby_replay'
Jan 21 15:09:08 standalone.localdomain conmon[286544]:
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.379+0000 7f1657d84700 -1 mgr operator() Failed to run module in active mode ('cephadm')
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.380+0000 7f1657d84700 0 [crash DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.380+0000 7f1657d84700 1 mgr load Constructed class from module: crash
</pre>

Orchestrator - Bug #53965 (New): cephadm: RGW container is crashing at 'rados_nobjects_list_next2...
https://tracker.ceph.com/issues/53965 (2022-01-21T15:53:15Z, Sebastian Wagner)
<p>When upgrading from ceph-ansible, we're getting:</p>
<pre>
Jan 12 18:30:14 host conmon[12897]: debug 2022-01-12T17:30:14.112+0000 0 starting handler: beast
Jan 12 18:30:14 host conmon[12897]: ceph version 16.2.0-146.x (sha) pacific (stable)
Jan 12 18:30:14 host conmon[12897]: 1: /lib64/libpthread.so.0(+0x12c20) [0x7fb4b1e86c20]
Jan 12 18:30:14 host conmon[12897]: 2: gsignal()
Jan 12 18:30:14 host conmon[12897]: 3: abort()
Jan 12 18:30:14 host conmon[12897]: 4: /lib64/libstdc++.so.6(+0x9009b) [0x7fb4b0e7a09b]
Jan 12 18:30:14 host conmon[12897]: 5: /lib64/libstdc++.so.6(+0x9653c) [0x7fb4b0e8053c]
Jan 12 18:30:14 host conmon[12897]: 6: /lib64/libstdc++.so.6(+0x96597) [0x7fb4b0e80597]
Jan 12 18:30:14 host conmon[12897]: 7: /lib64/libstdc++.so.6(+0x967f8) [0x7fb4b0e807f8]
Jan 12 18:30:14 host conmon[12897]: 8: /lib64/librados.so.2(+0x3a4c0) [0x7fb4bc3f74c0]
Jan 12 18:30:14 host conmon[12897]: 9: /lib64/librados.so.2(+0x809d2) [0x7fb4bc43d9d2]
Jan 12 18:30:14 host conmon[12897]: 10: (librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5c) [0x7fb4bc43da5c]
Jan 12 18:30:14 host conmon[12897]: 11: (RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7fb4bd26d8e5]
Jan 12 18:30:14 host conmon[12897]: 12: (RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x24b) [0x7fb4bcd9b0bb]
Jan 12 18:30:14 host conmon[12897]: 13: (RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1e0) [0x7fb4bd2609d0]
Jan 12 18:30:14 host conmon[12897]: 14: (RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x40) [0x7fb4bcead940]
Jan 12 18:30:14 host conmon[12897]: 15: (RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x73) [0x7fb4bceaff13]
Jan 12 18:30:14 host conmon[12897]: 16: (RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x3e) [0x7fb4bceaffce]
Jan 12 18:30:14 host conmon[12897]: 17: (RGWUserStatsCache::sync_all_users(optional_yield)+0x71) [0x7fb4bd052361]
Jan 12 18:30:14 host conmon[12897]: 18: (RGWUserStatsCache::UserSyncThread::entry()+0x10f) [0x7fb4bd05984f]
Jan 12 18:30:14 host conmon[12897]: 19: /lib64/libpthread.so.0(+0x817a) [0x7fb4b1e7c17a]
Jan 12 18:30:14 host conmon[12897]: 20: clone()
</pre>
<p>One possibility is that the pools might not have the rgw tag enabled.</p>
<p><strong>possible workaround</strong></p>
<p>add the rgw tag to all the rgw pools.</p>
<p><strong>known workaround</strong></p>
<p>change the cephx caps of all affected daemons from</p>
<pre>
mon allow *
mgr allow rw
osd allow rwx tag rgw *=*
</pre>
<p>to</p>
<pre>
mon allow *
mgr allow rw
osd allow rwx
</pre>

Orchestrator - Bug #53939 (Resolved): ceph-nfs-upgrade, pacific: Upgrade Paused due to UPGRADE_RE...
https://tracker.ceph.com/issues/53939 (2022-01-19T16:07:48Z, Sebastian Wagner)
<pre>
mon[102341]: : cluster [WRN] Health check failed: Upgrading daemon osd.0 on host smithi103 failed. (UPGRADE_REDEPLOY_DAEMON)
mon[66897]: cephadm 2022-01-18T16:27:48.439275+0000 mgr.smithi103.wyeocw (mgr.14712) 129 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Redeploy daemon osd.0 ...
mon[66897]: Non-zero exit code 1 from systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0
mon[66897]: systemctl: stderr Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: systemctl: stderr See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8615, in <module>
mon[66897]: main()
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8603, in main
mon[66897]: r = ctx.func(ctx)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1790, in _default_image
mon[66897]: return func(ctx)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 4603, in command_deploy
mon[66897]: ports=daemon_ports)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2715, in deploy_daemon
mon[66897]: c, osd_fsid=osd_fsid, ports=ports)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2960, in deploy_daemon_units
mon[66897]: call_throws(ctx, ['systemctl', 'start', unit_name])
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1469, in call_throws
mon[66897]: raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
mon[66897]: RuntimeError: Failed command: systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0: Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1402, in _remote_connection
mon[66897]: yield (conn, connr)
mon[66897]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1295, in _run_cephadm
mon[66897]: code, '\n'.join(err)))
mon[66897]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Redeploy daemon osd.0 ...
mon[66897]: Non-zero exit code 1 from systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0
mon[66897]: systemctl: stderr Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: systemctl: stderr See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8615, in <module>
mon[66897]: main()
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8603, in main
mon[66897]: r = ctx.func(ctx)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1790, in _default_image
mon[66897]: return func(ctx)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 4603, in command_deploy
mon[66897]: ports=daemon_ports)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2715, in deploy_daemon
mon[66897]: c, osd_fsid=osd_fsid, ports=ports)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2960, in deploy_daemon_units
mon[66897]: call_throws(ctx, ['systemctl', 'start', unit_name])
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1469, in call_throws
mon[66897]: raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
mon[66897]: RuntimeError: Failed command: systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0: Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
...
cephadm 2022-01-18T16:27:48.439412+0000 mgr.smithi103.wyeocw (mgr.14712) 130 : cephadm [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.0 on host smithi103 failed.
</pre>
<p><a class="external" href="https://pulpito.ceph.com/swagner-2022-01-18_15:34:53-rados:cephadm-wip-swagner2-testing-2022-01-18-1242-pacific-distro-default-smithi/6624255">https://pulpito.ceph.com/swagner-2022-01-18_15:34:53-rados:cephadm-wip-swagner2-testing-2022-01-18-1242-pacific-distro-default-smithi/6624255</a></p>

Orchestrator - Bug #53904 (Duplicate): cephadm: ingress jobs stuck
https://tracker.ceph.com/issues/53904 (2022-01-17T16:07:38Z, Sebastian Wagner)
<p><a class="external" href="https://pulpito.ceph.com/swagner-2022-01-17_12:42:04-orch:cephadm-wip-swagner-testing-2022-01-17-1014-distro-default-smithi/">https://pulpito.ceph.com/swagner-2022-01-17_12:42:04-orch:cephadm-wip-swagner-testing-2022-01-17-1014-distro-default-smithi/</a></p>
<pre>
2022-01-17T13:17:17.053 DEBUG:teuthology.orchestra.run.smithi155:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:1cdf02ebbbdd98a055173cbac4d0171328a564dc shell -c /etc/ceph/ceph.conf -k />
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> for haproxy in `ceph orch ps | grep ^haproxy.nfs.foo. | awk '"'"'{print $1}'"'"'`; do
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> ceph orch daemon stop $haproxy
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> while ! ceph orch ps | grep $haproxy | grep stopped; do sleep 1 ; done
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> cat /mnt/foo/testfile
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> echo $haproxy > /mnt/foo/testfile
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> sync
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> ceph orch daemon start $haproxy
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> while ! ceph orch ps | grep $haproxy | grep running; do sleep 1 ; done
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> done
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> '
</pre><br />...snip...<br /><pre>
2022-01-17T13:17:20.571 INFO:teuthology.orchestra.run.smithi155.stdout:Check with each haproxy down in turn...
2022-01-17T13:17:21.281 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to stop haproxy.nfs.foo.smithi155.xhswck on host 'smithi155'
</pre><br />...snip...
<pre>
2022-01-17T13:17:36.893 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi155.xhswck smithi155 *:2049,9002 stopped 0s ago 79s - - <unknown> <un>
2022-01-17T13:17:36.898 INFO:teuthology.orchestra.run.smithi155.stdout:test
2022-01-17T13:17:37.528 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to start haproxy.nfs.foo.smithi155.xhswck on host 'smithi155'
</pre><br />...snip...<br /><pre>
2022-01-17T13:17:53.182 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi155.xhswck smithi155 *:2049,9002 running (5s) 0s ago 95s - - 2.3.17-d1c9119 14b>
2022-01-17T13:17:53.519 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to stop haproxy.nfs.foo.smithi162.mahcqs on host 'smithi162'
</pre><br />...snip...<br /><pre>
2022-01-17T13:18:07.810 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi162.mahcqs smithi162 *:2049,9002 stopped 0s ago 102s - - <unknown> <unk>
</pre><br />...snip..<br /><pre>
h[14066]: cephadm 2022-01-17T13:17:53.516345+0000 mgr.smithi155.uoijyc (mgr.14206) 339 : cephadm [INF] Schedule stop daemon haproxy.nfs.foo.smithi162.mahcqs
</pre>
<p>But I never see haproxy.nfs.foo.smithi162.mahcqs being started again.</p>

Orchestrator - Feature #53815 (Resolved): cephadm rm-cluster should delete log files
https://tracker.ceph.com/issues/53815 (2022-01-10T11:22:41Z, Sebastian Wagner)
<ul>
<li>/var/log/ceph/cephadm.log*</li>
<li>/var/log/ceph/&lt;cluster-fsid&gt;</li>
</ul>