Ceph: Issues (https://tracker.ceph.com/, 2022-01-28T15:55:57Z)
Orchestrator - Documentation #54056 (New): https://docs.ceph.com/en/latest/cephadm/install/#deplo... (https://tracker.ceph.com/issues/54056, 2022-01-28T15:55:57Z, Sebastian Wagner)
<p><a class="external" href="https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment">https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment</a></p>
<ul>
<li>e.g. the monitoring stack and ingress are completely missing</li>
<li>link to the registry-login functionality</li>
</ul>

Orchestrator - Documentation #54051 (New): cephadm: advise users how and when to change OSD specs (https://tracker.ceph.com/issues/54051, 2022-01-28T12:24:42Z, Sebastian Wagner)
<p>In general it works fine; there is just one caveat: we're not going to touch any already-deployed OSDs.</p>
<p>I'd recommend sticking to a particular OSD layout and only adjusting the placement; for example, testing the deployment on one host first and then rolling it out to all the others (see the spec sketch below).</p>
<p>If a user changes other properties of the OSD spec, they have to understand that existing OSDs are not redeployed. E.g. changing the encryption flag of an OSD spec doesn't magically encrypt any existing OSDs; only new OSDs will be encrypted.</p>
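<p>To illustrate, a minimal sketch of such a spec (the host name, service_id and device selection are hypothetical examples):</p>
<pre><code class="yaml">
service_type: osd
service_id: default_layout
placement:
  hosts:
    - host1            # test on one host first, then widen the placement
spec:
  data_devices:
    all: true
  encrypted: true      # changing this later will NOT re-encrypt existing OSDs
</code></pre>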
Orchestrator - Cleanup #54042 (New): move pybind/ceph_argparse.py and pybind/ceph_daemon.py into ... (https://tracker.ceph.com/issues/54042, 2022-01-27T15:28:18Z, Sebastian Wagner)

<p>Why? Because there is no reason we need those as extra RPM packages. We can simply remove those RPM packages and make use of python-common instead.</p>
<p>This is basically just a <strong>git mv src/pybind/ceph_argparse.py src/python_common/ceph/argparse.py</strong>, plus some import fixes and some ceph.spec cleanup.</p>
<p>The benefit is clear: as those modules are huge, we'd finally be able to refactor them properly.</p>
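<p>A sketch of what the import fixes could look like, assuming the new module path follows the git mv above (the imported symbol is only an illustrative example):</p>
<pre><code class="python">
# before: ceph_argparse is a standalone pybind module with its own RPM
from ceph_argparse import parse_json_funcsigs

# after: the same code lives in the python-common package
from ceph.argparse import parse_json_funcsigs
</code></pre>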
Orchestrator - Tasks #54011 (New): Improve osdspec previews (https://tracker.ceph.com/issues/54011, 2022-01-25T11:32:04Z, Sebastian Wagner)

<p>There are three major things that need to be improved for osd spec previews:</p>
<ul>
<li>Can we make it such that we don't actually need to SSH to remote hosts, and instead rely solely on the data cephadm already has available? This is needed in order to make use of previews from the dashboard (see the example invocation after this list)</li>
<li>Can we get some decent test coverage? I don't know if this is still working properly</li>
<li>Can we prettify the output it generates?</li>
</ul>
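<p>For context, previews are currently generated via the dry-run flag (a sketch; assumes a spec file named osd_spec.yml):</p>
<pre>
ceph orch apply -i osd_spec.yml --dry-run
</pre>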
<pre><code class="python syntaxhl"><span class="CodeRay"><span class="keyword">class</span> <span class="class">ServiceType</span>(enum.Enum):
mon = <span class="string"><span class="delimiter">'</span><span class="content">mon</span><span class="delimiter">'</span></span>
mgr = <span class="string"><span class="delimiter">'</span><span class="content">mgr</span><span class="delimiter">'</span></span>
rbd_mirror = <span class="string"><span class="delimiter">'</span><span class="content">rbd-mirror</span><span class="delimiter">'</span></span>
cephfs_mirror = <span class="string"><span class="delimiter">'</span><span class="content">cephfs-mirror</span><span class="delimiter">'</span></span>
crash = <span class="string"><span class="delimiter">'</span><span class="content">crash</span><span class="delimiter">'</span></span>
alertmanager = <span class="string"><span class="delimiter">'</span><span class="content">alertmanager</span><span class="delimiter">'</span></span>
grafana = <span class="string"><span class="delimiter">'</span><span class="content">grafana</span><span class="delimiter">'</span></span>
node_exporter = <span class="string"><span class="delimiter">'</span><span class="content">node-exporter</span><span class="delimiter">'</span></span>
prometheus = <span class="string"><span class="delimiter">'</span><span class="content">prometheus</span><span class="delimiter">'</span></span>
mds = <span class="string"><span class="delimiter">'</span><span class="content">mds</span><span class="delimiter">'</span></span>
rgw = <span class="string"><span class="delimiter">'</span><span class="content">rgw</span><span class="delimiter">'</span></span>
nfs = <span class="string"><span class="delimiter">'</span><span class="content">nfs</span><span class="delimiter">'</span></span>
iscsi = <span class="string"><span class="delimiter">'</span><span class="content">iscsi</span><span class="delimiter">'</span></span>
snmp_gateway = <span class="string"><span class="delimiter">'</span><span class="content">snmp-gateway</span><span class="delimiter">'</span></span>
</span></code></pre>
<p>This should be used throughout the whole cephadm code base; it would make things a bit clearer.</p>
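<p>For illustration, a minimal self-contained sketch of what the enum buys us (the helper function is hypothetical, and the enum is trimmed to two members):</p>
<pre><code class="python">
import enum

class ServiceType(enum.Enum):
    mon = 'mon'
    rgw = 'rgw'

def needs_monmap(service_type: ServiceType) -> bool:
    # typo-safe: ServiceType.mno raises an AttributeError immediately,
    # whereas 'mno' == 'mon' just silently evaluates to False
    return service_type is ServiceType.mon

assert needs_monmap(ServiceType('mon'))  # parse from the wire format
assert ServiceType.rgw.value == 'rgw'    # and serialize back to it
</code></pre>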
<p>Is it worth it? I don't know.</p>

Orchestrator - Cleanup #54001 (New): type safe Python mon_command API clients (https://tracker.ceph.com/issues/54001, 2022-01-24T16:13:41Z, Sebastian Wagner)
<p>For the last 6 years or so, I've always wondered why we don't have a type-safe way to use the mon command API. Turns out I never had the time to actually work on it. But my idea looks more or less like this:</p>
<p><a class="external" href="https://pad.ceph.com/p/mon_command_python_api">https://pad.ceph.com/p/mon_command_python_api</a></p>
<p>What do you think? Is there any use for something like this?</p>
<pre><code class="python syntaxhl"><span class="CodeRay"><span class="comment"># mon_command_api.py</span>
<span class="keyword">class</span> <span class="class">CommandResult</span>(NamedTuple):
ret: <span class="predefined">int</span>
stdout: <span class="predefined">str</span>
stderr: <span class="predefined">str</span>
<span class="keyword">def</span> <span class="function">Call</span>(NamedTuple):
api: MonCommandApi
prefix: <span class="predefined">str</span>
<span class="keyword">def</span> <span class="function">__call__</span>(<span class="predefined-constant">self</span>, **kwargs):
kwargs[<span class="string"><span class="delimiter">'</span><span class="content">prefix</span><span class="delimiter">'</span></span>] = prefix
<span class="keyword">return</span> CommandResult(<span class="predefined-constant">self</span>.api._mon_command(kwargs))
<span class="keyword">class</span> <span class="class">MonCommandApi</span>:
<span class="keyword">def</span> <span class="function">__init__</span>(<span class="predefined-constant">self</span>, rados):
<span class="predefined-constant">self</span>.rados = rados
<span class="keyword">def</span> <span class="function">_mon_command</span>(<span class="predefined-constant">self</span>, <span class="predefined">dict</span>):
<span class="keyword">return</span> CommandResult(<span class="predefined-constant">self</span>.rados.mon_command(<span class="predefined">dict</span>))
<span class="keyword">def</span> <span class="function">__get_attr__</span>(<span class="predefined-constant">self</span>, aname):
<span class="keyword">return</span> Call(<span class="predefined-constant">self</span>, aname.replace(<span class="string"><span class="delimiter">'</span><span class="content">_</span><span class="delimiter">'</span></span>, <span class="string"><span class="delimiter">'</span><span class="content"> </span><span class="delimiter">'</span></span>))
<span class="comment"># mon_command_api.pyi. Autogenerated by a script, exactly like https://docs.ceph.com/en/latest/api/mon_command_api/</span>
<span class="keyword">class</span> <span class="class">MonCommandApi</span>:
<span class="keyword">def</span> <span class="function">__init__</span>(<span class="predefined-constant">self</span>, rados): <span class="predefined-constant">None</span>:
<span class="predefined-constant">self</span>.rados = rados
<span class="keyword">def</span> <span class="function">_mon_command</span>(<span class="predefined-constant">self</span>, <span class="predefined">dict</span>) -> CommandResult:
...
<span class="keyword">def</span> <span class="function">orch_rm_daemon</span>(name: <span class="predefined">str</span>) -> CommandResult
...
<span class="keyword">def</span> <span class="function">osd_create_pool</span>(pool: <span class="predefined">str</span>, pg_num: <span class="predefined">int</span>, size: <span class="predefined">int</span>)
...
<span class="comment"># example use:</span>
<span class="keyword">def</span> <span class="function">orch_rm_daemon</span>(daemon_name):
api = MonCommandApi(...)
<span class="keyword">return</span> api.orch_rm_daemon(name=daemon_name)
<span class="keyword">def</span> <span class="function">other_function</span>(daemon_name):
api = MonCommandApi(...)
<span class="keyword">return</span> api.osd_create_pool(pool=<span class="string"><span class="delimiter">'</span><span class="content">my_pool</span><span class="delimiter">'</span></span>, pg_num=<span class="integer">42</span>, size=<span class="integer">3</span>)
</span></code></pre> Orchestrator - Cleanup #53998 (New): Drop support for old-style service spechttps://tracker.ceph.com/issues/539982022-01-24T15:25:56ZSebastian Wagner
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">service_type</span>: <span class="string"><span class="content">nfs</span></span>
<span class="key">service_id</span>: <span class="string"><span class="content">foo</span></span>
<span class="key">pool</span>: <span class="string"><span class="content">mypool</span></span>
<span class="key">namespace</span>: <span class="string"><span class="content">myns</span></span>
</span></code></pre>
<p>I.e. with the service-specific properties at the top level, without any indentation. This is irritating for more complicated specs, and it is already deprecated and no longer documented. Let's drop it accordingly!</p>
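<p>For comparison, the equivalent new-style spec nests the service-specific properties under a spec key:</p>
<pre><code class="yaml">
service_type: nfs
service_id: foo
spec:
  pool: mypool
  namespace: myns
</code></pre>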
Orchestrator - Feature #53967 (New): support minor downgrades (https://tracker.ceph.com/issues/53967, 2022-01-21T16:47:54Z, Sebastian Wagner)

<p>There is a general demand to support minor downgrades (this needs to be properly supported by all components!). For that we have to solve failures like the following (one possible mitigation is sketched after the log):</p>
<pre>
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.359+0000 7f1657d84700 0 log_channel(cephadm) log [DBG] : mgr option log_to_file = False
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.359+0000 7f1657d84700 0 log_channel(cephadm) log [DBG] : mgr option log_to_cluster_level = debug
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.374+0000 7f1657d84700 -1 mgr load Failed to construct class in 'cephadm'
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.374+0000 7f1657d84700 -1 mgr load Traceback (most recent call last):
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/module.py", line 405, in __init__
Jan 21 15:09:08 standalone.localdomain conmon[286544]: self.upgrade = CephadmUpgrade(self)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 76, in __init__
Jan 21 15:09:08 standalone.localdomain conmon[286544]: self.upgrade_state: Optional[UpgradeState] = UpgradeState.from_json(json.loads(t))
Jan 21 15:09:08 standalone.localdomain conmon[286544]: File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 58, in from_json
Jan 21 15:09:08 standalone.localdomain conmon[286544]: return cls(**c)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: TypeError: __init__() got an unexpected keyword argument 'fs_original_allow_standby_replay'
Jan 21 15:09:08 standalone.localdomain conmon[286544]:
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.379+0000 7f1657d84700 -1 mgr operator() Failed to run module in active mode ('cephadm')
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.380+0000 7f1657d84700 0 [crash DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
Jan 21 15:09:08 standalone.localdomain conmon[286544]: debug 2022-01-21T15:09:08.380+0000 7f1657d84700 1 mgr load Constructed class from module: crash
</pre>
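<p>The traceback above comes from a newer release having persisted a field that the older UpgradeState doesn't know about. A minimal sketch of one possible direction, with illustrative field names (the real from_json simply passes everything to __init__):</p>
<pre><code class="python">
import inspect

class UpgradeState:
    def __init__(self, target_name: str, progress_id: str) -> None:
        self.target_name = target_name
        self.progress_id = progress_id

    @classmethod
    def from_json(cls, data: dict) -> 'UpgradeState':
        # drop keys written by a newer release that this (older) version
        # doesn't know about, instead of passing them straight to __init__
        known = inspect.signature(cls.__init__).parameters
        return cls(**{k: v for k, v in data.items() if k in known})

# a newer release persisted an extra key; a plain cls(**data) would raise
# TypeError: __init__() got an unexpected keyword argument
state = UpgradeState.from_json({
    'target_name': 'quay.io/ceph/ceph:v16.2.7',
    'progress_id': 'some-uuid',
    'fs_original_allow_standby_replay': False,
})
</code></pre>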
Orchestrator - Bug #53965 (New): cephadm: RGW container is crashing at 'rados_nobjects_list_next2... (https://tracker.ceph.com/issues/53965, 2022-01-21T15:53:15Z, Sebastian Wagner)

<p>When upgrading from ceph-ansible, we're getting:</p>
<pre>
Jan 12 18:30:14 host conmon[12897]: debug 2022-01-12T17:30:14.112+0000 0 starting handler: beast
Jan 12 18:30:14 host conmon[12897]: ceph version 16.2.0-146.x (sha) pacific (stable)
Jan 12 18:30:14 host conmon[12897]: 1: /lib64/libpthread.so.0(+0x12c20) [0x7fb4b1e86c20]
Jan 12 18:30:14 host conmon[12897]: 2: gsignal()
Jan 12 18:30:14 host conmon[12897]: 3: abort()
Jan 12 18:30:14 host conmon[12897]: 4: /lib64/libstdc++.so.6(+0x9009b) [0x7fb4b0e7a09b]
Jan 12 18:30:14 host conmon[12897]: 5: /lib64/libstdc++.so.6(+0x9653c) [0x7fb4b0e8053c]
Jan 12 18:30:14 host conmon[12897]: 6: /lib64/libstdc++.so.6(+0x96597) [0x7fb4b0e80597]
Jan 12 18:30:14 host conmon[12897]: 7: /lib64/libstdc++.so.6(+0x967f8) [0x7fb4b0e807f8]
Jan 12 18:30:14 host conmon[12897]: 8: /lib64/librados.so.2(+0x3a4c0) [0x7fb4bc3f74c0]
Jan 12 18:30:14 host conmon[12897]: 9: /lib64/librados.so.2(+0x809d2) [0x7fb4bc43d9d2]
Jan 12 18:30:14 host conmon[12897]: 10: (librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5c) [0x7fb4bc43da5c]
Jan 12 18:30:14 host conmon[12897]: 11: (RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7fb4bd26d8e5]
Jan 12 18:30:14 host conmon[12897]: 12: (RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x24b) [0x7fb4bcd9b0bb]
Jan 12 18:30:14 host conmon[12897]: 13: (RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1e0) [0x7fb4bd2609d0]
Jan 12 18:30:14 host conmon[12897]: 14: (RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x40) [0x7fb4bcead940]
Jan 12 18:30:14 host conmon[12897]: 15: (RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x73) [0x7fb4bceaff13]
Jan 12 18:30:14 host conmon[12897]: 16: (RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x3e) [0x7fb4bceaffce]
Jan 12 18:30:14 host conmon[12897]: 17: (RGWUserStatsCache::sync_all_users(optional_yield)+0x71) [0x7fb4bd052361]
Jan 12 18:30:14 host conmon[12897]: 18: (RGWUserStatsCache::UserSyncThread::entry()+0x10f) [0x7fb4bd05984f]
Jan 12 18:30:14 host conmon[12897]: 19: /lib64/libpthread.so.0(+0x817a) [0x7fb4b1e7c17a]
Jan 12 18:30:14 host conmon[12897]: 20: clone()
</pre>
<p>One possibility is that the pools might not have the rgw tag enabled.</p>
<p><strong>possible workaround</strong></p>
<p>add the rgw tag to all the rgw pools.</p>
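<p>A sketch of what that would look like, using the standard pool application command (run once per rgw pool; the pool name is a placeholder):</p>
<pre>
ceph osd pool application enable <pool-name> rgw
</pre>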
<p><strong>known workaround</strong></p>
<p>change the cephx caps of all affected daemons from</p>
<pre>
mon allow *
mgr allow rw
osd allow rwx tag rgw *=*
</pre>
<p>to</p>
<pre>
mon allow *
mgr allow rw
osd allow rwx
</pre>
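<p>For example via ceph auth caps (the entity name is a placeholder for the affected rgw daemon's key):</p>
<pre>
ceph auth caps client.rgw.<name> mon 'allow *' mgr 'allow rw' osd 'allow rwx'
</pre>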
Orchestrator - Bug #53939 (Resolved): ceph-nfs-upgrade, pacific: Upgrade Paused due to UPGRADE_RE... (https://tracker.ceph.com/issues/53939, 2022-01-19T16:07:48Z, Sebastian Wagner)

<pre>
mon[102341]: : cluster [WRN] Health check failed: Upgrading daemon osd.0 on host smithi103 failed. (UPGRADE_REDEPLOY_DAEMON)
mon[66897]: cephadm 2022-01-18T16:27:48.439275+0000 mgr.smithi103.wyeocw (mgr.14712) 129 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Redeploy daemon osd.0 ...
mon[66897]: Non-zero exit code 1 from systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0
mon[66897]: systemctl: stderr Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: systemctl: stderr See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8615, in <module>
mon[66897]: main()
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8603, in main
mon[66897]: r = ctx.func(ctx)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1790, in _default_image
mon[66897]: return func(ctx)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 4603, in command_deploy
mon[66897]: ports=daemon_ports)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2715, in deploy_daemon
mon[66897]: c, osd_fsid=osd_fsid, ports=ports)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2960, in deploy_daemon_units
mon[66897]: call_throws(ctx, ['systemctl', 'start', unit_name])
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1469, in call_throws
mon[66897]: raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
mon[66897]: RuntimeError: Failed command: systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0: Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1402, in _remote_connection
mon[66897]: yield (conn, connr)
mon[66897]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1295, in _run_cephadm
mon[66897]: code, '\n'.join(err)))
mon[66897]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Redeploy daemon osd.0 ...
mon[66897]: Non-zero exit code 1 from systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0
mon[66897]: systemctl: stderr Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: systemctl: stderr See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8615, in <module>
mon[66897]: main()
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8603, in main
mon[66897]: r = ctx.func(ctx)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1790, in _default_image
mon[66897]: return func(ctx)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 4603, in command_deploy
mon[66897]: ports=daemon_ports)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2715, in deploy_daemon
mon[66897]: c, osd_fsid=osd_fsid, ports=ports)
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2960, in deploy_daemon_units
mon[66897]: call_throws(ctx, ['systemctl', 'start', unit_name])
mon[66897]: File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1469, in call_throws
mon[66897]: raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
mon[66897]: RuntimeError: Failed command: systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0: Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
...
cephadm 2022-01-18T16:27:48.439412+0000 mgr.smithi103.wyeocw (mgr.14712) 130 : cephadm [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.0 on host smithi103 failed.
</pre>
<p><a class="external" href="https://pulpito.ceph.com/swagner-2022-01-18_15:34:53-rados:cephadm-wip-swagner2-testing-2022-01-18-1242-pacific-distro-default-smithi/6624255">https://pulpito.ceph.com/swagner-2022-01-18_15:34:53-rados:cephadm-wip-swagner2-testing-2022-01-18-1242-pacific-distro-default-smithi/6624255</a></p> Orchestrator - Bug #53904 (Duplicate): cephadm: ingress jobs stuckhttps://tracker.ceph.com/issues/539042022-01-17T16:07:38ZSebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2022-01-17_12:42:04-orch:cephadm-wip-swagner-testing-2022-01-17-1014-distro-default-smithi/">https://pulpito.ceph.com/swagner-2022-01-17_12:42:04-orch:cephadm-wip-swagner-testing-2022-01-17-1014-distro-default-smithi/</a></p>
<pre>
2022-01-17T13:17:17.053 DEBUG:teuthology.orchestra.run.smithi155:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:1cdf02ebbbdd98a055173cbac4d0171328a564dc shell -c /etc/ceph/ceph.conf -k />
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> for haproxy in `ceph orch ps | grep ^haproxy.nfs.foo. | awk '"'"'{print $1}'"'"'`; do
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> ceph orch daemon stop $haproxy
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> while ! ceph orch ps | grep $haproxy | grep stopped; do sleep 1 ; done
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> cat /mnt/foo/testfile
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> echo $haproxy > /mnt/foo/testfile
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> sync
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> ceph orch daemon start $haproxy
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> while ! ceph orch ps | grep $haproxy | grep running; do sleep 1 ; done
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> done
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> '
</pre>
...snip...
<pre>
2022-01-17T13:17:20.571 INFO:teuthology.orchestra.run.smithi155.stdout:Check with each haproxy down in turn...
2022-01-17T13:17:21.281 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to stop haproxy.nfs.foo.smithi155.xhswck on host 'smithi155'
</pre>
...snip...
<pre>
2022-01-17T13:17:36.893 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi155.xhswck smithi155 *:2049,9002 stopped 0s ago 79s - - <unknown> <un>
2022-01-17T13:17:36.898 INFO:teuthology.orchestra.run.smithi155.stdout:test
2022-01-17T13:17:37.528 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to start haproxy.nfs.foo.smithi155.xhswck on host 'smithi155'
</pre>
...snip...
<pre>
2022-01-17T13:17:53.182 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi155.xhswck smithi155 *:2049,9002 running (5s) 0s ago 95s - - 2.3.17-d1c9119 14b>
2022-01-17T13:17:53.519 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to stop haproxy.nfs.foo.smithi162.mahcqs on host 'smithi162'
</pre>
...snip...
<pre>
2022-01-17T13:18:07.810 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi162.mahcqs smithi162 *:2049,9002 stopped 0s ago 102s - - <unknown> <unk>
</pre>
...snip...
<pre>
h[14066]: cephadm 2022-01-17T13:17:53.516345+0000 mgr.smithi155.uoijyc (mgr.14206) 339 : cephadm [INF] Schedule stop daemon haproxy.nfs.foo.smithi162.mahcqs
</pre>
<p>But I never see a start of haproxy.nfs.foo.smithi162.mahcqs again.</p>

Orchestrator - Bug #53624 (Resolved): cephadm agent: set_store mon returned -27: error: entry siz... (https://tracker.ceph.com/issues/53624, 2021-12-16T09:21:34Z, Sebastian Wagner)
<pre>
+p08UJQl6kAB3NtutBnSLaukwDQYJKoZIhvcNAQEL\nBQAwFzEVMBMGA1UEAwwMY2VwaGFkbS1yb290MB4XDTIxMTIxNjAyMjAzOFoXDTMx\nMTIxNzAyMjAzOFowFzEVMBMGA1UEAwwMY2VwaGFkbS1yb290MIICIjANBgkqhkiG\n9w0BAQEFAAOCAg8AMIICCgKCAgEAzqwANUb3
zFx4981X8YjxlvEPjLITotzyfgu0\neX6FdQTEfEiJjpv8Ikgq7QSKj0mf43J0uy/Y/+OENuHIdPSvtE3/ICQ7qX8HeOAS\noYS41vpAb8la9AODkohMlZa7OogPeoUpnU2Da2E4gUkxMcduMw3CsbF6Url8tiMY\nAqJyNUjQyZN9ah2NUsU4Bqhd2bXBSObDuKqA8M1M/nGwouOyQ
eVZmn1A4yQ3AOS8\nYb+fA/EWQtHZsdyNNBIU1bI68BaSSEtx00Fc1TFPiYf9JTZo5hT2sO3mBUKI/6uP\nwlIr3w23R2vP80WhbpLIZSSbZuQ0uyKc9XWqGvC4rX6VgWgs8xXW2FQGPfhzSWoL\nK4t9tafT8mLlokP5uelA4Da/JyrMf+U+ntEHbKjt/Jns8cR3dEEQQgzKh1197I
NZ\nq3lMHKOwVrkko2rwXaR4BTiv/9YDX1G/10Wc38U7V4Nyuc1MgShaBoy2GyjKl3Cy\nmiz3Jg0pehvqOjFTAfhqsbbocESyxWqZTZRK688NH7+EDOZT6gCEcCmhYJoNFhi1\nc2MJ2Zb4WgAanRcOto6LEYZ0kQQqZhce94RXKkJ265VZOFKcFqymGKAoxBhuBIAK\nJrmYvd1oE
GYROpHVL16tOzlyXxiNR+8zEMUWcRz9JmyE1ZwUhGqB4I2y9xGSsVlC\nK3ckXsECAwEAAaMkMCIwDwYDVR0RBAgwBocErBUCZjAPBgNVHRMBAf8EBTADAQH/\nMA0GCSqGSIb3DQEBCwUAA4ICAQAee7nTsfD5oQMYiZQA0qjeiAYDpG1bPFtyHxZL\n/TY96maIvBxqEx+haOmKvV
RTrV080NIo9VsftiLoFYj5M19bufkMlM764viIKOPE\n0fIHqCDvIresYuOIwB7WGqwiIzp4vQv/7QPqZdXRRW5WUN2fh2EhsSPlxBWoOwMf\nBNqsRwrYOxBzcXjeLJGGO4wmRa5AsFBwZC8cho+bc1W7FJUpXkZVnXmUPXx6ZzA8\nxvXBZfJgQ/2ya0uDBvXXwkxX1/7FHK4IsI7
gCGWHv95b5N2nM8A0Y8p5YvDDHnX2\nhnBmr97LdIfqgzScjblnNeQfNQWJj2kcuY0hjtHxb1Pg/t24N8JsJS1Jp5vshcbS\n8jWWK8wpUIkihDQCezY/5hUfhvQz4RozrAXgQKxAvGaXqWw2RXkwf7cbcXxIlS6A\nrt4L2Vg3xWS/2DIhAFwk8W53F0du3MfFOLIgVuBTOgbghapo
wqppo0qYtvH3nb6Y\nXhK12rDLzQvueQyxZZkjQ6UtYk1++eL+PojYMWFYYFesSayTs3xGi/6wvouVQb6v\njW8GeO2cf1mDJFCNDxAmtDoPCOJfT/j8dy8gCvkT0ypzdm69HqjO+sf8+Ik2VlrR\n3rJbNDd1WwP0o2CYLSIe2KOcQUvX505ojDFznTiLeUJ728Yd2qsnHTIOOK7NJ0x9\nd8hCvw==\n-----END CERTIFICATE-----\n", "172.21.2.102", "7150", "False"], "last_config": "2021-12-16T02:21:06.723847Z"}, "osd.1024": {"deps": [], "last_config": "2021-12-12T22:53:04.829290Z"}, "osd.1025": {"deps": [], "last_config": "2021-12-12T22:54:43.028992Z"}, "osd.1026": {"deps": [], "last_config": "2021-12-12T22:55:48.651847Z"}, "osd.1027": {"deps": [], "last_config": "2021-12-12T22:57:10.880016Z"}, "osd.1028": {"deps": [], "last_config": "2021-12-12T23:00:35.405950Z"}}, "last_daemon_update": "2021-12-16T02:27:35.437721Z", "last_device_update": "2021-12-12T22:52:34.049019Z", "last_network_update": "2021-12-16T02:27:15.196650Z", "last_device_change": "2021-12-12T22:44:01.089290Z", "networks_and_interfaces": {"172.21.0.0/20": {"enp1s0f0": ["172.21.2.145"]}, "fe80::/64": {"enp1s0f0": ["fe80::3eec:efff:fe3d:91b4"]}}, "last_host_check": "2021-12-16T02:22:44.926479Z", "last_client_files": {}, "scheduled_daemon_actions": {}, "agent_counter": 1, "agent_keys": "[client.agent.gibba045]\n\tkey = AQDuerZhDQN2GhAAMFdgQncZtlLlmjT/7IgaYA==\n"$` failed: (27) File too large
Dec 16 02:27:35 gibba002 ceph-mgr[421345]: mgr set_store mon returned -27: error: entry size limited to 65536 bytes. Use 'mon config key max entry size' to manually adjust
</pre>
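<p>Until the agent stops persisting such oversized blobs, a possible mitigation is to raise the limit that the error message points at (the value shown is an arbitrary example):</p>
<pre>
ceph config set mon mon_config_key_max_entry_size 131072
</pre>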
<p><a class="external" href="https://github.com/ceph/ceph/blob/84f88eaec44103edd377817e264d5d376df8c554/src/pybind/mgr/cephadm/upgrade.py#L34">https://github.com/ceph/ceph/blob/84f88eaec44103edd377817e264d5d376df8c554/src/pybind/mgr/cephadm/upgrade.py#L34</a></p>
<p>I mean, it's clearly wrong, as this depends on the search-registries setting of the hosts and is not a constant.</p>
<p>Can we drop this "normalizing" step altogether?</p>
<p>Still, we have to avoid reintroducing the regression fixed by <a class="external" href="https://github.com/ceph/ceph/pull/40577">https://github.com/ceph/ceph/pull/40577</a></p>

Orchestrator - Feature #53562 (New): cephadm doesn't support osd crush_location_hook (https://tracker.ceph.com/issues/53562, 2021-12-09T11:59:53Z, Sebastian Wagner)
<p>crush_location_hook is a path to an executable that is executed in order to update the current OSD's crush location. It is invoked like so:</p>
<pre>
$crush_location_hook --cluster {cluster-name} --id {ID} --type {daemon-type}
</pre>
<p>and prints out the current crush locations.</p>
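<p>For reference, a minimal sketch of such a hook (the file path and rack value are purely illustrative; any executable that prints a location string works):</p>
<pre><code class="python">
#!/usr/bin/env python3
# /usr/local/bin/crush-hook: report the same rack for every OSD on this host
import socket

host = socket.gethostname().split('.')[0]
print(f"host={host} rack=rack1 root=default")
</code></pre>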
<p>Workarounds:</p>
<ul>
<li>For a per-host based location, we have: <a class="external" href="https://docs.ceph.com/en/latest/cephadm/host-management/#setting-the-initial-crush-location-of-host">https://docs.ceph.com/en/latest/cephadm/host-management/#setting-the-initial-crush-location-of-host</a> which should cover a lot of use cases.</li>
<li>Build a new container image locally and add the crush_location_hook executable to it. Then set the config option to the file path within the container</li>
</ul>

Orchestrator - Bug #53541 (Resolved): permissions too open on the cephadm agent files (644) - inc... (https://tracker.ceph.com/issues/53541, 2021-12-08T14:24:38Z, Sebastian Wagner)
<p>permissions too open on the cephadm agent files (644) - includes certs and config</p>