Ceph : Issues
https://tracker.ceph.com/
2022-01-28T15:55:57Z
Ceph
Redmine
Orchestrator - Documentation #54056 (New): https://docs.ceph.com/en/latest/cephadm/install/#deplo...
https://tracker.ceph.com/issues/54056
2022-01-28T15:55:57Z
Sebastian Wagner
<p><a class="external" href="https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment">https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment</a></p>
<ul>
<li>e.g. the monitoring stack and ingress are completely missing</li>
<li>link to the registry-login functionality</li>
</ul>
Orchestrator - Cleanup #54042 (New): move pybind/ceph_argparse.py and pybind/ceph_daemon.py into ...
https://tracker.ceph.com/issues/54042
2022-01-27T15:28:18Z
Sebastian Wagner
<p>Why? Because there is no reason we need these as extra RPM packages. We can simply remove those RPM packages and make use of python-common instead.</p>
<p>This is basically just a <strong>git mv src/pybind/ceph_argparse.py src/python_common/ceph/argparse.py</strong> plus some import fixes and some ceph.spec cleanup.</p>
<p>The benefit is clear: as those modules are huge, we'd finally have the ability to properly refactor them.</p>
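<p>The move itself is a single git mv; the accompanying "import fixes" could be scripted. A rough sketch (the target path ceph.argparse comes from the git mv above; the helper itself is hypothetical):</p>

```python
import re

def fix_imports(source: str) -> str:
    # Hypothetical helper for the "import fixes" step: rewrite the old
    # top-level module import to the python-common location. The target
    # path ceph.argparse comes from the git mv above; everything else
    # here is illustrative.
    return re.sub(
        r'^(\s*)import ceph_argparse\b',
        r'\1from ceph import argparse as ceph_argparse',
        source,
        flags=re.M,
    )

print(fix_imports('import ceph_argparse'))  # from ceph import argparse as ceph_argparse
```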
Orchestrator - Cleanup #54001 (New): type safe Python mon_command API clients
https://tracker.ceph.com/issues/54001
2022-01-24T16:13:41Z
Sebastian Wagner
<p>For the last six years I have wondered why we don't have a type-safe way to use the mon command API. It turns out I never had the time to actually work on it. My idea looks more or less like this:</p>
<p><a class="external" href="https://pad.ceph.com/p/mon_command_python_api">https://pad.ceph.com/p/mon_command_python_api</a></p>
<p>what do you think? Is there any use for something like this?</p>
<pre><code class="python syntaxhl"># mon_command_api.py
from typing import NamedTuple

class CommandResult(NamedTuple):
    ret: int
    stdout: str
    stderr: str

class Call(NamedTuple):
    api: 'MonCommandApi'
    prefix: str

    def __call__(self, **kwargs):
        kwargs['prefix'] = self.prefix
        return self.api._mon_command(kwargs)

class MonCommandApi:
    def __init__(self, rados):
        self.rados = rados

    def _mon_command(self, cmd: dict):
        return CommandResult(*self.rados.mon_command(cmd))

    def __getattr__(self, aname):
        return Call(self, aname.replace('_', ' '))

# mon_command_api.pyi. Autogenerated by a script, exactly like https://docs.ceph.com/en/latest/api/mon_command_api/
class MonCommandApi:
    def __init__(self, rados) -> None: ...
    def _mon_command(self, cmd: dict) -> CommandResult: ...
    def orch_rm_daemon(self, name: str) -> CommandResult: ...
    def osd_create_pool(self, pool: str, pg_num: int, size: int) -> CommandResult: ...

# example use:
def orch_rm_daemon(daemon_name):
    api = MonCommandApi(...)
    return api.orch_rm_daemon(name=daemon_name)

def other_function(daemon_name):
    api = MonCommandApi(...)
    return api.osd_create_pool(pool='my_pool', pg_num=42, size=3)
</code></pre>
Orchestrator - Cleanup #54000 (New): cephadm: upgrade commands should return yaml
https://tracker.ceph.com/issues/54000
2022-01-24T15:39:53Z
Sebastian Wagner
<p>Right now, commands like</p>
<ul>
<li>ceph orch upgrade ls</li>
<li>ceph orch upgrade status</li>
</ul>
<p>are returning JSON. YAML is much more readable, so let's return YAML instead!</p>
<p><strong>Note</strong> that this now needs a <strong>--format=...</strong> argument.</p>
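<p>The change is essentially one dispatch per command; a minimal sketch, assuming PyYAML is available (it is guarded below just in case) and using a made-up status structure:</p>

```python
import json

try:
    import yaml  # PyYAML -- assumed available in the mgr environment
except ImportError:
    yaml = None

def format_output(data, fmt: str = 'yaml') -> str:
    """Render command output; default to YAML for readability,
    keep JSON reachable via --format=json."""
    if fmt == 'json' or yaml is None:
        return json.dumps(data, indent=2)
    return yaml.safe_dump(data, default_flow_style=False)

# e.g. a (made-up) upgrade status structure:
status = {'target_image': 'quay.io/ceph/ceph:v17', 'in_progress': True}
print(format_output(status))
```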
Orchestrator - Cleanup #53999 (New): orch interface: cephadm contains a lot of special apply methods
https://tracker.ceph.com/issues/53999
2022-01-24T15:28:49Z
Sebastian Wagner
<pre><code class="python syntaxhl"><span class="CodeRay"> <span class="decorator">@handle_orch_error</span>
<span class="keyword">def</span> <span class="function">apply_rgw</span>(<span class="predefined-constant">self</span>, spec: ServiceSpec) -> <span class="predefined">str</span>:
<span class="keyword">return</span> <span class="predefined-constant">self</span>._apply(spec)
</span></code></pre>
<p>Let's remove them! It's getting out of hand by now. Just use <strong>apply</strong> for everything.</p>
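<p>A sketch of what the single generic entry point could look like (ServiceSpec and the return string are simplified stand-ins here, not the real Ceph classes):</p>

```python
from typing import NamedTuple

class ServiceSpec(NamedTuple):
    # Simplified stand-in for the real ServiceSpec; the spec itself
    # already identifies the service type.
    service_type: str

class Orchestrator:
    def apply(self, spec: ServiceSpec) -> str:
        # One generic entry point instead of apply_rgw, apply_mds, ...:
        # since the spec carries service_type, the per-service wrappers
        # add nothing.
        return self._apply(spec)

    def _apply(self, spec: ServiceSpec) -> str:
        return f'Scheduled {spec.service_type} update...'

print(Orchestrator().apply(ServiceSpec('rgw')))  # Scheduled rgw update...
```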
Orchestrator - Bug #53422 (New): tasks.cephfs.test_nfs.TestNFS.test_export_create_with_non_existi...
https://tracker.ceph.com/issues/53422
2021-11-29T08:47:58Z
Sebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-11-26_13:52:15-orch:cephadm-wip-swagner2-testing-2021-11-26-1129-distro-default-smithi/6528237">https://pulpito.ceph.com/swagner-2021-11-26_13:52:15-orch:cephadm-wip-swagner2-testing-2021-11-26-1129-distro-default-smithi/6528237</a></p>
<pre>
teuthology.orchestra.run.smithi145:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph orch ps --service_name=nfs.test
teuthology.orchestra.run.smithi145.stdout:No daemons reported
test_export_create_with_non_existing_fsname (tasks.cephfs.test_nfs.TestNFS) ... FAIL
======================================================================
FAIL: test_export_create_with_non_existing_fsname (tasks.cephfs.test_nfs.TestNFS)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 412, in test_export_create_with_non_existing_fsname
self._test_create_cluster()
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 125, in _test_create_cluster
self._check_nfs_cluster_status('running', 'NFS Ganesha cluster deployment failed')
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 89, in _check_nfs_cluster_status
self.fail(fail_msg)
AssertionError: NFS Ganesha cluster deployment failed
</pre>
<p>Looking at the log, we see the <strong>nfs.test</strong> cluster being created successfully multiple times, but it was never created right before the last test case:</p>
<pre>
2021-11-26T16:46:09: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:46:09: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.phdkqk
2021-11-26T16:46:09: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:46:09: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:46:09: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.phdkqk-rgw
2021-11-26T16:46:09: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.phdkqk on smithi145
2021-11-26T16:46:41: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:46:41: cephadm [INF] Remove service nfs.test
2021-11-26T16:46:41: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.phdkqk...
2021-11-26T16:46:41: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.phdkqk from smithi145
2021-11-26T16:46:45: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.phdkqk
2021-11-26T16:46:45: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.phdkqk-rgw
2021-11-26T16:46:45: cephadm [INF] Purge service nfs.test
2021-11-26T16:46:45: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:46:57: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:46:57: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.vgdnuw
2021-11-26T16:46:57: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:46:57: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:46:57: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.vgdnuw-rgw
2021-11-26T16:46:57: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.vgdnuw on smithi145
2021-11-26T16:47:51: cephadm [INF] Restart service nfs.test
2021-11-26T16:48:22: cephadm [INF] Restart service nfs.test
2021-11-26T16:49:02: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:49:02: cephadm [INF] Remove service nfs.test
2021-11-26T16:49:15: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.vgdnuw...
2021-11-26T16:49:15: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.vgdnuw from smithi145
2021-11-26T16:49:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.vgdnuw
2021-11-26T16:49:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.vgdnuw-rgw
2021-11-26T16:49:19: cephadm [INF] Purge service nfs.test
2021-11-26T16:49:19: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:49:47: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:49:47: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.heyhit
2021-11-26T16:49:47: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:49:47: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:49:47: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.heyhit-rgw
2021-11-26T16:49:47: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.heyhit on smithi145
2021-11-26T16:50:18: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:50:18: cephadm [INF] Remove service nfs.test
2021-11-26T16:50:18: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.heyhit...
2021-11-26T16:50:18: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.heyhit from smithi145
2021-11-26T16:50:22: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.heyhit
2021-11-26T16:50:22: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.heyhit-rgw
2021-11-26T16:50:22: cephadm [INF] Purge service nfs.test
2021-11-26T16:50:22: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:50:32: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:50:32: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.woegpi
2021-11-26T16:50:32: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:50:32: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:50:32: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.woegpi-rgw
2021-11-26T16:50:32: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.woegpi on smithi145
2021-11-26T16:51:19: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:51:19: cephadm [INF] Remove service nfs.test
2021-11-26T16:51:19: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.woegpi...
2021-11-26T16:51:19: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.woegpi from smithi145
2021-11-26T16:51:34: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.woegpi
2021-11-26T16:51:34: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.woegpi-rgw
2021-11-26T16:51:34: cephadm [INF] Purge service nfs.test
2021-11-26T16:51:34: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:51:56: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:51:56: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.uwhrnz
2021-11-26T16:51:56: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:51:56: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:51:56: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.uwhrnz-rgw
2021-11-26T16:51:56: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.uwhrnz on smithi145
2021-11-26T16:52:27: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:52:27: cephadm [INF] Remove service nfs.test
2021-11-26T16:52:27: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.uwhrnz...
2021-11-26T16:52:27: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.uwhrnz from smithi145
2021-11-26T16:52:32: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.uwhrnz
2021-11-26T16:52:32: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.uwhrnz-rgw
2021-11-26T16:52:32: cephadm [INF] Purge service nfs.test
2021-11-26T16:52:32: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:52:41: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:52:41: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.ccuxek
2021-11-26T16:52:41: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:52:41: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:52:41: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.ccuxek-rgw
2021-11-26T16:52:41: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.ccuxek on smithi145
2021-11-26T16:53:15: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:53:15: cephadm [INF] Remove service nfs.test
2021-11-26T16:53:15: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.ccuxek...
2021-11-26T16:53:15: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.ccuxek from smithi145
2021-11-26T16:53:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.ccuxek
2021-11-26T16:53:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.ccuxek-rgw
2021-11-26T16:53:19: cephadm [INF] Purge service nfs.test
2021-11-26T16:53:19: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:53:28: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:53:28: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.dduovn
2021-11-26T16:53:28: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:53:28: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:53:28: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.dduovn-rgw
2021-11-26T16:53:28: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.dduovn on smithi145
2021-11-26T16:54:10: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:54:10: cephadm [INF] Remove service nfs.test
2021-11-26T16:54:10: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.dduovn...
2021-11-26T16:54:10: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.dduovn from smithi145
2021-11-26T16:54:14: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.dduovn
2021-11-26T16:54:14: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.dduovn-rgw
2021-11-26T16:54:14: cephadm [INF] Purge service nfs.test
2021-11-26T16:54:14: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:54:23: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.dduovn...
2021-11-26T16:54:23: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.dduovn from smithi145
2021-11-26T16:54:24: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:54:25: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.mcpayf
2021-11-26T16:54:25: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:54:25: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:54:25: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.mcpayf-rgw
2021-11-26T16:54:25: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.mcpayf on smithi145
2021-11-26T16:55:00: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:55:00: cephadm [INF] Remove service nfs.test
2021-11-26T16:55:00: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.mcpayf...
2021-11-26T16:55:00: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.mcpayf from smithi145
2021-11-26T16:55:04: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.mcpayf
2021-11-26T16:55:04: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.mcpayf-rgw
2021-11-26T16:55:04: cephadm [INF] Purge service nfs.test
2021-11-26T16:55:04: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:55:17: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.mcpayf...
2021-11-26T16:55:17: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.mcpayf from smithi145
</pre>
Orchestrator - Cleanup #53276 (New): cephadm: don't download any images in `cephadm deploy`
https://tracker.ceph.com/issues/53276
2021-11-16T13:34:06Z
Sebastian Wagner
<p>I think it might make sense to make `deploy` avoid downloading any images and instead let systemd pull them. Let's avoid making ourselves a bottleneck; deploying a daemon should be a fast operation.</p>
<p>In particular, the uid/gid detection should be streamlined.</p>
Orchestrator - Cleanup #52918 (New): two competing MTU checks
https://tracker.ceph.com/issues/52918
2021-10-13T16:07:31Z
Sebastian Wagner
<p><a class="external" href="https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/configchecks.py#L482">https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/configchecks.py#L482</a></p>
<p><a class="external" href="https://github.com/ceph/ceph/blob/master/monitoring/prometheus/alerts/ceph_default_alerts.yml#L235">https://github.com/ceph/ceph/blob/master/monitoring/prometheus/alerts/ceph_default_alerts.yml#L235</a></p>
Orchestrator - Bug #52888 (Rejected): ceph-mon: stderr too many arguments: [--default-log-to-jour...
https://tracker.ceph.com/issues/52888
2021-10-11T13:22:43Z
Sebastian Wagner
<pre>
cmd:
- cephadm
- --image
- quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel
- bootstrap
- --mon-ip
- 192.168.30.10
- --skip-pull
- --initial-dashboard-user
- admin
- --initial-dashboard-password
- VALUE_SPECIFIED_IN_NO_LOG_PARAMETER
- --skip-monitoring-stack
delta: '0:00:35.237981'
end: '2021-10-11 12:08:13.395685'
rc: 1
start: '2021-10-11 12:07:38.157704'
stderr: |-
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/bin/podman) version 3.2.3 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: d4061134-2a8b-11ec-bfc5-525400c05239
Verifying IP 192.168.30.10 port 3300 ...
Verifying IP 192.168.30.10 port 6789 ...
Mon IP `192.168.30.10` is in CIDR network `192.168.30.0/24`
- internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Ceph version: ceph version 16.2.6-210-gb56eac81 (b56eac81c3c91b10bbed736a3b40663b238eeb82) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Non-zero exit code 1 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mon --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel -e NODE_NAME=mon0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/d4061134-2a8b-11ec-bfc5-525400c05239:/var/log/ceph:z -v /var/lib/ceph/d4061134-2a8b-11ec-bfc5-525400c05239/mon.mon0:/var/lib/ceph/mon/ceph-mon0:z -v /tmp/ceph-tmpi69j5lvo:/tmp/keyring:z -v /tmp/ceph-tmpmtuz6to4:/tmp/monmap:z quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel --mkfs -i mon0 --fsid d4061134-2a8b-11ec-bfc5-525400c05239 -c /dev/null --monmap /tmp/monmap --keyring /tmp/keyring --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-journald=true --default-mon-cluster-log-to-stderr=false
/usr/bin/ceph-mon: stderr too many arguments: [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]
Traceback (most recent call last):
File "/sbin/cephadm", line 8055, in <module>
main()
File "/sbin/cephadm", line 8043, in main
r = ctx.func(ctx)
File "/sbin/cephadm", line 1787, in _default_image
return func(ctx)
File "/sbin/cephadm", line 4499, in command_bootstrap
bootstrap_keyring.name, monmap.name)
File "/sbin/cephadm", line 4074, in prepare_create_mon
monmap_path: '/tmp/monmap:z',
File "/sbin/cephadm", line 3442, in run
desc=self.entrypoint, timeout=timeout)
File "/sbin/cephadm", line 1467, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mon --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel -e NODE_NAME=mon0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/d4061134-2a8b-11ec-bfc5-525400c05239:/var/log/ceph:z -v /var/lib/ceph/d4061134-2a8b-11ec-bfc5-525400c05239/mon.mon0:/var/lib/ceph/mon/ceph-mon0:z -v /tmp/ceph-tmpi69j5lvo:/tmp/keyring:z -v /tmp/ceph-tmpmtuz6to4:/tmp/monmap:z quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel --mkfs -i mon0 --fsid d4061134-2a8b-11ec-bfc5-525400c05239 -c /dev/null --monmap /tmp/monmap --keyring /tmp/keyring --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-journald=true --default-mon-cluster-log-to-stderr=false
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
</pre>
<p><strong>Reason: this tries to use a Quincy cephadm binary to bootstrap a Pacific cluster.</strong></p>
Orchestrator - Bug #51806 (Need More Info): cephadm: stopped containers end up in error state
https://tracker.ceph.com/issues/51806
2021-07-22T15:26:52Z
Sebastian Wagner
<pre>
[ceph: root@sebastians-laptop /]# ceph orch stop node-exporter
Scheduled to stop node-exporter.sebastians-laptop on host 'sebastians-laptop'
[ceph: root@sebastians-laptop /]# ceph orch ps --service-name node-exporter
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID
node-exporter.sebastians-laptop sebastians-laptop *:9100 error 3m ago 4m - - <unknown> <unknown>
</pre>
<pre>
➜ cephadm git:(cephadm-container-name-dashes) ✗ sudo systemctl status ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop | cat
● ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service - Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb
Loaded: loaded (/etc/systemd/system/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-07-22 16:04:16 CEST; 14s ago
Process: 2907757 ExecStartPre=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Process: 2907758 ExecStart=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.run (code=exited, status=0/SUCCESS)
Main PID: 2907955 (conmon)
Tasks: 8 (limit: 38293)
Memory: 2.7M
CPU: 562ms
CGroup: /system.slice/system-ceph\x2db2f78482\x2dead5\x2d11eb\x2d9ac0\x2d482ae35a5fbb.slice/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service
├─container
│ ├─2907958 /dev/init -- /bin/node_exporter --no-collector.timex --web.listen-address=:9100
│ └─2907960 /bin/node_exporter --no-collector.timex --web.listen-address=:9100
└─supervisor
└─2907955 /usr/bin/conmon --api-version 1 -c b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 -u b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata -p /run/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata/pidfile -n ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop --exit-dir /run/libpod/exits --socket-dir-path /run/libpod/socket -l journald --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata/oci-log --conmon-pidfile /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - textfile" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - time" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - uname" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - vmstat" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - xfs" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - zfs" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg="Listening on :9100" source="node_exporter.go:170"
Jul 22 16:04:16 sebastians-laptop podman[2907914]: 2021-07-22 16:04:16.490922626 +0200 CEST m=+0.269857979 container start b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 (image=docker.io/prom/node-exporter:v0.18.1, name=ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jul 22 16:04:16 sebastians-laptop bash[2907914]: b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1
Jul 22 16:04:16 sebastians-laptop systemd[1]: Started Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
➜ cephadm git:(cephadm-container-name-dashes) ✗ sudo systemctl status ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop | cat
● ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service - Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb
Loaded: loaded (/etc/systemd/system/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2021-07-22 16:05:04 CEST; 5s ago
Process: 2907757 ExecStartPre=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Process: 2907758 ExecStart=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.run (code=exited, status=0/SUCCESS)
Process: 2908848 ExecStop=/bin/bash -c /bin/podman stop ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop ; bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.stop (code=exited, status=0/SUCCESS)
Process: 2909007 ExecStopPost=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.poststop (code=exited, status=0/SUCCESS)
Process: 2909008 ExecStopPost=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Main PID: 2907955 (code=exited, status=143)
CPU: 1.030s
Jul 22 16:04:16 sebastians-laptop systemd[1]: Started Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
Jul 22 16:05:03 sebastians-laptop systemd[1]: Stopping Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb...
Jul 22 16:05:03 sebastians-laptop bash[2908849]: Error: no container with name or ID "ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop" found: no such container
Jul 22 16:05:04 sebastians-laptop podman[2908926]: 2021-07-22 16:05:04.364723428 +0200 CEST m=+0.120139014 container remove b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 (image=docker.io/prom/node-exporter:v0.18.1, name=ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jul 22 16:05:04 sebastians-laptop bash[2908886]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Main process exited, code=exited, status=143/n/a
Jul 22 16:05:04 sebastians-laptop bash[2908968]: Error: no container with name or ID "ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop" found: no such container
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Failed with result 'exit-code'.
Jul 22 16:05:04 sebastians-laptop systemd[1]: Stopped Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Consumed 1.030s CPU time.
➜ cephadm git:(cephadm-container-name-dashes) ✗
</pre>
<p>This might be related to conmon exiting with 143 instead of 0.</p>
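<p>For reference, exit status 143 is the conventional encoding of death by a signal (128 + signal number), here SIGTERM, which is why systemd records the stop as a failure rather than a clean exit:</p>

```python
import signal

# A process killed by a signal conventionally exits with 128 + signum,
# so SIGTERM (15) is reported as 143 -- a "failure" from systemd's view.
print(128 + signal.SIGTERM)  # 143
```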
Orchestrator - Bug #51761 (Closed): journald logs are broken up again
https://tracker.ceph.com/issues/51761
2021-07-21T12:05:47Z
Sebastian Wagner
<pre>
● ceph-58ea4790-ea12-11eb-b009-482ae35a5fbb@mon.sebastians-laptop.service - Ceph mon.sebastians-laptop for 58ea4790-ea12-11eb-b009-482ae35a5fbb
Loaded: loaded (/etc/systemd/system/ceph-58ea4790-ea12-11eb-b009-482ae35a5fbb@.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-07-21 12:58:58 CEST; 1h 5min ago
Main PID: 2604173 (conmon)
Tasks: 2 (limit: 38293)
Memory: 780.0K
CPU: 1.985s
CGroup: /system.slice/system-ceph\x2d58ea4790\x2dea12\x2d11eb\x2db009\x2d482ae35a5fbb.slice/ceph-58ea4790-ea12-11eb-b009-482ae35a5fbb@mon.sebastians-laptop.service
└─2604173 /usr/bin/conmon --api-version 1 -c e18002c4d14872ebed3fa366b33755698b3cd1d9733e2450681941123d2c4ff5 -u e18002c4d14872ebed3fa366b33755698b3cd1d9733e2450681941123d2c4ff5 -r /usr/bin/crun -b>
Jul 21 14:04:17 sebastians-laptop conmon[2604173]: audit 2021-07-
Jul 21 14:04:17 sebastians-laptop conmon[2604173]: 21T12:04:17.400646+0000 mon.sebastians-laptop (mon.0)
Jul 21 14:04:17 sebastians-laptop conmon[2604173]: 632 : audit [DBG] from='mgr.14116 127.0.0.1:0/906050170' entity='mgr.sebastians-laptop.upresu' cmd=[{"prefix": "config get", "who": "mon", "key": "public_netwo>
Jul 21 14:04:17 sebastians-laptop conmon[2604173]: debug 2021-07-21T12:04:17.737+0000 7fea0e0fd700 1 mon.sebastians-laptop@0(leader).osd e2 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 369098752 full_>
Jul 21 14:04:18 sebastians-laptop conmon[2604173]: cephadm 2021-07-21T12:04
Jul 21 14:04:18 sebastians-laptop conmon[2604173]: :17.398305+0000 mgr.sebastians-laptop.upresu (mgr.14116) 2097 : cephadm [INF] Filtered out host sebastians-laptop: does not belong to mon public_network ()
Jul 21 14:04:18 sebastians-laptop conmon[2604173]: cephadm 2021-07-21T12:04:17.399260+0000 mgr.sebastians-laptop.upresu (mgr.14116) 2098 : cephadm
Jul 21 14:04:18 sebastians-laptop conmon[2604173]: [INF] Reconfiguring mon.sebastians-laptop (unknown last config time)...
Jul 21 14:04:19 sebastians-laptop conmon[2604173]: cluster 2021-07-21T12:04:19.002733+0000 mgr.sebastians-laptop.upresu (mgr.14116) 2099 : cluster [DBG] pgmap v1944: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
Jul 21 14:04:21 sebastians-laptop conmon[2604173]: cluster 2021-07-21T12:04:21.003262+0000 mgr.sebastians-laptop.upresu (mgr.14116) 2100 : cluster [DBG] pgmap v1945: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
</pre>
<p>This is super unfortunate: single log lines get broken up into multiple journald entries mid-line, which makes the mon log very hard to read.</p>
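<p>A minimal sketch of the underlying problem, not the actual conmon/cephadm code: journald records one entry per write, so when the log forwarder flushes mid-line, one logical log line ends up split across several entries (as in the output above). Buffering until a newline keeps each line in one entry. The class name here is illustrative.</p>

```python
# Illustrative sketch only -- not conmon or cephadm code.
# journald stores one entry per write(); flushing mid-line therefore
# splits a single log line across several journal entries. Buffering
# until a newline avoids that.
class LineBuffer:
    def __init__(self):
        self._buf = ""
        self.entries = []  # what would reach journald, one line per entry

    def write(self, chunk: str) -> None:
        """Accumulate chunks; emit only complete, newline-terminated lines."""
        self._buf += chunk
        while "\n" in self._buf:
            line, self._buf = self._buf.split("\n", 1)
            self.entries.append(line)
```

<p>Feeding the two fragments seen above ("audit 2021-07-" and the rest of the line) through such a buffer would produce a single journal entry instead of two.</p>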
Orchestrator - Bug #51361 (New): KillMode=none is deprecated
https://tracker.ceph.com/issues/51361
2021-06-25T09:05:39Z
Sebastian Wagner
<p>We changed the systemd unit file's KillMode to none in <a class="external" href="https://github.com/ceph/ceph/pull/33162#issuecomment-584183316">https://github.com/ceph/ceph/pull/33162#issuecomment-584183316</a></p>
<p>Now we're getting a new warning:</p>
<pre>
Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
</pre>
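<p>A minimal sketch of the suggested fix, assuming the rest of the cephadm unit template stays unchanged (the TimeoutStopSec value is illustrative): switch to one of the KillModes the warning recommends and rely on ExecStop/ExecStopPost to stop the container explicitly.</p>

```ini
# Fragment of the ceph-<fsid>@.service unit template -- illustrative only.
[Service]
# 'mixed' sends SIGTERM to the main process only, then SIGKILL to the whole
# control group after the stop timeout; 'control-group' (the default) signals
# the whole cgroup immediately. Either avoids the deprecated KillMode=none.
KillMode=mixed
TimeoutStopSec=120
```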
teuthology - Bug #47441 (Closed): teuthology/task/install: verify_package_version: RuntimeError: ...
https://tracker.ceph.com/issues/47441
2020-09-14T14:25:51Z
Sebastian Wagner
<pre>
2020-09-14T13:32:56.135 INFO:teuthology.packaging:The installed version of ceph is 16.0.0-5509.g7f41e68.el8
2020-09-14T13:32:56.136 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 31, in nested
vars.append(enter())
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 218, in install
install_packages(ctx, package_list, config)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 87, in install_packages
verify_package_version(ctx, config, remote)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 61, in verify_package_version
pkg=pkg_to_check
RuntimeError: ceph version 16.0.0-5509.g7f41e68c8af was not installed, found 16.0.0-5509.g7f41e68.el8.
</pre>
<p>Looks like the builds were duplicated: See <a class="external" href="https://shaman.ceph.com/repos/ceph/wip-swagner-testing-2020-09-14-1230/7f41e68c8afa3f6a917ca548770374067fdb433f/">https://shaman.ceph.com/repos/ceph/wip-swagner-testing-2020-09-14-1230/7f41e68c8afa3f6a917ca548770374067fdb433f/</a></p>
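<p>For illustration only (this is not teuthology code, and the root cause here looks like duplicated builds rather than the comparison itself): the two strings above differ only in the length of the abbreviated git sha and a trailing distro tag, so a plain string equality check fails even when both refer to the same commit. A tolerant comparison might look like this; the function name is hypothetical.</p>

```python
# Hypothetical sketch of a version comparison that tolerates a distro
# suffix (e.g. '.el8') and git sha abbreviations of different lengths.
import re

def same_build(expected: str, installed: str) -> bool:
    """Compare two ceph version strings like '16.0.0-5509.g7f41e68c8af'
    and '16.0.0-5509.g7f41e68.el8'."""
    def parse(version: str):
        # strip a trailing distro tag such as '.el8'
        version = re.sub(r"\.el\d+$", "", version)
        # split '<base>.g<sha>' into its parts
        base, _, sha = version.rpartition(".g")
        return base, sha

    exp_base, exp_sha = parse(expected)
    inst_base, inst_sha = parse(installed)
    if exp_base != inst_base:
        return False
    # one sha may be a shorter git abbreviation of the other
    return exp_sha.startswith(inst_sha) or inst_sha.startswith(exp_sha)
```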
rbd - Bug #46875 (New): TestLibRBD.TestPendingAio: test_librbd.cc:4539: Failure or SIGSEGV
https://tracker.ceph.com/issues/46875
2020-08-10T01:03:45Z
Sebastian Wagner
<pre>
[ RUN ] TestLibRBD.TestPendingAio
using new format!
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/test_librbd.cc:4539: Failure
Expected equality of these values:
1
rbd_aio_is_complete(comps[i])
Which is: 0
[ FAILED ] TestLibRBD.TestPendingAio (68 ms)
</pre>
<p><a class="external" href="https://jenkins.ceph.com/job/ceph-pull-requests/57209/consoleFull#-361705261e840cee4-f4a4-4183-81dd-42855615f2c1">https://jenkins.ceph.com/job/ceph-pull-requests/57209/consoleFull#-361705261e840cee4-f4a4-4183-81dd-42855615f2c1</a></p>
RADOS - Feature #45079 (New): HEALTH_WARN, if require-osd-release is < mimic and OSD wants to joi...
https://tracker.ceph.com/issues/45079
2020-04-14T10:25:33Z
Sebastian Wagner
<p>When upgrading a cluster to Octopus, users should get a warning if require-osd-release is < mimic, as this prevents OSDs from joining the cluster.</p>
<p>Right now, we only get an INF message in the logs:</p>
<pre>
cluster [INF] disallowing boot of octopus+ OSD osd.1 v1:172.16.1.25:6800/3051821808 because require_osd
</pre>
<p>This should be a HEALTH_WARN instead.</p>
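<p>An illustrative sketch of the intended behaviour, not the actual Ceph monitor code (the check code name and dict shape are made up): turn the "disallowing boot" condition into a health check instead of a mere INF log line.</p>

```python
# Illustrative sketch only -- not Ceph monitor code. The health check
# name 'OSD_REQUIRE_RELEASE_TOO_OLD' is hypothetical.

# Release names in ascending order, as used by require-osd-release.
RELEASES = ["luminous", "mimic", "nautilus", "octopus"]

def check_require_osd_release(require_osd_release: str) -> dict:
    """Return the health checks that should be raised for this setting."""
    checks = {}
    if RELEASES.index(require_osd_release) < RELEASES.index("mimic"):
        # octopus+ OSDs are refused at boot in this state, so a WARN
        # is warranted rather than a log-only INF message
        checks["OSD_REQUIRE_RELEASE_TOO_OLD"] = {
            "severity": "HEALTH_WARN",
            "summary": "require-osd-release is '%s' (< mimic); "
                       "octopus+ OSDs are not allowed to boot"
                       % require_osd_release,
        }
    return checks
```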