Ceph : Issues
https://tracker.ceph.com/
2022-01-24T15:39:53Z
Ceph
Redmine
Orchestrator - Cleanup #54000 (New): cephadm: upgrade commands should return yaml
https://tracker.ceph.com/issues/54000
2022-01-24T15:39:53Z
Sebastian Wagner
<p>Right now, commands like</p>
<ul>
<li>ceph orch upgrade ls</li>
<li>ceph orch upgrade status</li>
</ul>
<p>return JSON. YAML is much more readable, so let's return YAML instead!</p>
<p><strong>Note</strong>: this now requires a <strong>--format=...</strong> argument.</p>
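A minimal sketch of what the proposed behavior could look like, assuming a <code>--format</code> dispatch inside the command handler (in practice Ceph mgr modules render YAML with PyYAML's <code>yaml.dump()</code>; a flat-dict renderer is hand-rolled here to stay dependency-free, and the field names are illustrative):

```python
import json

def format_output(data: dict, fmt: str = "yaml") -> str:
    """Render command output as YAML (default) or JSON on request."""
    if fmt == "json":
        return json.dumps(data, indent=2)
    # Minimal YAML-style rendering for flat dicts (illustrative only).
    return "\n".join(f"{k}: {v}" for k, v in data.items())

# Hypothetical 'ceph orch upgrade status' payload:
status = {"target_image": "quay.io/ceph/ceph:v17", "in_progress": True}
print(format_output(status))          # YAML-style default
print(format_output(status, "json"))  # opt-in via --format=json
```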
Orchestrator - Cleanup #53999 (New): orch interface: cephadm contains a lot of special apply methods
https://tracker.ceph.com/issues/53999
2022-01-24T15:28:49Z
Sebastian Wagner
<pre><code class="python">    @handle_orch_error
    def apply_rgw(self, spec: ServiceSpec) -> str:
        return self._apply(spec)
</code></pre>
<p>Let's remove them! It's getting out of hand by now. Just use <strong>apply</strong> for everything.</p>
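A sketch of the proposed direction, assuming a single generic entry point instead of one <code>apply_rgw</code>/<code>apply_mds</code>/... wrapper per service type (class and field names are simplified stand-ins for the real orchestrator interface):

```python
from dataclasses import dataclass

@dataclass
class ServiceSpec:
    service_type: str   # e.g. "rgw", "mds", "nfs"
    service_name: str

class Orchestrator:
    def apply(self, spec: ServiceSpec) -> str:
        # One method handles every spec; per-type behavior, if any,
        # is driven by spec.service_type rather than by the method name.
        return f"Scheduled {spec.service_name} update..."

orch = Orchestrator()
print(orch.apply(ServiceSpec("rgw", "rgw.myrealm")))
```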
Orchestrator - Feature #53562 (New): cephadm doesn't support osd crush_location_hook
https://tracker.ceph.com/issues/53562
2021-12-09T11:59:53Z
Sebastian Wagner
<p>crush_location_hook is the path to an executable that is run to determine the current OSD's crush location. It is invoked like so:</p>
<pre>
$crush_location_hook --cluster {cluster-name} --id {ID} --type {daemon-type}
</pre>
<p>and it prints the OSD's current crush location.</p>
<p>Workarounds:</p>
<ul>
<li>For a per-host based location, we have: <a class="external" href="https://docs.ceph.com/en/latest/cephadm/host-management/#setting-the-initial-crush-location-of-host">https://docs.ceph.com/en/latest/cephadm/host-management/#setting-the-initial-crush-location-of-host</a> which should cover a lot of use cases.</li>
<li>Build a new container image locally and add the crush_location_hook executable to it. Then set the config option to the file path within the container.</li>
</ul>
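A minimal sketch of what such a hook could look like, matching the invocation shown above; the location logic (deriving a rack from the OSD id) is purely illustrative, and a real hook would consult actual inventory:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Matches: $crush_location_hook --cluster {cluster-name} --id {ID} --type {daemon-type}
    p = argparse.ArgumentParser()
    p.add_argument("--cluster")
    p.add_argument("--id")
    p.add_argument("--type")
    return p

def crush_location(args: argparse.Namespace) -> str:
    # Illustrative only: derive a rack from the OSD id.
    rack = f"rack{int(args.id) % 2}"
    return f"root=default rack={rack} host={args.cluster}-{args.id}"

args = build_parser().parse_args(
    ["--cluster", "ceph", "--id", "3", "--type", "osd"])
print(crush_location(args))
```

Since cephadm daemons run in containers, the hook binary would also have to exist inside the container image, which is exactly what the workarounds above address.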
Orchestrator - Feature #53539 (New): ceph orch exposes no way to see what’s queued or remove work...
https://tracker.ceph.com/issues/53539
2021-12-08T13:50:27Z
Sebastian Wagner
<p>ceph orch exposes no way to see what’s queued or to remove work from the queue.</p>
<p>We should probably expose the queue of scheduled ops in <strong>ceph orch status</strong>.</p>
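A sketch of the kind of interface this would need, assuming the scheduled ops live in an inspectable queue (all names here are hypothetical, not the real cephadm internals):

```python
import collections

class Scheduler:
    """Keep scheduled ops inspectable so a status command could
    list them and an operator could cancel one."""

    def __init__(self) -> None:
        self._queue: collections.deque = collections.deque()

    def schedule(self, op: str) -> None:
        self._queue.append(op)

    def pending(self) -> list:
        # What 'ceph orch status' could render.
        return list(self._queue)

    def cancel(self, op: str) -> bool:
        # What a hypothetical 'ceph orch cancel <op>' could call.
        try:
            self._queue.remove(op)
            return True
        except ValueError:
            return False

s = Scheduler()
s.schedule("redeploy mgr.a")
s.schedule("upgrade osd.3")
s.cancel("redeploy mgr.a")
print(s.pending())
```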
Orchestrator - Bug #53422 (New): tasks.cephfs.test_nfs.TestNFS.test_export_create_with_non_existi...
https://tracker.ceph.com/issues/53422
2021-11-29T08:47:58Z
Sebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-11-26_13:52:15-orch:cephadm-wip-swagner2-testing-2021-11-26-1129-distro-default-smithi/6528237">https://pulpito.ceph.com/swagner-2021-11-26_13:52:15-orch:cephadm-wip-swagner2-testing-2021-11-26-1129-distro-default-smithi/6528237</a></p>
<pre>
teuthology.orchestra.run.smithi145:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph orch ps --service_name=nfs.test
teuthology.orchestra.run.smithi145.stdout:No daemons reported
test_export_create_with_non_existing_fsname (tasks.cephfs.test_nfs.TestNFS) ... FAIL
======================================================================
FAIL: test_export_create_with_non_existing_fsname (tasks.cephfs.test_nfs.TestNFS)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 412, in test_export_create_with_non_existing_fsname
self._test_create_cluster()
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 125, in _test_create_cluster
self._check_nfs_cluster_status('running', 'NFS Ganesha cluster deployment failed')
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 89, in _check_nfs_cluster_status
self.fail(fail_msg)
AssertionError: NFS Ganesha cluster deployment failed
</pre>
<p>Looking at the log, we see the <strong>nfs.test</strong> cluster being created successfully multiple times, but it was never created before the last test case:</p>
<pre>
2021-11-26T16:46:09: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:46:09: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.phdkqk
2021-11-26T16:46:09: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:46:09: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:46:09: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.phdkqk-rgw
2021-11-26T16:46:09: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.phdkqk on smithi145
2021-11-26T16:46:41: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:46:41: cephadm [INF] Remove service nfs.test
2021-11-26T16:46:41: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.phdkqk...
2021-11-26T16:46:41: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.phdkqk from smithi145
2021-11-26T16:46:45: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.phdkqk
2021-11-26T16:46:45: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.phdkqk-rgw
2021-11-26T16:46:45: cephadm [INF] Purge service nfs.test
2021-11-26T16:46:45: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:46:57: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:46:57: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.vgdnuw
2021-11-26T16:46:57: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:46:57: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:46:57: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.vgdnuw-rgw
2021-11-26T16:46:57: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.vgdnuw on smithi145
2021-11-26T16:47:51: cephadm [INF] Restart service nfs.test
2021-11-26T16:48:22: cephadm [INF] Restart service nfs.test
2021-11-26T16:49:02: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:49:02: cephadm [INF] Remove service nfs.test
2021-11-26T16:49:15: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.vgdnuw...
2021-11-26T16:49:15: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.vgdnuw from smithi145
2021-11-26T16:49:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.vgdnuw
2021-11-26T16:49:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.vgdnuw-rgw
2021-11-26T16:49:19: cephadm [INF] Purge service nfs.test
2021-11-26T16:49:19: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:49:47: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:49:47: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.heyhit
2021-11-26T16:49:47: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:49:47: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:49:47: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.heyhit-rgw
2021-11-26T16:49:47: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.heyhit on smithi145
2021-11-26T16:50:18: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:50:18: cephadm [INF] Remove service nfs.test
2021-11-26T16:50:18: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.heyhit...
2021-11-26T16:50:18: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.heyhit from smithi145
2021-11-26T16:50:22: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.heyhit
2021-11-26T16:50:22: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.heyhit-rgw
2021-11-26T16:50:22: cephadm [INF] Purge service nfs.test
2021-11-26T16:50:22: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:50:32: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:50:32: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.woegpi
2021-11-26T16:50:32: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:50:32: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:50:32: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.woegpi-rgw
2021-11-26T16:50:32: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.woegpi on smithi145
2021-11-26T16:51:19: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:51:19: cephadm [INF] Remove service nfs.test
2021-11-26T16:51:19: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.woegpi...
2021-11-26T16:51:19: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.woegpi from smithi145
2021-11-26T16:51:34: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.woegpi
2021-11-26T16:51:34: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.woegpi-rgw
2021-11-26T16:51:34: cephadm [INF] Purge service nfs.test
2021-11-26T16:51:34: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:51:56: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:51:56: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.uwhrnz
2021-11-26T16:51:56: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:51:56: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:51:56: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.uwhrnz-rgw
2021-11-26T16:51:56: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.uwhrnz on smithi145
2021-11-26T16:52:27: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:52:27: cephadm [INF] Remove service nfs.test
2021-11-26T16:52:27: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.uwhrnz...
2021-11-26T16:52:27: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.uwhrnz from smithi145
2021-11-26T16:52:32: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.uwhrnz
2021-11-26T16:52:32: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.uwhrnz-rgw
2021-11-26T16:52:32: cephadm [INF] Purge service nfs.test
2021-11-26T16:52:32: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:52:41: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:52:41: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.ccuxek
2021-11-26T16:52:41: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:52:41: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:52:41: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.ccuxek-rgw
2021-11-26T16:52:41: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.ccuxek on smithi145
2021-11-26T16:53:15: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:53:15: cephadm [INF] Remove service nfs.test
2021-11-26T16:53:15: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.ccuxek...
2021-11-26T16:53:15: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.ccuxek from smithi145
2021-11-26T16:53:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.ccuxek
2021-11-26T16:53:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.ccuxek-rgw
2021-11-26T16:53:19: cephadm [INF] Purge service nfs.test
2021-11-26T16:53:19: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:53:28: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:53:28: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.dduovn
2021-11-26T16:53:28: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:53:28: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:53:28: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.dduovn-rgw
2021-11-26T16:53:28: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.dduovn on smithi145
2021-11-26T16:54:10: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:54:10: cephadm [INF] Remove service nfs.test
2021-11-26T16:54:10: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.dduovn...
2021-11-26T16:54:10: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.dduovn from smithi145
2021-11-26T16:54:14: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.dduovn
2021-11-26T16:54:14: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.dduovn-rgw
2021-11-26T16:54:14: cephadm [INF] Purge service nfs.test
2021-11-26T16:54:14: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:54:23: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.dduovn...
2021-11-26T16:54:23: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.dduovn from smithi145
2021-11-26T16:54:24: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:54:25: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.mcpayf
2021-11-26T16:54:25: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:54:25: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:54:25: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.mcpayf-rgw
2021-11-26T16:54:25: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.mcpayf on smithi145
2021-11-26T16:55:00: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:55:00: cephadm [INF] Remove service nfs.test
2021-11-26T16:55:00: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.mcpayf...
2021-11-26T16:55:00: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.mcpayf from smithi145
2021-11-26T16:55:04: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.mcpayf
2021-11-26T16:55:04: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.mcpayf-rgw
2021-11-26T16:55:04: cephadm [INF] Purge service nfs.test
2021-11-26T16:55:04: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:55:17: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.mcpayf...
2021-11-26T16:55:17: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.mcpayf from smithi145
</pre>
Orchestrator - Bug #53154 (New): t8y: cephadm: error: unrecognized arguments: --keep-logs
https://tracker.ceph.com/issues/53154
2021-11-04T09:38:54Z
Sebastian Wagner
<pre>
2021-11-03T13:15:09.452 DEBUG:teuthology.orchestra.run.smithi191:> sudo /home/ubuntu/cephtest/cephadm rm-cluster --fsid f2abfd4e-3ca4-11ec-8c28-001a4aab830c --force --keep-logs
2021-11-03T13:15:09.584 INFO:teuthology.orchestra.run.smithi191.stderr:usage: cephadm [-h] [--image IMAGE] [--docker] [--data-dir DATA_DIR]
2021-11-03T13:15:09.584 INFO:teuthology.orchestra.run.smithi191.stderr: [--log-dir LOG_DIR] [--logrotate-dir LOGROTATE_DIR]
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr: [--unit-dir UNIT_DIR] [--verbose] [--timeout TIMEOUT]
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr: [--retry RETRY] [--env ENV] [--no-container-init]
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr: {version,pull,inspect-image,ls,list-networks,adopt,rm-daemon,rm-cluster,run,shell,enter,ceph-volume,unit,logs,bootstrap,deplo
y,check-host,prepare-host,add-repo,rm-repo,install,registry-login,gather-facts}
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr: ...
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr:cephadm: error: unrecognized arguments: --keep-logs
2021-11-03T13:15:09.595 DEBUG:teuthology.orchestra.run:got remote process result: 2
</pre>
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-11-03_11:47:26-orch:cephadm-wip-swagner-testing-2021-11-03-0958-distro-basic-smithi/6481219">https://pulpito.ceph.com/swagner-2021-11-03_11:47:26-orch:cephadm-wip-swagner-testing-2021-11-03-0958-distro-basic-smithi/6481219</a></p>
Orchestrator - Bug #51806 (Need More Info): cephadm: stopped containers end up in error state
https://tracker.ceph.com/issues/51806
2021-07-22T15:26:52Z
Sebastian Wagner
<pre>
[ceph: root@sebastians-laptop /]# ceph orch stop node-exporter
Scheduled to stop node-exporter.sebastians-laptop on host 'sebastians-laptop'
[ceph: root@sebastians-laptop /]# ceph orch ps --service-name node-exporter
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID
node-exporter.sebastians-laptop sebastians-laptop *:9100 error 3m ago 4m - - <unknown> <unknown>
</pre>
<pre>
➜ cephadm git:(cephadm-container-name-dashes) ✗ sudo systemctl status ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop | cat
● ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service - Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb
Loaded: loaded (/etc/systemd/system/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-07-22 16:04:16 CEST; 14s ago
Process: 2907757 ExecStartPre=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Process: 2907758 ExecStart=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.run (code=exited, status=0/SUCCESS)
Main PID: 2907955 (conmon)
Tasks: 8 (limit: 38293)
Memory: 2.7M
CPU: 562ms
CGroup: /system.slice/system-ceph\x2db2f78482\x2dead5\x2d11eb\x2d9ac0\x2d482ae35a5fbb.slice/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service
├─container
│ ├─2907958 /dev/init -- /bin/node_exporter --no-collector.timex --web.listen-address=:9100
│ └─2907960 /bin/node_exporter --no-collector.timex --web.listen-address=:9100
└─supervisor
└─2907955 /usr/bin/conmon --api-version 1 -c b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 -u b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata -p /run/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata/pidfile -n ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop --exit-dir /run/libpod/exits --socket-dir-path /run/libpod/socket -l journald --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata/oci-log --conmon-pidfile /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - textfile" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - time" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - uname" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - vmstat" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - xfs" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - zfs" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg="Listening on :9100" source="node_exporter.go:170"
Jul 22 16:04:16 sebastians-laptop podman[2907914]: 2021-07-22 16:04:16.490922626 +0200 CEST m=+0.269857979 container start b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 (image=docker.io/prom/node-exporter:v0.18.1, name=ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jul 22 16:04:16 sebastians-laptop bash[2907914]: b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1
Jul 22 16:04:16 sebastians-laptop systemd[1]: Started Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
➜ cephadm git:(cephadm-container-name-dashes) ✗ sudo systemctl status ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop | cat
● ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service - Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb
Loaded: loaded (/etc/systemd/system/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2021-07-22 16:05:04 CEST; 5s ago
Process: 2907757 ExecStartPre=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Process: 2907758 ExecStart=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.run (code=exited, status=0/SUCCESS)
Process: 2908848 ExecStop=/bin/bash -c /bin/podman stop ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop ; bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.stop (code=exited, status=0/SUCCESS)
Process: 2909007 ExecStopPost=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.poststop (code=exited, status=0/SUCCESS)
Process: 2909008 ExecStopPost=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Main PID: 2907955 (code=exited, status=143)
CPU: 1.030s
Jul 22 16:04:16 sebastians-laptop systemd[1]: Started Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
Jul 22 16:05:03 sebastians-laptop systemd[1]: Stopping Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb...
Jul 22 16:05:03 sebastians-laptop bash[2908849]: Error: no container with name or ID "ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop" found: no such container
Jul 22 16:05:04 sebastians-laptop podman[2908926]: 2021-07-22 16:05:04.364723428 +0200 CEST m=+0.120139014 container remove b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 (image=docker.io/prom/node-exporter:v0.18.1, name=ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jul 22 16:05:04 sebastians-laptop bash[2908886]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Main process exited, code=exited, status=143/n/a
Jul 22 16:05:04 sebastians-laptop bash[2908968]: Error: no container with name or ID "ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop" found: no such container
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Failed with result 'exit-code'.
Jul 22 16:05:04 sebastians-laptop systemd[1]: Stopped Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Consumed 1.030s CPU time.
➜ cephadm git:(cephadm-container-name-dashes) ✗
</pre>
<p>Might be related to conmon exiting with 143 instead of 0.</p>
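The <code>status=143</code> in the unit output can be decoded as termination by SIGTERM: by the usual shell convention a process killed by a signal reports 128 plus the signal number, and systemd counts any non-zero main-process exit status as a failure unless the unit lists it in <code>SuccessExitStatus=</code>. A quick check:

```python
import signal

# 143 = 128 + SIGTERM (signal number 15), the conventional encoding
# of "terminated by SIGTERM" propagated as an exit status.
print(128 + signal.SIGTERM)
```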
Orchestrator - Bug #51361 (New): KillMode=none is deprecated
https://tracker.ceph.com/issues/51361
2021-06-25T09:05:39Z
Sebastian Wagner
<p>We changed the systemd unit file's KillMode to none in <a class="external" href="https://github.com/ceph/ceph/pull/33162#issuecomment-584183316">https://github.com/ceph/ceph/pull/33162#issuecomment-584183316</a></p>
<p>Now we're getting a new warning:</p>
<pre>
Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
</pre>
Orchestrator - Bug #49287 (New): podman: setting cgroup config for procHooks process caused: Unit...
https://tracker.ceph.com/issues/49287
2021-02-13T00:54:57Z
Sebastian Wagner
<pre>
2021-02-12T16:27:55.195 INFO:teuthology.orchestra.run.smithi014.stderr:Non-zero exit code 127 from /bin/podman run --rm --ipc=host --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:52fc503cf18cf3bb446b840ba00be073017b8373 -e NODE_NAME=smithi014 quay.ceph.io/ceph-ci/ceph:52fc503cf18cf3bb446b840ba0
0be073017b8373 -c %u %g /var/lib/ceph
2021-02-12T16:27:55.195 INFO:teuthology.orchestra.run.smithi014.stderr:stat: stderr Error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit libpod-056038e1126191fba41d8a037275136f2d7aeec9710b9ee
ff792c06d8544b983.scope not found.: OCI runtime error
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr:Traceback (most recent call last):
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 7697, in <module>
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: main()
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 7686, in main
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: r = ctx.func(ctx)
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 1566, in _infer_fsid
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: return func(ctx)
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 1603, in _infer_config
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: return func(ctx)
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 1650, in _infer_image
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: return func(ctx)
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 4128, in command_shell
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: make_log_dir(ctx, ctx.fsid)
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 1752, in make_log_dir
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: uid, gid = extract_uid_gid(ctx)
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 2428, in extract_uid_gid
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: raise RuntimeError('uid/gid not found')
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr:RuntimeError: uid/gid not found
</pre>
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-02-11_11:00:52-rados:cephadm-wip-swagner3-testing-2021-02-10-1322-distro-basic-smithi/5874630">https://pulpito.ceph.com/swagner-2021-02-11_11:00:52-rados:cephadm-wip-swagner3-testing-2021-02-10-1322-distro-basic-smithi/5874630</a></p>
Orchestrator - Bug #49233 (New): cephadm shell: TLS handshake timeout
https://tracker.ceph.com/issues/49233
2021-02-10T11:26:38Z
Sebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-02-09_10:28:14-rados:cephadm-wip-swagner2-testing-2021-02-08-1109-pacific-distro-basic-smithi/5871391">https://pulpito.ceph.com/swagner-2021-02-09_10:28:14-rados:cephadm-wip-swagner2-testing-2021-02-08-1109-pacific-distro-basic-smithi/5871391</a></p>
<pre>
2021-02-10T06:43:35.452 INFO:teuthology.orchestra.run.smithi132.stderr:Non-zero exit code 125 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint stat -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:282cc83b9d6c73ac8a35502bb969bc4e36afefcc -e NODE_NAME=smithi132 quay.ceph.io/ceph-ci/ceph:282cc83b9d6c73ac8a35502bb969b
c4e36afefcc -c %u %g /var/lib/ceph
2021-02-10T06:43:35.453 INFO:teuthology.orchestra.run.smithi132.stderr:stat: stderr Unable to find image 'quay.ceph.io/ceph-ci/ceph:282cc83b9d6c73ac8a35502bb969bc4e36afefcc' locally
2021-02-10T06:43:35.453 INFO:teuthology.orchestra.run.smithi132.stderr:stat: stderr /usr/bin/docker: Error response from daemon: Get https://quay.ceph.io/v2/ceph-ci/ceph/manifests/282cc83b9d6c73ac8a35502bb969bc4e36afefcc: net/http: TLS handshake timeout.
2021-02-10T06:43:35.453 INFO:teuthology.orchestra.run.smithi132.stderr:stat: stderr See '/usr/bin/docker run --help'.
2021-02-10T06:43:35.465 INFO:teuthology.orchestra.run.smithi132.stderr:Traceback (most recent call last):
2021-02-10T06:43:35.465 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 7639, in <module>
2021-02-10T06:43:35.465 INFO:teuthology.orchestra.run.smithi132.stderr: main()
2021-02-10T06:43:35.465 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 7628, in main
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: r = ctx.func(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 1616, in _infer_fsid
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: return func(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 1653, in _infer_config
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: return func(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 1700, in _infer_image
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: return func(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 4115, in command_shell
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: make_log_dir(ctx, ctx.fsid)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 1802, in make_log_dir
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: uid, gid = extract_uid_gid(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 2469, in extract_uid_gid
2021-02-10T06:43:35.467 INFO:teuthology.orchestra.run.smithi132.stderr: raise RuntimeError('uid/gid not found')
2021-02-10T06:43:35.467 INFO:teuthology.orchestra.run.smithi132.stderr:RuntimeError: uid/gid not found
2
</pre>
<p>It looks like we need to apply the ignore list from</p>
<p><a class="external" href="https://github.com/ceph/ceph/blob/40d5c37930f9b7f883c3b6da57be481f1fe6fb6c/src/cephadm/cephadm#L3078-L3086">https://github.com/ceph/ceph/blob/40d5c37930f9b7f883c3b6da57be481f1fe6fb6c/src/cephadm/cephadm#L3078-L3086</a></p>
<p>to command_shell() as well.</p>
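<p>A minimal sketch of that ignore-list idea (all names here are illustrative, not the actual cephadm API): instead of raising from the uid/gid probe inside command_shell(), tolerate failures whose output matches a known-harmless error and fall back to the default ceph uid/gid:</p>

```python
# Hypothetical helpers; the real ignore list lives in cephadm at the
# lines linked above.
KNOWN_ERRORS = [
    "See '/usr/bin/docker run --help'.",  # container runtime refused the probe
    'uid/gid not found',
]

DEFAULT_UID_GID = (167, 167)  # the ceph uid/gid in the official images

def extract_uid_gid_or_default(probe_output: str) -> tuple:
    """Parse 'uid:gid' from a container probe; fall back to the default
    when the probe failed with an error on the ignore list."""
    for err in KNOWN_ERRORS:
        if err in probe_output:
            return DEFAULT_UID_GID
    uid, gid = probe_output.strip().split(':')
    return int(uid), int(gid)
```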
teuthology - Bug #47441 (Closed): teuthology/task/install: verify_package_version: RuntimeError: ...
https://tracker.ceph.com/issues/47441
2020-09-14T14:25:51Z
Sebastian Wagner
<pre>
2020-09-14T13:32:56.135 INFO:teuthology.packaging:The installed version of ceph is 16.0.0-5509.g7f41e68.el8
2020-09-14T13:32:56.136 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 31, in nested
vars.append(enter())
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 218, in install
install_packages(ctx, package_list, config)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 87, in install_packages
verify_package_version(ctx, config, remote)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 61, in verify_package_version
pkg=pkg_to_check
RuntimeError: ceph version 16.0.0-5509.g7f41e68c8af was not installed, found 16.0.0-5509.g7f41e68.el8.
</pre>
<p>Looks like the builds were duplicated: See <a class="external" href="https://shaman.ceph.com/repos/ceph/wip-swagner-testing-2020-09-14-1230/7f41e68c8afa3f6a917ca548770374067fdb433f/">https://shaman.ceph.com/repos/ceph/wip-swagner-testing-2020-09-14-1230/7f41e68c8afa3f6a917ca548770374067fdb433f/</a></p>
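<p>Note the mismatch above is largely cosmetic: the expected string carries a longer git-sha abbreviation (g7f41e68c8af) while the installed one abbreviates it shorter and appends a distro tag (.el8). A hedged sketch of a comparison that tolerates both differences (hypothetical helper, not teuthology's verify_package_version):</p>

```python
import re

# Illustrative pattern: base version + build number, then '.g<sha>'.
_VER = re.compile(r'^(?P<base>[\d.]+-\d+)\.g(?P<sha>[0-9a-f]+)')

def versions_match(expected: str, installed: str) -> bool:
    """Treat two build strings as equal when base version and build
    number agree and one git-sha abbreviation is a prefix of the other;
    a trailing distro tag such as '.el8' is ignored."""
    m1, m2 = _VER.match(expected), _VER.match(installed)
    if not (m1 and m2):
        return expected == installed  # fall back to exact comparison
    if m1.group('base') != m2.group('base'):
        return False
    s1, s2 = m1.group('sha'), m2.group('sha')
    return s1.startswith(s2) or s2.startswith(s1)
```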
rbd - Bug #46875 (New): TestLibRBD.TestPendingAio: test_librbd.cc:4539: Failure or SIGSEGV
https://tracker.ceph.com/issues/46875
2020-08-10T01:03:45Z
Sebastian Wagner
<pre>
[ RUN ] TestLibRBD.TestPendingAio
using new format!
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/test_librbd.cc:4539: Failure
Expected equality of these values:
1
rbd_aio_is_complete(comps[i])
Which is: 0
[ FAILED ] TestLibRBD.TestPendingAio (68 ms)
</pre>
<p><a class="external" href="https://jenkins.ceph.com/job/ceph-pull-requests/57209/consoleFull#-361705261e840cee4-f4a4-4183-81dd-42855615f2c1">https://jenkins.ceph.com/job/ceph-pull-requests/57209/consoleFull#-361705261e840cee4-f4a4-4183-81dd-42855615f2c1</a></p>
Orchestrator - Feature #45876 (New): cephadm: handle port conflicts gracefully
https://tracker.ceph.com/issues/45876
2020-06-04T10:10:45Z
Sebastian Wagner
<pre>
INFO:cephadm:Verifying port 9100 ...
WARNING:cephadm:Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address already in use
ERROR: TCP Port(s) '9100' required for node-exporter is already in use
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/module.py", line 1638, in _run_cephadm
code, '\n'.join(err)))
RuntimeError: cephadm exited with an error code: 1, stderr:INFO:cephadm:Deploying daemon node-exporter.ceph-mon ...
INFO:cephadm:Verifying port 9100 ...
WARNING:cephadm:Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address already in use
ERROR: TCP Port(s) '9100' required for node-exporter is already in use
2020-05-15T13:33:46.966159+0000 mgr.ceph-mgr.dixgvy (mgr.14161) 678 : cephadm [WRN] Failed to apply node-exporter spec ServiceSpec(
{'placement': PlacementSpec(host_pattern='*'), 'service_type': 'node-exporter', 'service_id': None, 'unmanaged': False}
): cephadm exited with an error code: 1, stderr:INFO:cephadm:Deploying daemon node-exporter.ceph-mon ...
INFO:cephadm:Verifying port 9100 ...
WARNING:cephadm:Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address already in use
ERROR: TCP Port(s) '9100' required for node-exporter is already in use
</pre>
<p>The important points:</p>
<ul>
<li><strong>We already know which services want which ports.</strong></li>
<li>We can easily prevent port conflicts for known daemons.</li>
<li>Open question: how do we handle unknown daemons (i.e. a pre-existing node exporter)?</li>
</ul>
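<p>A sketch of how the first two points could work (hypothetical helpers, not the cephadm scheduler): check the ports a spec wants against the ports already claimed by known daemons on the host, and fail with a clear conflict map before deploying; unknown daemons would still only surface via a bind probe, since nothing in the cluster records their ports.</p>

```python
import socket

def port_is_free(port: int, host: str = '0.0.0.0') -> bool:
    """Probe a port the way cephadm's 'Verifying port ...' step does:
    try to bind and report availability."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:  # e.g. [Errno 98] Address already in use
            return False

def find_port_conflicts(wanted_ports, deployed):
    """'deployed' maps daemon name -> ports it claims on this host.
    Return {port: claiming_daemon} for every wanted port that collides,
    so the scheduler can reject the spec up front instead of failing
    with a traceback at deploy time."""
    claimed = {p: name for name, ports in deployed.items() for p in ports}
    return {p: claimed[p] for p in wanted_ports if p in claimed}
```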
Orchestrator - Feature #44864 (New): cephadm: garbage collect old container images
https://tracker.ceph.com/issues/44864
2020-03-31T15:45:58Z
Sebastian Wagner
<p>cephadm: garbage collect old container images</p>
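<p>One possible shape for the policy (a sketch under assumptions, not a design decision): never remove an image a running daemon still references, and keep a few recent unused images as rollback candidates.</p>

```python
def images_to_prune(images: dict, keep: int = 2) -> list:
    """'images' maps image id -> (created_timestamp, in_use); this data
    model is illustrative only. In-use images are never pruned; of the
    unused ones, the 'keep' most recently created survive so a failed
    upgrade can still roll back without re-pulling."""
    unused = sorted(
        ((ts, iid) for iid, (ts, used) in images.items() if not used),
        reverse=True,
    )
    return [iid for _, iid in unused[keep:]]
```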
Orchestrator - Cleanup #43710 (New): Refactor k8sevents module
https://tracker.ceph.com/issues/43710
2020-01-20T15:40:36Z
Sebastian Wagner
<ul>
<li>Unify Rook env vars with mgr/rook</li>
<li>Unify events watcher with mgr/rook</li>
<li>Add to CI</li>
<li>Add Documentation</li>
</ul>