Ceph : Issues
https://tracker.ceph.com/
2021-11-29T08:47:58Z
Orchestrator - Bug #53422 (New): tasks.cephfs.test_nfs.TestNFS.test_export_create_with_non_existi...
https://tracker.ceph.com/issues/53422
2021-11-29T08:47:58Z
Sebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-11-26_13:52:15-orch:cephadm-wip-swagner2-testing-2021-11-26-1129-distro-default-smithi/6528237">https://pulpito.ceph.com/swagner-2021-11-26_13:52:15-orch:cephadm-wip-swagner2-testing-2021-11-26-1129-distro-default-smithi/6528237</a></p>
<pre>
teuthology.orchestra.run.smithi145:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph orch ps --service_name=nfs.test
teuthology.orchestra.run.smithi145.stdout:No daemons reported
test_export_create_with_non_existing_fsname (tasks.cephfs.test_nfs.TestNFS) ... FAIL
======================================================================
FAIL: test_export_create_with_non_existing_fsname (tasks.cephfs.test_nfs.TestNFS)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 412, in test_export_create_with_non_existing_fsname
self._test_create_cluster()
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 125, in _test_create_cluster
self._check_nfs_cluster_status('running', 'NFS Ganesha cluster deployment failed')
File "/home/teuthworker/src/git.ceph.com_ceph-c_c9fd0972675c568f5d517be49820a12e77fab497/qa/tasks/cephfs/test_nfs.py", line 89, in _check_nfs_cluster_status
self.fail(fail_msg)
AssertionError: NFS Ganesha cluster deployment failed
</pre>
<p>Looking at the log, we see the <strong>nfs.test</strong> cluster being created successfully multiple times, but right before the last test case it was never created:</p>
<pre>
2021-11-26T16:46:09: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:46:09: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.phdkqk
2021-11-26T16:46:09: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:46:09: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:46:09: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.phdkqk-rgw
2021-11-26T16:46:09: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.phdkqk on smithi145
2021-11-26T16:46:41: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:46:41: cephadm [INF] Remove service nfs.test
2021-11-26T16:46:41: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.phdkqk...
2021-11-26T16:46:41: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.phdkqk from smithi145
2021-11-26T16:46:45: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.phdkqk
2021-11-26T16:46:45: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.phdkqk-rgw
2021-11-26T16:46:45: cephadm [INF] Purge service nfs.test
2021-11-26T16:46:45: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:46:57: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:46:57: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.vgdnuw
2021-11-26T16:46:57: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:46:57: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:46:57: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.vgdnuw-rgw
2021-11-26T16:46:57: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.vgdnuw on smithi145
2021-11-26T16:47:51: cephadm [INF] Restart service nfs.test
2021-11-26T16:48:22: cephadm [INF] Restart service nfs.test
2021-11-26T16:49:02: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:49:02: cephadm [INF] Remove service nfs.test
2021-11-26T16:49:15: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.vgdnuw...
2021-11-26T16:49:15: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.vgdnuw from smithi145
2021-11-26T16:49:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.vgdnuw
2021-11-26T16:49:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.vgdnuw-rgw
2021-11-26T16:49:19: cephadm [INF] Purge service nfs.test
2021-11-26T16:49:19: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:49:47: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:49:47: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.heyhit
2021-11-26T16:49:47: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:49:47: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:49:47: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.heyhit-rgw
2021-11-26T16:49:47: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.heyhit on smithi145
2021-11-26T16:50:18: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:50:18: cephadm [INF] Remove service nfs.test
2021-11-26T16:50:18: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.heyhit...
2021-11-26T16:50:18: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.heyhit from smithi145
2021-11-26T16:50:22: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.heyhit
2021-11-26T16:50:22: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.heyhit-rgw
2021-11-26T16:50:22: cephadm [INF] Purge service nfs.test
2021-11-26T16:50:22: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:50:32: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:50:32: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.woegpi
2021-11-26T16:50:32: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:50:32: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:50:32: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.woegpi-rgw
2021-11-26T16:50:32: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.woegpi on smithi145
2021-11-26T16:51:19: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:51:19: cephadm [INF] Remove service nfs.test
2021-11-26T16:51:19: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.woegpi...
2021-11-26T16:51:19: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.woegpi from smithi145
2021-11-26T16:51:34: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.woegpi
2021-11-26T16:51:34: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.woegpi-rgw
2021-11-26T16:51:34: cephadm [INF] Purge service nfs.test
2021-11-26T16:51:34: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:51:56: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:51:56: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.uwhrnz
2021-11-26T16:51:56: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:51:56: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:51:56: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.uwhrnz-rgw
2021-11-26T16:51:56: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.uwhrnz on smithi145
2021-11-26T16:52:27: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:52:27: cephadm [INF] Remove service nfs.test
2021-11-26T16:52:27: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.uwhrnz...
2021-11-26T16:52:27: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.uwhrnz from smithi145
2021-11-26T16:52:32: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.uwhrnz
2021-11-26T16:52:32: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.uwhrnz-rgw
2021-11-26T16:52:32: cephadm [INF] Purge service nfs.test
2021-11-26T16:52:32: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:52:41: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:52:41: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.ccuxek
2021-11-26T16:52:41: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:52:41: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:52:41: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.ccuxek-rgw
2021-11-26T16:52:41: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.ccuxek on smithi145
2021-11-26T16:53:15: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:53:15: cephadm [INF] Remove service nfs.test
2021-11-26T16:53:15: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.ccuxek...
2021-11-26T16:53:15: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.ccuxek from smithi145
2021-11-26T16:53:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.ccuxek
2021-11-26T16:53:19: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.ccuxek-rgw
2021-11-26T16:53:19: cephadm [INF] Purge service nfs.test
2021-11-26T16:53:19: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:53:28: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:53:28: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.dduovn
2021-11-26T16:53:28: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:53:28: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:53:28: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.dduovn-rgw
2021-11-26T16:53:28: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.dduovn on smithi145
2021-11-26T16:54:10: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:54:10: cephadm [INF] Remove service nfs.test
2021-11-26T16:54:10: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.dduovn...
2021-11-26T16:54:10: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.dduovn from smithi145
2021-11-26T16:54:14: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.dduovn
2021-11-26T16:54:14: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.dduovn-rgw
2021-11-26T16:54:14: cephadm [INF] Purge service nfs.test
2021-11-26T16:54:14: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:54:23: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.dduovn...
2021-11-26T16:54:23: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.dduovn from smithi145
2021-11-26T16:54:24: cephadm [INF] Saving service nfs.test spec with placement count:1
2021-11-26T16:54:25: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.mcpayf
2021-11-26T16:54:25: cephadm [INF] Ensuring nfs.test.0 is in the ganesha grace table
2021-11-26T16:54:25: cephadm [INF] Rados config object exists: conf-nfs.test
2021-11-26T16:54:25: cephadm [INF] Creating key for client.nfs.test.0.0.smithi145.mcpayf-rgw
2021-11-26T16:54:25: cephadm [INF] Deploying daemon nfs.test.0.0.smithi145.mcpayf on smithi145
2021-11-26T16:55:00: cephadm [INF] Remove service ingress.nfs.test
2021-11-26T16:55:00: cephadm [INF] Remove service nfs.test
2021-11-26T16:55:00: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.mcpayf...
2021-11-26T16:55:00: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.mcpayf from smithi145
2021-11-26T16:55:04: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.mcpayf
2021-11-26T16:55:04: cephadm [INF] Removing key for client.nfs.test.0.0.smithi145.mcpayf-rgw
2021-11-26T16:55:04: cephadm [INF] Purge service nfs.test
2021-11-26T16:55:04: cephadm [INF] Removing grace file for nfs.test
2021-11-26T16:55:17: cephadm [INF] Removing orphan daemon nfs.test.0.0.smithi145.mcpayf...
2021-11-26T16:55:17: cephadm [INF] Removing daemon nfs.test.0.0.smithi145.mcpayf from smithi145
</pre>
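<p>For reference, the check that fails here essentially polls <code>ceph orch ps</code> for the service until a daemon reports <code>running</code>. A minimal manual version of that check (a sketch, assuming the <code>nfs.test</code> service has just been applied, as in the test) looks like:</p>
<pre>
# Poll for up to ~2 minutes until an nfs.test daemon reports "running"
for i in $(seq 1 24); do
    ceph orch ps --service_name=nfs.test --refresh | grep -q running && break
    sleep 5
done
ceph orch ps --service_name=nfs.test
</pre>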
Orchestrator - Bug #53169 (Resolved): octopus: AssertionError: assert osd.fullname is not None
https://tracker.ceph.com/issues/53169
2021-11-05T10:07:49Z
Sebastian Wagner
<pre>
File "/usr/share/ceph/mgr/cephadm/module.py", line 449, in serve
serve.serve()
File "/usr/share/ceph/mgr/cephadm/serve.py", line 67, in serve
self.mgr.to_remove_osds.process_removal_queue()
File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 664, in process_removal_queue
assert osd.fullname is not None
AssertionError
</pre>
<p>Workaround:</p>
<pre>
ceph config-key rm mgr/cephadm/osd_remove_queue
</pre>
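<p>A slightly fuller version of the workaround, assuming the stuck state is only the persisted removal queue (the final <code>ceph mgr fail</code> is optional and just restarts the crashed cephadm module):</p>
<pre>
# Inspect the persisted OSD removal queue before clearing it
ceph config-key get mgr/cephadm/osd_remove_queue
# Clear it
ceph config-key rm mgr/cephadm/osd_remove_queue
# Fail over the active mgr so the cephadm module starts with an empty queue
ceph mgr fail
</pre>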
Orchestrator - Bug #53154 (New): t8y: cephadm: error: unrecognized arguments: --keep-logs
https://tracker.ceph.com/issues/53154
2021-11-04T09:38:54Z
Sebastian Wagner
<pre>
2021-11-03T13:15:09.452 DEBUG:teuthology.orchestra.run.smithi191:> sudo /home/ubuntu/cephtest/cephadm rm-cluster --fsid f2abfd4e-3ca4-11ec-8c28-001a4aab830c --force --keep-logs
2021-11-03T13:15:09.584 INFO:teuthology.orchestra.run.smithi191.stderr:usage: cephadm [-h] [--image IMAGE] [--docker] [--data-dir DATA_DIR]
2021-11-03T13:15:09.584 INFO:teuthology.orchestra.run.smithi191.stderr: [--log-dir LOG_DIR] [--logrotate-dir LOGROTATE_DIR]
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr: [--unit-dir UNIT_DIR] [--verbose] [--timeout TIMEOUT]
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr: [--retry RETRY] [--env ENV] [--no-container-init]
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr: {version,pull,inspect-image,ls,list-networks,adopt,rm-daemon,rm-cluster,run,shell,enter,ceph-volume,unit,logs,bootstrap,deplo
y,check-host,prepare-host,add-repo,rm-repo,install,registry-login,gather-facts}
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr: ...
2021-11-03T13:15:09.585 INFO:teuthology.orchestra.run.smithi191.stderr:cephadm: error: unrecognized arguments: --keep-logs
2021-11-03T13:15:09.595 DEBUG:teuthology.orchestra.run:got remote process result: 2
</pre>
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-11-03_11:47:26-orch:cephadm-wip-swagner-testing-2021-11-03-0958-distro-basic-smithi/6481219">https://pulpito.ceph.com/swagner-2021-11-03_11:47:26-orch:cephadm-wip-swagner-testing-2021-11-03-0958-distro-basic-smithi/6481219</a></p>
Orchestrator - Bug #52888 (Rejected): ceph-mon: stderr too many arguments: [--default-log-to-jour...
https://tracker.ceph.com/issues/52888
2021-10-11T13:22:43Z
Sebastian Wagner
<pre>
cmd:
- cephadm
- --image
- quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel
- bootstrap
- --mon-ip
- 192.168.30.10
- --skip-pull
- --initial-dashboard-user
- admin
- --initial-dashboard-password
- VALUE_SPECIFIED_IN_NO_LOG_PARAMETER
- --skip-monitoring-stack
delta: '0:00:35.237981'
end: '2021-10-11 12:08:13.395685'
rc: 1
start: '2021-10-11 12:07:38.157704'
stderr: |-
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/bin/podman) version 3.2.3 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: d4061134-2a8b-11ec-bfc5-525400c05239
Verifying IP 192.168.30.10 port 3300 ...
Verifying IP 192.168.30.10 port 6789 ...
Mon IP `192.168.30.10` is in CIDR network `192.168.30.0/24`
- internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Ceph version: ceph version 16.2.6-210-gb56eac81 (b56eac81c3c91b10bbed736a3b40663b238eeb82) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Non-zero exit code 1 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mon --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel -e NODE_NAME=mon0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/d4061134-2a8b-11ec-bfc5-525400c05239:/var/log/ceph:z -v /var/lib/ceph/d4061134-2a8b-11ec-bfc5-525400c05239/mon.mon0:/var/lib/ceph/mon/ceph-mon0:z -v /tmp/ceph-tmpi69j5lvo:/tmp/keyring:z -v /tmp/ceph-tmpmtuz6to4:/tmp/monmap:z quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel --mkfs -i mon0 --fsid d4061134-2a8b-11ec-bfc5-525400c05239 -c /dev/null --monmap /tmp/monmap --keyring /tmp/keyring --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-journald=true --default-mon-cluster-log-to-stderr=false
/usr/bin/ceph-mon: stderr too many arguments: [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]
Traceback (most recent call last):
File "/sbin/cephadm", line 8055, in <module>
main()
File "/sbin/cephadm", line 8043, in main
r = ctx.func(ctx)
File "/sbin/cephadm", line 1787, in _default_image
return func(ctx)
File "/sbin/cephadm", line 4499, in command_bootstrap
bootstrap_keyring.name, monmap.name)
File "/sbin/cephadm", line 4074, in prepare_create_mon
monmap_path: '/tmp/monmap:z',
File "/sbin/cephadm", line 3442, in run
desc=self.entrypoint, timeout=timeout)
File "/sbin/cephadm", line 1467, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mon --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel -e NODE_NAME=mon0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/d4061134-2a8b-11ec-bfc5-525400c05239:/var/log/ceph:z -v /var/lib/ceph/d4061134-2a8b-11ec-bfc5-525400c05239/mon.mon0:/var/lib/ceph/mon/ceph-mon0:z -v /tmp/ceph-tmpi69j5lvo:/tmp/keyring:z -v /tmp/ceph-tmpmtuz6to4:/tmp/monmap:z quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel --mkfs -i mon0 --fsid d4061134-2a8b-11ec-bfc5-525400c05239 -c /dev/null --monmap /tmp/monmap --keyring /tmp/keyring --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-journald=true --default-mon-cluster-log-to-stderr=false
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
</pre>
<p><strong>Reason: this tries to use a quincy cephadm binary to bootstrap a pacific cluster.</strong></p>
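<p>One way around this (a sketch, assuming the single-file cephadm from the pacific branch is sufficient for bootstrapping) is to fetch and use the cephadm that matches the target release:</p>
<pre>
# Fetch the cephadm that matches the release being bootstrapped
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
chmod +x cephadm
sudo ./cephadm --image quay.ceph.io/ceph-ci/daemon-base:latest-pacific-devel bootstrap --mon-ip 192.168.30.10
</pre>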
Orchestrator - Bug #52481 (Closed): cephadm: install_sysctl: FileNotFoundError: [Errno 2] No suc...
https://tracker.ceph.com/issues/52481
2021-09-01T11:12:32Z
Sebastian Wagner
<pre>
2021-08-30T03:49:00.267808Z service:osd.all-available-devices [ERROR] "Failed to apply: cephadm exited with an error code: 1, stderr:Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error: No such container: ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0
Deploy daemon osd.0 ...
Traceback (most recent call last):
File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
main()
File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
r = ctx.func(ctx)
File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 1759, in _default_image
return func(ctx)
File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 4326, in command_deploy
ports=daemon_ports)
File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2632, in deploy_daemon
c, osd_fsid=osd_fsid, ports=ports)
File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2801, in deploy_daemon_units
install_sysctl(ctx, fsid, daemon_type)
File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2963, in install_sysctl
_write(conf, lines)
File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2948, in _write
with open(conf, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.conf'"
</pre>
<p><a class="external" href="https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/W2ZTQSBKBANBH74EMCCOY5HNQ5GHUIEO/">https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/W2ZTQSBKBANBH74EMCCOY5HNQ5GHUIEO/</a></p>
<p>Perhaps the /usr/lib/sysctl.d/ directory is missing on that host?</p>
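<p>If so, a possible workaround on the affected host is simply to create the directory before the next deployment attempt (a sketch, assuming nothing else on the host manages that path):</p>
<pre>
# Create the missing sysctl drop-in directory; the orchestrator should
# succeed on its next attempt to deploy the OSD daemon
sudo mkdir -p /usr/lib/sysctl.d
</pre>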
Orchestrator - Bug #51806 (Need More Info): cephadm: stopped containers end up in error state
https://tracker.ceph.com/issues/51806
2021-07-22T15:26:52Z
Sebastian Wagner
<pre>
[ceph: root@sebastians-laptop /]# ceph orch stop node-exporter
Scheduled to stop node-exporter.sebastians-laptop on host 'sebastians-laptop'
[ceph: root@sebastians-laptop /]# ceph orch ps --service-name node-exporter
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID
node-exporter.sebastians-laptop sebastians-laptop *:9100 error 3m ago 4m - - <unknown> <unknown>
</pre>
<pre>
➜ cephadm git:(cephadm-container-name-dashes) ✗ sudo systemctl status ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop | cat
● ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service - Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb
Loaded: loaded (/etc/systemd/system/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-07-22 16:04:16 CEST; 14s ago
Process: 2907757 ExecStartPre=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Process: 2907758 ExecStart=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.run (code=exited, status=0/SUCCESS)
Main PID: 2907955 (conmon)
Tasks: 8 (limit: 38293)
Memory: 2.7M
CPU: 562ms
CGroup: /system.slice/system-ceph\x2db2f78482\x2dead5\x2d11eb\x2d9ac0\x2d482ae35a5fbb.slice/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service
├─container
│ ├─2907958 /dev/init -- /bin/node_exporter --no-collector.timex --web.listen-address=:9100
│ └─2907960 /bin/node_exporter --no-collector.timex --web.listen-address=:9100
└─supervisor
└─2907955 /usr/bin/conmon --api-version 1 -c b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 -u b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata -p /run/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata/pidfile -n ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop --exit-dir /run/libpod/exits --socket-dir-path /run/libpod/socket -l journald --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1/userdata/oci-log --conmon-pidfile /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - textfile" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - time" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - uname" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - vmstat" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - xfs" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg=" - zfs" source="node_exporter.go:104"
Jul 22 16:04:16 sebastians-laptop conmon[2907955]: time="2021-07-22T14:04:16Z" level=info msg="Listening on :9100" source="node_exporter.go:170"
Jul 22 16:04:16 sebastians-laptop podman[2907914]: 2021-07-22 16:04:16.490922626 +0200 CEST m=+0.269857979 container start b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 (image=docker.io/prom/node-exporter:v0.18.1, name=ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jul 22 16:04:16 sebastians-laptop bash[2907914]: b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1
Jul 22 16:04:16 sebastians-laptop systemd[1]: Started Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
➜ cephadm git:(cephadm-container-name-dashes) ✗ sudo systemctl status ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop | cat
● ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service - Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb
Loaded: loaded (/etc/systemd/system/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2021-07-22 16:05:04 CEST; 5s ago
Process: 2907757 ExecStartPre=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Process: 2907758 ExecStart=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.run (code=exited, status=0/SUCCESS)
Process: 2908848 ExecStop=/bin/bash -c /bin/podman stop ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop ; bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.stop (code=exited, status=0/SUCCESS)
Process: 2909007 ExecStopPost=/bin/bash /var/lib/ceph/b2f78482-ead5-11eb-9ac0-482ae35a5fbb/node-exporter.sebastians-laptop/unit.poststop (code=exited, status=0/SUCCESS)
Process: 2909008 ExecStopPost=/bin/rm -f /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-pid /run/ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service-cid (code=exited, status=0/SUCCESS)
Main PID: 2907955 (code=exited, status=143)
CPU: 1.030s
Jul 22 16:04:16 sebastians-laptop systemd[1]: Started Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
Jul 22 16:05:03 sebastians-laptop systemd[1]: Stopping Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb...
Jul 22 16:05:03 sebastians-laptop bash[2908849]: Error: no container with name or ID "ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop" found: no such container
Jul 22 16:05:04 sebastians-laptop podman[2908926]: 2021-07-22 16:05:04.364723428 +0200 CEST m=+0.120139014 container remove b771c1403b134d57e8378aa979b297257a14880c249ce901263e1e771725c1d1 (image=docker.io/prom/node-exporter:v0.18.1, name=ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jul 22 16:05:04 sebastians-laptop bash[2908886]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter-sebastians-laptop
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Main process exited, code=exited, status=143/n/a
Jul 22 16:05:04 sebastians-laptop bash[2908968]: Error: no container with name or ID "ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb-node-exporter.sebastians-laptop" found: no such container
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Failed with result 'exit-code'.
Jul 22 16:05:04 sebastians-laptop systemd[1]: Stopped Ceph node-exporter.sebastians-laptop for b2f78482-ead5-11eb-9ac0-482ae35a5fbb.
Jul 22 16:05:04 sebastians-laptop systemd[1]: ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service: Consumed 1.030s CPU time.
➜ cephadm git:(cephadm-container-name-dashes) ✗
</pre>
<p>This might be related to conmon exiting with status 143 instead of 0.</p>
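<p>If that is the cause, clearing the recorded failure should make the daemon show up as plainly stopped again (a hypothetical recovery step; the unit name is taken from the output above):</p>
<pre>
sudo systemctl reset-failed ceph-b2f78482-ead5-11eb-9ac0-482ae35a5fbb@node-exporter.sebastians-laptop.service
ceph orch ps --service-name node-exporter --refresh
</pre>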
Orchestrator - Bug #51761 (Closed): journald logs are broken up again
https://tracker.ceph.com/issues/51761
2021-07-21T12:05:47Z
Sebastian Wagner
<pre>
● ceph-58ea4790-ea12-11eb-b009-482ae35a5fbb@mon.sebastians-laptop.service - Ceph mon.sebastians-laptop for 58ea4790-ea12-11eb-b009-482ae35a5fbb
Loaded: loaded (/etc/systemd/system/ceph-58ea4790-ea12-11eb-b009-482ae35a5fbb@.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-07-21 12:58:58 CEST; 1h 5min ago
Main PID: 2604173 (conmon)
Tasks: 2 (limit: 38293)
Memory: 780.0K
CPU: 1.985s
CGroup: /system.slice/system-ceph\x2d58ea4790\x2dea12\x2d11eb\x2db009\x2d482ae35a5fbb.slice/ceph-58ea4790-ea12-11eb-b009-482ae35a5fbb@mon.sebastians-laptop.service
└─2604173 /usr/bin/conmon --api-version 1 -c e18002c4d14872ebed3fa366b33755698b3cd1d9733e2450681941123d2c4ff5 -u e18002c4d14872ebed3fa366b33755698b3cd1d9733e2450681941123d2c4ff5 -r /usr/bin/crun -b>
Jul 21 14:04:17 sebastians-laptop conmon[2604173]: audit 2021-07-
Jul 21 14:04:17 sebastians-laptop conmon[2604173]: 21T12:04:17.400646+0000 mon.sebastians-laptop (mon.0)
Jul 21 14:04:17 sebastians-laptop conmon[2604173]: 632 : audit [DBG] from='mgr.14116 127.0.0.1:0/906050170' entity='mgr.sebastians-laptop.upresu' cmd=[{"prefix": "config get", "who": "mon", "key": "public_netwo>
Jul 21 14:04:17 sebastians-laptop conmon[2604173]: debug 2021-07-21T12:04:17.737+0000 7fea0e0fd700 1 mon.sebastians-laptop@0(leader).osd e2 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 369098752 full_>
Jul 21 14:04:18 sebastians-laptop conmon[2604173]: cephadm 2021-07-21T12:04
Jul 21 14:04:18 sebastians-laptop conmon[2604173]: :17.398305+0000 mgr.sebastians-laptop.upresu (mgr.14116) 2097 : cephadm [INF] Filtered out host sebastians-laptop: does not belong to mon public_network ()
Jul 21 14:04:18 sebastians-laptop conmon[2604173]: cephadm 2021-07-21T12:04:17.399260+0000 mgr.sebastians-laptop.upresu (mgr.14116) 2098 : cephadm
Jul 21 14:04:18 sebastians-laptop conmon[2604173]: [INF] Reconfiguring mon.sebastians-laptop (unknown last config time)...
Jul 21 14:04:19 sebastians-laptop conmon[2604173]: cluster 2021-07-21T12:04:19.002733+0000 mgr.sebastians-laptop.upresu (mgr.14116) 2099 : cluster [DBG] pgmap v1944: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
Jul 21 14:04:21 sebastians-laptop conmon[2604173]: cluster 2021-07-21T12:04:21.003262+0000 mgr.sebastians-laptop.upresu (mgr.14116) 2100 : cluster [DBG] pgmap v1945: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
</pre>
<p>This is super unfortunate: individual log messages are again being split across multiple journald entries.</p>
Orchestrator - Bug #51361 (New): KillMode=none is deprecated
https://tracker.ceph.com/issues/51361
2021-06-25T09:05:39Z
Sebastian Wagner
<p>We changed the systemd unit file's KillMode to <code>none</code> in <a class="external" href="https://github.com/ceph/ceph/pull/33162#issuecomment-584183316">https://github.com/ceph/ceph/pull/33162#issuecomment-584183316</a></p>
<p>Now we're getting a new warning:</p>
<pre>
Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
</pre>
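<p>A quick way to check which of our units still carry the deprecated setting (an illustrative check only):</p>
<pre>
grep -l 'KillMode=none' /etc/systemd/system/ceph-*@.service
</pre>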
Orchestrator - Bug #51095 (Can't reproduce): cephadm bootstrap will create two MGRs, if hostname ...
https://tracker.ceph.com/issues/51095
2021-06-04T09:10:08Z
Sebastian Wagner
<pre>
➜ cephadm git:(master) ✗ hostname
localhost.localdomain
➜ cephadm git:(master) ✗ hostname -f
localhost
➜ cephadm git:(master) ✗ hostname --fqdn
localhost
</pre>
<p>I ended up with two MGRs:</p>
<ul>
<li>mgr.localhost.xxxyyyzzz</li>
<li>mgr.localhost.localdomain.jijijijiji</li>
</ul>
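<p>Setting a proper, non-localhost hostname before bootstrapping presumably avoids the ambiguity (a sketch with illustrative names):</p>
<pre>
sudo hostnamectl set-hostname node1.example.com
sudo cephadm bootstrap --mon-ip <mon-ip>
</pre>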
Orchestrator - Bug #49740 (Rejected): cephadm: error looking up supplemental groups for container...
https://tracker.ceph.com/issues/49740
2021-03-11T14:45:26Z
Sebastian Wagner
<pre>
2021-03-11T02:22:52.753 DEBUG:teuthology.orchestra.run.smithi155:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:409dc4b843da18e60e08a06ad16195db022fb227 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 4ed20244-8210-11eb-9080-001a4aab830c -- ceph orch host ls --format=json
2021-03-11T02:23:04.927 INFO:teuthology.orchestra.run.smithi155.stderr:Error: error looking up supplemental groups for container ca2233e555ae8ce7d36d0d9febddc59e395702db9da66664ca495315f6ccc323: Unable to find group disk
2021-03-11T02:23:04.958 DEBUG:teuthology.orchestra.run:got remote process result: 126
2021-03-11T02:23:04.959 INFO:tasks.cephadm:Cleaning up testdir ceph.* files...
2021-03-11T02:23:04.960 DEBUG:teuthology.orchestra.run.smithi019:> rm -f /home/ubuntu/cephtest/seed.ceph.conf /home/ubuntu/cephtest/ceph.pub
2021-03-11T02:23:04.967 DEBUG:teuthology.orchestra.run.smithi155:> rm -f /home/ubuntu/cephtest/seed.ceph.conf /home/ubuntu/cephtest/ceph.pub
2021-03-11T02:23:04.974 INFO:tasks.cephadm:Stopping all daemons...
2021-03-11T02:23:04.975 INFO:tasks.cephadm.mon.a:Stopping mon.a...
2021-03-11T02:23:04.975 DEBUG:teuthology.orchestra.run.smithi019:> sudo systemctl stop ceph-4ed20244-8210-11eb-9080-001a4aab830c@mon.a
2021-03-11T02:23:05.731 INFO:journalctl@ceph.mon.a.smithi019.stdout:Mar 11 02:23:05 smithi019 podman[28354]: ceph-4ed20244-8210-11eb-9080-001a4aab830c-mon.a
2021-03-11T02:23:06.038 DEBUG:teuthology.orchestra.run.smithi019:> sudo pkill -f 'journalctl -f -n 0 -u ceph-4ed20244-8210-11eb-9080-001a4aab830c@mon.a.service'
2021-03-11T02:23:06.154 DEBUG:teuthology.orchestra.run:got remote process result: None
2021-03-11T02:23:06.154 INFO:tasks.cephadm.mon.a:Stopped mon.a
2021-03-11T02:23:06.155 INFO:tasks.cephadm.mgr.a:Stopping mgr.a...
2021-03-11T02:23:06.155 DEBUG:teuthology.orchestra.run.smithi019:> sudo systemctl stop ceph-4ed20244-8210-11eb-9080-001a4aab830c@mgr.a
2021-03-11T02:23:07.294 DEBUG:teuthology.orchestra.run.smithi019:> sudo pkill -f 'journalctl -f -n 0 -u ceph-4ed20244-8210-11eb-9080-001a4aab830c@mgr.a.service'
2021-03-11T02:23:07.396 DEBUG:teuthology.orchestra.run:got remote process result: None
2021-03-11T02:23:07.397 INFO:tasks.cephadm.mgr.a:Stopped mgr.a
2021-03-11T02:23:07.397 DEBUG:teuthology.orchestra.run.smithi019:> sudo rm -f /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
2021-03-11T02:23:07.410 DEBUG:teuthology.orchestra.run.smithi155:> sudo rm -f /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
2021-03-11T02:23:07.427 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_b6eecfe89a4363750d696160843a096384b357a8/teuthology/contextutil.py", line 31, in nested
vars.append(enter())
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/teuthworker/src/github.com_liewegas_ceph_1073261e44318a8defeeea181af2bd82af2ac71d/qa/tasks/cephadm.py", line 452, in ceph_bootstrap
stdout=StringIO())
File "/home/teuthworker/src/github.com_liewegas_ceph_1073261e44318a8defeeea181af2bd82af2ac71d/qa/tasks/cephadm.py", line 45, in _shell
**kwargs
File "/home/teuthworker/src/git.ceph.com_git_teuthology_b6eecfe89a4363750d696160843a096384b357a8/teuthology/orchestra/remote.py", line 215, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_b6eecfe89a4363750d696160843a096384b357a8/teuthology/orchestra/run.py", line 455, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_b6eecfe89a4363750d696160843a096384b357a8/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_b6eecfe89a4363750d696160843a096384b357a8/teuthology/orchestra/run.py", line 183, in _raise_for_status
node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi155 with status 126: 'sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:409dc4b843da18e60e08a06ad16195db022fb227 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 4ed20244-8210-11eb-9080-001a4aab830c -- ceph orch host ls --format=json'
</pre>
<p><a class="external" href="https://pulpito.ceph.com/sage-2021-03-11_01:59:56-rados:cephadm-wip-sage-testing-2021-03-10-1754-distro-basic-smithi/5954943/">https://pulpito.ceph.com/sage-2021-03-11_01:59:56-rados:cephadm-wip-sage-testing-2021-03-10-1754-distro-basic-smithi/5954943/</a></p>
Orchestrator - Bug #49287 (New): podman: setting cgroup config for procHooks process caused: Unit...
https://tracker.ceph.com/issues/49287
2021-02-13T00:54:57Z
Sebastian Wagner
<pre>
2021-02-12T16:27:55.195 INFO:teuthology.orchestra.run.smithi014.stderr:Non-zero exit code 127 from /bin/podman run --rm --ipc=host --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:52fc503cf18cf3bb446b840ba00be073017b8373 -e NODE_NAME=smithi014 quay.ceph.io/ceph-ci/ceph:52fc503cf18cf3bb446b840ba0
0be073017b8373 -c %u %g /var/lib/ceph
2021-02-12T16:27:55.195 INFO:teuthology.orchestra.run.smithi014.stderr:stat: stderr Error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit libpod-056038e1126191fba41d8a037275136f2d7aeec9710b9ee
ff792c06d8544b983.scope not found.: OCI runtime error
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr:Traceback (most recent call last):
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 7697, in <module>
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: main()
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 7686, in main
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: r = ctx.func(ctx)
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 1566, in _infer_fsid
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: return func(ctx)
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 1603, in _infer_config
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: return func(ctx)
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 1650, in _infer_image
2021-02-12T16:27:55.201 INFO:teuthology.orchestra.run.smithi014.stderr: return func(ctx)
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 4128, in command_shell
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: make_log_dir(ctx, ctx.fsid)
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 1752, in make_log_dir
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: uid, gid = extract_uid_gid(ctx)
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: File "/home/ubuntu/cephtest/cephadm", line 2428, in extract_uid_gid
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr: raise RuntimeError('uid/gid not found')
2021-02-12T16:27:55.202 INFO:teuthology.orchestra.run.smithi014.stderr:RuntimeError: uid/gid not found
</pre>
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-02-11_11:00:52-rados:cephadm-wip-swagner3-testing-2021-02-10-1322-distro-basic-smithi/5874630">https://pulpito.ceph.com/swagner-2021-02-11_11:00:52-rados:cephadm-wip-swagner3-testing-2021-02-10-1322-distro-basic-smithi/5874630</a></p>
Orchestrator - Bug #49233 (New): cephadm shell: TLS handshake timeout
https://tracker.ceph.com/issues/49233
2021-02-10T11:26:38Z
Sebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2021-02-09_10:28:14-rados:cephadm-wip-swagner2-testing-2021-02-08-1109-pacific-distro-basic-smithi/5871391">https://pulpito.ceph.com/swagner-2021-02-09_10:28:14-rados:cephadm-wip-swagner2-testing-2021-02-08-1109-pacific-distro-basic-smithi/5871391</a></p>
<pre>
2021-02-10T06:43:35.452 INFO:teuthology.orchestra.run.smithi132.stderr:Non-zero exit code 125 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint stat -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:282cc83b9d6c73ac8a35502bb969bc4e36afefcc -e NODE_NAME=smithi132 quay.ceph.io/ceph-ci/ceph:282cc83b9d6c73ac8a35502bb969b
c4e36afefcc -c %u %g /var/lib/ceph
2021-02-10T06:43:35.453 INFO:teuthology.orchestra.run.smithi132.stderr:stat: stderr Unable to find image 'quay.ceph.io/ceph-ci/ceph:282cc83b9d6c73ac8a35502bb969bc4e36afefcc' locally
2021-02-10T06:43:35.453 INFO:teuthology.orchestra.run.smithi132.stderr:stat: stderr /usr/bin/docker: Error response from daemon: Get https://quay.ceph.io/v2/ceph-ci/ceph/manifests/282cc83b9d6c73ac8a35502bb969bc4e36afefcc: net/http: TLS handshake timeout.
2021-02-10T06:43:35.453 INFO:teuthology.orchestra.run.smithi132.stderr:stat: stderr See '/usr/bin/docker run --help'.
2021-02-10T06:43:35.465 INFO:teuthology.orchestra.run.smithi132.stderr:Traceback (most recent call last):
2021-02-10T06:43:35.465 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 7639, in <module>
2021-02-10T06:43:35.465 INFO:teuthology.orchestra.run.smithi132.stderr: main()
2021-02-10T06:43:35.465 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 7628, in main
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: r = ctx.func(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 1616, in _infer_fsid
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: return func(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 1653, in _infer_config
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: return func(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 1700, in _infer_image
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: return func(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 4115, in command_shell
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: make_log_dir(ctx, ctx.fsid)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 1802, in make_log_dir
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: uid, gid = extract_uid_gid(ctx)
2021-02-10T06:43:35.466 INFO:teuthology.orchestra.run.smithi132.stderr: File "/usr/sbin/cephadm", line 2469, in extract_uid_gid
2021-02-10T06:43:35.467 INFO:teuthology.orchestra.run.smithi132.stderr: raise RuntimeError('uid/gid not found')
2021-02-10T06:43:35.467 INFO:teuthology.orchestra.run.smithi132.stderr:RuntimeError: uid/gid not found
2
</pre>
<p>Looks like we need to use the ignore list from <a class="external" href="https://github.com/ceph/ceph/blob/40d5c37930f9b7f883c3b6da57be481f1fe6fb6c/src/cephadm/cephadm#L3078-L3086">https://github.com/ceph/ceph/blob/40d5c37930f9b7f883c3b6da57be481f1fe6fb6c/src/cephadm/cephadm#L3078-L3086</a> for <code>command_shell()</code> as well.</p>
teuthology - Bug #47441 (Closed): teuthology/task/install: verify_package_version: RuntimeError: ...
https://tracker.ceph.com/issues/47441
2020-09-14T14:25:51Z
Sebastian Wagner
<pre>
2020-09-14T13:32:56.135 INFO:teuthology.packaging:The installed version of ceph is 16.0.0-5509.g7f41e68.el8
2020-09-14T13:32:56.136 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 31, in nested
vars.append(enter())
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 218, in install
install_packages(ctx, package_list, config)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 87, in install_packages
verify_package_version(ctx, config, remote)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/install/__init__.py", line 61, in verify_package_version
pkg=pkg_to_check
RuntimeError: ceph version 16.0.0-5509.g7f41e68c8af was not installed, found 16.0.0-5509.g7f41e68.el8.
</pre>
<p>Looks like the builds were duplicated: See <a class="external" href="https://shaman.ceph.com/repos/ceph/wip-swagner-testing-2020-09-14-1230/7f41e68c8afa3f6a917ca548770374067fdb433f/">https://shaman.ceph.com/repos/ceph/wip-swagner-testing-2020-09-14-1230/7f41e68c8afa3f6a917ca548770374067fdb433f/</a></p>
Orchestrator - Bug #47078 (Need More Info): cephadm: ForwardToSyslog=yes (aka really huge syslog)
https://tracker.ceph.com/issues/47078
2020-08-22T17:07:33Z
Sebastian Wagner
<p><code>ForwardToSyslog=yes</code> is the default in many distros.</p>
<p>Which means we have a chain here:</p>
<p>container's stdout -> systemd service -> journald -> really huge syslog</p>
<p>As a result, every log message of every container ends up in /var/log/syslog or something similar.</p>
<p>The Problem is explained here: <a class="external" href="https://github.com/DataDog/datadog-agent/issues/1606#issuecomment-382384062">https://github.com/DataDog/datadog-agent/issues/1606#issuecomment-382384062</a></p>
<p>Q: is this still the case?</p>
<p><strong>Is there any other way to disable ForwardToSyslog for our services?</strong> If we could find a way to disable this behavior for our containers, we should be fine.</p>
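<p>A global workaround exists in journald.conf, but it affects every service on the host, not just our containers (a sketch, shown only to illustrate the behavior we would ideally get per unit):</p>
<pre>
# Disable forwarding to syslog globally (affects the whole host, not just Ceph)
sudo sed -i 's/^#\?ForwardToSyslog=.*/ForwardToSyslog=no/' /etc/systemd/journald.conf
sudo systemctl restart systemd-journald
</pre>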
rbd - Bug #46875 (New): TestLibRBD.TestPendingAio: test_librbd.cc:4539: Failure or SIGSEGV
https://tracker.ceph.com/issues/46875
2020-08-10T01:03:45Z
Sebastian Wagner
<pre>
[ RUN ] TestLibRBD.TestPendingAio
using new format!
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/test_librbd.cc:4539: Failure
Expected equality of these values:
1
rbd_aio_is_complete(comps[i])
Which is: 0
[ FAILED ] TestLibRBD.TestPendingAio (68 ms)
</pre>
<p><a class="external" href="https://jenkins.ceph.com/job/ceph-pull-requests/57209/consoleFull#-361705261e840cee4-f4a4-4183-81dd-42855615f2c1">https://jenkins.ceph.com/job/ceph-pull-requests/57209/consoleFull#-361705261e840cee4-f4a4-4183-81dd-42855615f2c1</a></p>