Ceph : Issues
https://tracker.ceph.com/
Orchestrator - Bug #53904 (Duplicate): cephadm: ingress jobs stuck
https://tracker.ceph.com/issues/53904
2022-01-17T16:07:38Z
Sebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2022-01-17_12:42:04-orch:cephadm-wip-swagner-testing-2022-01-17-1014-distro-default-smithi/">https://pulpito.ceph.com/swagner-2022-01-17_12:42:04-orch:cephadm-wip-swagner-testing-2022-01-17-1014-distro-default-smithi/</a></p>
<pre>
2022-01-17T13:17:17.053 DEBUG:teuthology.orchestra.run.smithi155:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:1cdf02ebbbdd98a055173cbac4d0171328a564dc shell -c /etc/ceph/ceph.conf -k />
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> for haproxy in `ceph orch ps | grep ^haproxy.nfs.foo. | awk '"'"'{print $1}'"'"'`; do
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> ceph orch daemon stop $haproxy
2022-01-17T13:17:17.054 DEBUG:teuthology.orchestra.run.smithi155:> while ! ceph orch ps | grep $haproxy | grep stopped; do sleep 1 ; done
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> cat /mnt/foo/testfile
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> echo $haproxy > /mnt/foo/testfile
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> sync
2022-01-17T13:17:17.055 DEBUG:teuthology.orchestra.run.smithi155:> ceph orch daemon start $haproxy
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> while ! ceph orch ps | grep $haproxy | grep running; do sleep 1 ; done
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> done
2022-01-17T13:17:17.056 DEBUG:teuthology.orchestra.run.smithi155:> '
</pre><br />...snip...<br /><pre>
2022-01-17T13:17:20.571 INFO:teuthology.orchestra.run.smithi155.stdout:Check with each haproxy down in turn...
2022-01-17T13:17:21.281 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to stop haproxy.nfs.foo.smithi155.xhswck on host 'smithi155'
</pre><br />...snip...
<pre>
2022-01-17T13:17:36.893 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi155.xhswck smithi155 *:2049,9002 stopped 0s ago 79s - - <unknown> <un>
2022-01-17T13:17:36.898 INFO:teuthology.orchestra.run.smithi155.stdout:test
2022-01-17T13:17:37.528 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to start haproxy.nfs.foo.smithi155.xhswck on host 'smithi155'
</pre><br />...snip...<br /><pre>
2022-01-17T13:17:53.182 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi155.xhswck smithi155 *:2049,9002 running (5s) 0s ago 95s - - 2.3.17-d1c9119 14b>
2022-01-17T13:17:53.519 INFO:teuthology.orchestra.run.smithi155.stdout:Scheduled to stop haproxy.nfs.foo.smithi162.mahcqs on host 'smithi162'
</pre><br />...snip...<br /><pre>
2022-01-17T13:18:07.810 INFO:teuthology.orchestra.run.smithi155.stdout:haproxy.nfs.foo.smithi162.mahcqs smithi162 *:2049,9002 stopped 0s ago 102s - - <unknown> <unk>
</pre><br />...snip...<br /><pre>
h[14066]: cephadm 2022-01-17T13:17:53.516345+0000 mgr.smithi155.uoijyc (mgr.14206) 339 : cephadm [INF] Schedule stop daemon haproxy.nfs.foo.smithi162.mahcqs
</pre>
<p>But haproxy.nfs.foo.smithi162.mahcqs is never started again.</p>
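<p>For reference, a minimal sketch of how the wait could be bounded instead of looping forever (the <code>ceph orch ps --format json</code> field names used here are assumptions, not the verified schema):</p>
<pre><code class="python">
import json
import subprocess
import time

def wait_for_daemon_state(daemon, want, timeout=300):
    """Poll `ceph orch ps` until `daemon` reports status `want`, or give up.

    The JSON field names (daemon_type, daemon_id, status_desc) are assumptions
    about the orchestrator output.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(['ceph', 'orch', 'ps', '--format', 'json'])
        for d in json.loads(out):
            name = f"{d.get('daemon_type')}.{d.get('daemon_id')}"
            if name == daemon and d.get('status_desc') == want:
                return
        time.sleep(1)
    raise RuntimeError(f'{daemon} did not reach {want!r} within {timeout}s')

# wait_for_daemon_state('haproxy.nfs.foo.smithi162.mahcqs', 'running')
</code></pre>
<p>With a timeout, the stuck start would at least surface as a test failure instead of a hanging job.</p>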
Orchestrator - Bug #53321 (Duplicate): cephadm tries to use the system disk for osd specs
https://tracker.ceph.com/issues/53321
2021-11-18T15:31:20Z
Sebastian Wagner
<p>Having this spec:</p>
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">service_type</span>: <span class="string"><span class="content">osd</span></span>
<span class="key">service_id</span>: <span class="string"><span class="content">hybrid</span></span>
<span class="key">service_name</span>: <span class="string"><span class="content">osd.hybrid</span></span>
<span class="key">placement</span>:
<span class="key">host_pattern</span>: <span class="string"><span class="content">host1</span></span>
<span class="key">spec</span>:
<span class="key">data_devices</span>:
<span class="key">rotational</span>: <span class="string"><span class="content">1</span></span>
<span class="key">db_devices</span>:
<span class="key">rotational</span>: <span class="string"><span class="content">0</span></span>
<span class="key">filter_logic</span>: <span class="string"><span class="content">AND</span></span>
<span class="key">objectstore</span>: <span class="string"><span class="content">bluestore</span></span>
</span></code></pre>
<p>And with the system disk (/dev/sda) locked:</p>
<pre>
# ceph orch device ls host1
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REJECT REASONS
host1 /dev/nvme0n1 ssd NVMENVMENVMENVMENVMENVMENVMENVME1 1600G Yes
host1 /dev/nvme1n1 ssd NVMENVMENVMENVMENVMENVMENVMENVME4 1600G Yes
host1 /dev/sda hdd IDSDM_012345678901 64.2G Has GPT headers, locked
host1 /dev/sdb hdd AAAAAAAAAAA_0000000000000b0f1 16.0T Yes
host1 /dev/sdc hdd AAAAAAAAAAA_000000000000089cd 16.0T Yes
host1 /dev/sdd hdd AAAAAAAAAAA_0000000000000af6d 16.0T Yes
host1 /dev/sde hdd AAAAAAAAAAA_00000000000008a4d 16.0T Yes
host1 /dev/sdf hdd AAAAAAAAAAA_0000000000000af9d 16.0T Yes
host1 /dev/sdg hdd BBBBBBBBBBBb_00000000000044a5 16.0T Yes
host1 /dev/sdh hdd BBBBBBBBBBBb_000000000000f7f9 16.0T Yes
host1 /dev/sdi hdd AAAAAAAAAAA_000000000000089a1 16.0T Yes
host1 /dev/sdj hdd AAAAAAAAAAA_00000000000008601 16.0T Yes
host1 /dev/sdk hdd AAAAAAAAAAA_00000000000008a71 16.0T Yes
host1 /dev/sdl hdd CCCCCCCCCCCC_000000000000ebdd 16.0T Yes
host1 /dev/sdm hdd AAAAAAAAAAA_000000000000089bd 16.0T Yes
host1 /dev/sdn hdd AAAAAAAAAAA_0000000000000fd31 16.0T Yes
host1 /dev/sdo hdd AAAAAAAAAAA_0000000000000f9a9 16.0T Yes
host1 /dev/sdp hdd AAAAAAAAAAA_00000000000008565 16.0T Yes
host1 /dev/sdq hdd BBBBBBBBBBBb_000000000000f3e5 16.0T Yes
host1 /dev/sdr hdd MG08SCA16TEY_5000039aa858002d 16.0T Yes
host1 /dev/sds hdd AAAAAAAAAAA_0000000000000fa61 16.0T Yes
host1 /dev/sdt hdd BBBBBBBBBBBb_00000000000046a5 16.0T Yes
host1 /dev/sdu hdd BBBBBBBBBBBb_00000000000041a1 16.0T Yes
host1 /dev/sdv hdd BBBBBBBBBBBb_000000000000f46d 16.0T Yes
host1 /dev/sdw hdd BBBBBBBBBBBb_00000000000046a9 16.0T Yes
host1 /dev/sdx hdd CCCCCCCCCCCC_000000000000eec1 16.0T Yes
host1 /dev/sdy hdd BBBBBBBBBBBb_0000000000004509 16.0T Yes
# mount -l | grep sda
/dev/sda2 on / type ext4 (rw,relatime) [root]
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro) [boot-efi]
</pre>
<p>Applying the spec then fails:</p>
<pre>
# ceph orch ls --format yaml --service-type osd | python3 -c 'import sys, yaml, json; y=yaml.safe_load_all(sys.stdin.read()); print(json.dumps(list(y)))' | jq -r .[0].events[0]
2021-11-18T14:09:26.296374Z service:osd.hybrid [ERROR] "Failed to apply: cephadm exited with an error code: 1, stderr:Non-zero exit code 2 from /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph@sha256:sha1 -e NODE_NAME=host1 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=hybrid -v /var/run/ceph/fsid:/var/run/ceph:z -v /var/log/ceph/fsid:/var/log/ceph:z -v /var/lib/ceph/fsid/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmp1gohyav7:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpblgd2lz7:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.ceph.io/ceph-ci/ceph@sha256:sha1 lvm batch --no-auto /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy --db-devices /dev/nvme0n1 /dev/nvme1n1 --yes --no-systemd
/usr/bin/docker: stderr usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
/usr/bin/docker: stderr [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
/usr/bin/docker: stderr [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
/usr/bin/docker: stderr [--auto] [--no-auto] [--bluestore] [--filestore]
/usr/bin/docker: stderr [--report] [--yes]
/usr/bin/docker: stderr [--format {json,json-pretty,pretty}] [--dmcrypt]
/usr/bin/docker: stderr [--crush-device-class CRUSH_DEVICE_CLASS]
/usr/bin/docker: stderr [--no-systemd]
/usr/bin/docker: stderr [--osds-per-device OSDS_PER_DEVICE]
/usr/bin/docker: stderr [--data-slots DATA_SLOTS]
/usr/bin/docker: stderr [--data-allocate-fraction DATA_ALLOCATE_FRACTION]
/usr/bin/docker: stderr [--block-db-size BLOCK_DB_SIZE]
/usr/bin/docker: stderr [--block-db-slots BLOCK_DB_SLOTS]
/usr/bin/docker: stderr [--block-wal-size BLOCK_WAL_SIZE]
/usr/bin/docker: stderr [--block-wal-slots BLOCK_WAL_SLOTS]
/usr/bin/docker: stderr [--journal-size JOURNAL_SIZE]
/usr/bin/docker: stderr [--journal-slots JOURNAL_SLOTS] [--prepare]
/usr/bin/docker: stderr [--osd-ids [OSD_IDS [OSD_IDS ...]]]
/usr/bin/docker: stderr [DEVICES [DEVICES ...]]
/usr/bin/docker: stderr ceph-volume lvm batch: error: GPT headers found, they must be removed on: /dev/sda
Traceback (most recent call last):
File "/var/lib/ceph/fsid/cephadm.hash", line 8331, in <module>
main()
File "/var/lib/ceph/fsid/cephadm.hash", line 8319, in main
r = ctx.func(ctx)
File "/var/lib/ceph/fsid/cephadm.hash", line 1735, in _infer_config
return func(ctx)
File "/var/lib/ceph/fsid/cephadm.hash", line 1676, in _infer_fsid
return func(ctx)
File "/var/lib/ceph/fsid/cephadm.hash", line 1763, in _infer_image
return func(ctx)
File "/var/lib/ceph/fsid/cephadm.hash", line 1663, in _validate_fsid
return func(ctx)
File "/var/lib/ceph/fsid/cephadm.hash", line 5285, in command_ceph_volume
out, err, code = call_throws(ctx, c.run_cmd())
File "/var/lib/ceph/fsid/cephadm.hash", line 1465, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph@sha256:sha1 -e NODE_NAME=host1 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=hybrid -v /var/run/ceph/fsid:/var/run/ceph:z -v /var/log/ceph/fsid:/var/log/ceph:z -v /var/lib/ceph/fsid/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmp1gohyav7:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpblgd2lz7:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.ceph.io/ceph-ci/ceph@sha256:sha1 lvm batch --no-auto /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy --db-devices /dev/nvme0n1 /dev/nvme1n1 --yes --no-systemd"
</pre>
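<p>A minimal sketch of the kind of pre-filtering that would keep the locked system disk out of the <code>lvm batch</code> call, assuming <code>ceph orch device ls --format json</code> reports per-device availability (the exact field names below are assumptions):</p>
<pre><code class="python">
import json
import subprocess

def usable_devices(host):
    """Return device paths on `host` that the orchestrator reports as available.

    The JSON layout ("devices", "path", "available") is an assumption about the
    `ceph orch device ls --format json` output.
    """
    out = subprocess.check_output(
        ['ceph', 'orch', 'device', 'ls', host, '--format', 'json'])
    paths = []
    for entry in json.loads(out):
        for dev in entry.get('devices', []):
            # A device rejected with "Has GPT headers, locked" is not available
            # and would simply be skipped here.
            if dev.get('available'):
                paths.append(dev['path'])
    return paths

# print(usable_devices('host1'))  # should not include /dev/sda
</code></pre>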
mgr - Bug #52603 (Duplicate): pacific ERROR: test_diskprediction_local (tasks.mgr.test_module_sel...
https://tracker.ceph.com/issues/52603
2021-09-14T11:29:31Z
Sebastian Wagner
<pre>
2021-09-12T15:40:17.598 INFO:teuthology.orchestra.run.smithi033.stderr:2021-09-12T15:40:17.603+0000 7f11dac3c700 20 mgrc start_command cmd: [{"prefix": "mgr self-test module", "module": "diskprediction_local", >
2021-09-12T15:40:17.598 INFO:teuthology.orchestra.run.smithi033.stderr:2021-09-12T15:40:17.603+0000 7f11dac3c700 1 -- 172.21.15.33:0/2949193497 --> [v2:172.21.15.200:6824/18725,v1:172.21.15.200:6825/18725] -- >
2021-09-12T15:40:17.599 INFO:tasks.ceph.mgr.z.smithi200.stderr:2021-09-12T15:40:17.607+0000 7f0443f9c700 -1 no module 'diskprediction_local'
2021-09-12T15:40:17.600 INFO:tasks.ceph.mgr.z.smithi200.stderr:2021-09-12T15:40:17.607+0000 7f0443f9c700 -1 mgr handle_command module 'selftest' command handler threw exception: Module not found
2021-09-12T15:40:17.601 INFO:tasks.ceph.mgr.z.smithi200.stderr:2021-09-12T15:40:17.607+0000 7f0443f9c700 -1 mgr.server reply reply (22) Invalid argument Traceback (most recent call last):
2021-09-12T15:40:17.601 INFO:tasks.ceph.mgr.z.smithi200.stderr: File "/usr/share/ceph/mgr/mgr_module.py", line 1353, in _handle_command
2021-09-12T15:40:17.602 INFO:tasks.ceph.mgr.z.smithi200.stderr: return self.handle_command(inbuf, cmd)
2021-09-12T15:40:17.602 INFO:tasks.ceph.mgr.z.smithi200.stderr: File "/usr/share/ceph/mgr/selftest/module.py", line 142, in handle_command
2021-09-12T15:40:17.602 INFO:tasks.ceph.mgr.z.smithi200.stderr: r = self.remote(command['module'], "self_test")
2021-09-12T15:40:17.602 INFO:tasks.ceph.mgr.z.smithi200.stderr: File "/usr/share/ceph/mgr/mgr_module.py", line 1738, in remote
2021-09-12T15:40:17.603 INFO:tasks.ceph.mgr.z.smithi200.stderr: return self._ceph_dispatch_remote(module_name, method_name,
2021-09-12T15:40:17.603 INFO:tasks.ceph.mgr.z.smithi200.stderr:ImportError: Module not found
2021-09-12T15:40:17.603 INFO:tasks.ceph.mgr.z.smithi200.stderr:
</pre>
<p><a class="external" href="https://pulpito.ceph.com/yuriw-2021-09-12_15:18:23-rados-pacific-distro-basic-smithi/6386597">https://pulpito.ceph.com/yuriw-2021-09-12_15:18:23-rados-pacific-distro-basic-smithi/6386597</a></p>
mgr - Bug #51564 (Duplicate): Dashboard URL address is incorrect after bootstrap with IPV6 address
https://tracker.ceph.com/issues/51564
2021-07-07T12:27:06Z
Sebastian Wagner
<p>Description of problem:<br />After cephadm bootstrap with an IPv6 address as the mon IP, the dashboard URL that is reported is incorrect.</p>
<p>The square brackets around the IPv6 address are missing:</p>
<pre>
[ceph: root@magna081 /]# ceph mgr services
{
"dashboard": "https://2620:52:0:880:225:90ff:fefc:2536:8443/",
"prometheus": "http://2620:52:0:880:225:90ff:fefc:2536:9283/"
}
[root@magna081 ~]# curl -k https://2620:52:0:880:225:90ff:fefc:2536:8443/
curl: (3) Port number ended with ':'
</pre>
<p>working URL</p>
<pre>
[root@magna081 ~]# curl -k https://[2620:52:0:880:225:90ff:fefc:2536]:8443/
<!doctype html>
<html lang="en-US">
<head>
<meta charset="utf-8">
<title>Red Hat Ceph Storage</title>
<script>
document.write('<base href="' + document.location+ '" />');
</script>
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link rel="icon" type="image/x-icon" id="cdFavicon" href="assets/RedHat_favicon_0319.svg">
<link rel="stylesheet" href="styles.ba0e881c90a69f89b28e.css"></head>
<body>
<noscript>
<div class="noscript container"
ng-if="false">
<div class="jumbotron alert alert-danger">
<h2 i18n>JavaScript required!</h2>
<p i18n>A browser with JavaScript enabled is required in order to use this service.</p>
<p i18n>When using Internet Explorer, please check your security settings and add this address to your trusted sites.</p>
</div>
</div>
</noscript>
<cd-root></cd-root>
<script src="runtime.15da3e1803be577b1f00.js" defer></script><script src="polyfills.b66d1515aae6fe3887b1.js" defer></script><script src="scripts.6bda3fa7e09a87cd4228.js" defer></script><script src="main.b310ff35ff1005ba4a64.js" defer></script></body>
</html>
</pre>
<p>Steps to Reproduce:</p>
<ol>
<li>Bootstrap with IPV6 address as --mon-ip</li>
<li>Get dashboard URL using `ceph mgr services`</li>
<li>try to login to dashboard</li>
</ol>
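<p>A minimal sketch of the bracketing the dashboard URL needs for IPv6 literals; this only illustrates the fix, it is not the actual mgr code:</p>
<pre><code class="python">
import ipaddress

def format_service_url(scheme, host, port):
    """Wrap bare IPv6 literals in brackets so the resulting URL is valid."""
    try:
        if isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address):
            host = f'[{host}]'
    except ValueError:
        pass  # hostname or already-bracketed literal: leave as-is
    return f'{scheme}://{host}:{port}/'

print(format_service_url('https', '2620:52:0:880:225:90ff:fefc:2536', 8443))
# -> https://[2620:52:0:880:225:90ff:fefc:2536]:8443/
</code></pre>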
Orchestrator - Documentation #50883 (Duplicate): cephadm: mds_cache_memory_limit
https://tracker.ceph.com/issues/50883
2021-05-19T11:18:57Z
Sebastian Wagner
<p>Users can apply:</p>
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">service_type</span>: <span class="string"><span class="content">mds</span></span>
<span class="key">service_id</span>: <span class="string"><span class="content">fsname</span></span>
<span class="key">placement</span>:
<span class="key">count</span>: <span class="string"><span class="content">2</span></span>
<span class="key">config</span>:
<span class="key">mds_cache_memory_limit</span>: <span class="string"><span class="content">8Gi</span></span>
</span></code></pre>
<p>And this would set the config option for all daemons of that service.</p>
<p>We should probably add this to the documentation.</p>
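<p>A minimal sketch of how one could verify the option actually reached every MDS daemon of the service (the <code>ceph orch ps</code> JSON field names are assumptions; <code>ceph config get</code> is the standard command):</p>
<pre><code class="python">
import json
import subprocess

def mds_cache_limits():
    """Read back mds_cache_memory_limit for every deployed MDS daemon."""
    out = subprocess.check_output(['ceph', 'orch', 'ps', '--format', 'json'])
    limits = {}
    for d in json.loads(out):
        if d.get('daemon_type') != 'mds':
            continue
        name = f"mds.{d.get('daemon_id')}"
        val = subprocess.check_output(
            ['ceph', 'config', 'get', name, 'mds_cache_memory_limit'])
        limits[name] = val.decode().strip()
    return limits

# print(mds_cache_limits())
</code></pre>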
Orchestrator - Cleanup #50117 (Duplicate): orch apply kind: introduce another layer on top of ser...
https://tracker.ceph.com/issues/50117
2021-04-02T22:32:23Z
Sebastian Wagner
<p><strong>Current situation</strong></p>
<p>Right now, we already have three different types of things that <strong>ceph orch apply</strong> supports:</p>
<ul>
<li>Standard services: <strong>service_type: mon|mgr|rgw...</strong></li>
<li>Host specs: <strong>service_type: host</strong></li>
<li>Generic Deployments: <strong>service_type: container</strong></li>
</ul>
<p><strong>Idea</strong></p>
<p>The idea is to distinguish them better for the user. Make it clear that a host spec is not a service.</p>
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">kind</span>: <span class="string"><span class="content">host</span></span>
<span class="key">hostname</span>: <span class="string"><span class="content">foobar</span></span>
<span class="key">addr</span>: <span class="string"><span class="content">127.0.0.1</span></span>
</span></code></pre>
<p>is better than</p>
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">service_type</span>: <span class="string"><span class="content">host</span></span>
<span class="key">hostname</span>: <span class="string"><span class="content">foobar</span></span>
<span class="key">addr</span>: <span class="string"><span class="content">127.0.0.1</span></span>
</span></code></pre>
<p>Thus, I propose to add another layer called <strong>kind: host|service|deployment</strong> on top of service_type.</p>
<p>Like so:</p>
<pre><code class="yaml syntaxhl"><span class="CodeRay"><span class="key">kind</span>: <span class="string"><span class="content">host</span></span>
<span class="key">hostname</span>: <span class="string"><span class="content">myhost1</span></span>
<span class="key">labels</span>:
- <span class="string"><span class="content">rgw</span></span>
<span class="head"><span class="head">---</span></span>
<span class="key">kind</span>: <span class="string"><span class="content">service</span></span>
<span class="key">service_type</span>: <span class="string"><span class="content">rgw</span></span>
<span class="key">service_id</span>: <span class="string"><span class="content">foobar</span></span>
<span class="key">spec</span>:
<span class="key">realm</span>: <span class="string"><span class="content">myrealm</span></span>
<span class="key">zone</span>: <span class="string"><span class="content">myzone</span></span>
<span class="head"><span class="head">---</span></span>
<span class="key">kind</span>: <span class="string"><span class="content">rgw_realm</span></span>
<span class="key">name</span>: <span class="string"><span class="content">realm-a</span></span>
<span class="key">pull_endpoint</span>: <span class="string"><span class="content">http://10.2.105.133:80</span></span>
<span class="head"><span class="head">---</span></span>
<span class="key">kind</span>: <span class="string"><span class="content">rgw_zone</span></span>
<span class="key">name</span>: <span class="string"><span class="content">zone-a</span></span>
<span class="head"><span class="head">---</span></span>
<span class="key">kind</span>: <span class="string"><span class="content">deployment</span></span>
<span class="key">service_name</span>: <span class="string"><span class="content">prometheus-webhook-snmp</span></span>
<span class="key">placement</span>:
<span class="error">ADD_PLACEMENT_HERE</span>
<span class="key">image</span>: <span class="string"><span class="content">hello:latest</span></span>
<span class="key">ports</span>:
- <span class="string"><span class="content">999: 999</span></span>
<span class="key">envs</span>:
- <span class="string"><span class="content">args="--debug --snmp-host=ADD_HOST_GATEWAY_HERE --metrics" </span></span>
</span></code></pre>
<pre>
ceph orch apply -i cluster.yaml
</pre>
<p>or</p>
<pre>
cephadm bootstrap --apply-spec cluster.yaml
</pre>
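<p>A minimal sketch of what dispatching on <code>kind</code> could look like when applying such a multi-document cluster.yaml; the handler names are purely illustrative, not a proposed implementation:</p>
<pre><code class="python">
import sys
import yaml

# Illustrative handlers only; real ones would call into the orchestrator.
HANDLERS = {
    'host':       lambda doc: print('add host', doc['hostname']),
    'service':    lambda doc: print('apply service', doc['service_type']),
    'rgw_realm':  lambda doc: print('create realm', doc['name']),
    'rgw_zone':   lambda doc: print('create zone', doc['name']),
    'deployment': lambda doc: print('deploy container', doc['service_name']),
}

def apply_cluster_yaml(stream):
    """Route each YAML document to a handler based on its top-level `kind`."""
    for doc in yaml.safe_load_all(stream):
        if not doc:
            continue
        kind = doc.get('kind')
        if kind not in HANDLERS:
            raise ValueError(f'unknown kind: {kind!r}')
        HANDLERS[kind](doc)

if __name__ == '__main__':
    with open(sys.argv[1]) as f:   # e.g. cluster.yaml
        apply_cluster_yaml(f)
</code></pre>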
Orchestrator - Bug #49191 (Duplicate): cephadm: service_type: osd: Failed to apply: ''NoneType'' ...
https://tracker.ceph.com/issues/49191
2021-02-05T14:29:33Z
Sebastian Wagner
<pre>
ceph orch ls --format yaml
</pre><br /><pre><code class="yaml syntaxhl">service_type: osd
service_id: dashboard-p14040
service_name: osd.dashboard-p14040
placement:
  hosts:
  - lnx92252
spec:
  filter_logic: AND
  objectstore: bluestore
status:
  container_image_id: xyz
  container_image_name: ceph/ceph:latest
  last_refresh: '2021-02-01T13:53:11.218862'
  running: 1
  size: 0
events:
- '2021-01-30T18:40:07.414091 service:osd.dashboard [ERROR] "Failed
  to apply: ''NoneType'' object has no attribute ''paths''"'
</code></pre>
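<p>A minimal sketch of the kind of guard that avoids the traceback when a drive group has no <code>data_devices</code> section; the attribute names mirror the error message but are assumptions about the real spec classes:</p>
<pre><code class="python">
def device_paths(drive_group):
    """Return explicit device paths from a drive group spec, tolerating a spec
    whose data_devices section is absent (None)."""
    data_devices = getattr(drive_group, 'data_devices', None)
    if data_devices is None:
        return []                       # nothing selected: no paths to report
    return list(data_devices.paths or [])
</code></pre>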
Orchestrator - Bug #48463 (Duplicate): mon.c: Error: invalid config provided: CapAdd and privileg...
https://tracker.ceph.com/issues/48463
2020-12-04T11:17:07Z
Sebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2020-12-04_10:02:29-rados:cephadm-wip-jmolmo-testing-2020-12-02-1452-distro-basic-smithi/5680473/">https://pulpito.ceph.com/swagner-2020-12-04_10:02:29-rados:cephadm-wip-jmolmo-testing-2020-12-02-1452-distro-basic-smithi/5680473/</a></p>
<pre>
['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.0', '-e', 'NODE_NAME=smithi135', '-v', '/var/log/ceph/c4502caa-3619-11eb-980d-001a4
aab830c:/var/log/ceph:z', '-v', '/tmp/ceph-tmpr_67xdiq:/etc/ceph/ceph.client.admin.keyring:z', '-v', '/tmp/ceph-tmpl0cckyum:/etc/ceph/ceph.conf:z', '-v', '/var/lib/ceph/c4502caa-3619-11eb-980d-001a4aab830c/mon.a:/var/lib/ceph/mon/ceph-a:z', '--entrypoint', '/usr
/bin/ceph', 'docker.io/ceph/ceph:v15.2.0', 'config', 'generate-minimal-conf', '-o', '/var/lib/ceph/mon/ceph-a/config']
</pre>
<p>It turns out we are now installing podman 2 and then starting the upgrade from 15.2.0, which does not support podman 2.</p>
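<p>A minimal sketch of a pre-flight check that would refuse this combination up front instead of failing mid-upgrade (the incompatibility rule is just what this ticket observed):</p>
<pre><code class="python">
import re
import subprocess

def podman_major():
    """Parse the major version from `podman --version` (e.g. 'podman version 2.0.5')."""
    out = subprocess.check_output(['podman', '--version']).decode()
    return int(re.search(r'(\d+)\.', out).group(1))

def check_podman_compat(ceph_version):
    """Refuse known-bad combinations such as podman >= 2 with Ceph 15.2.0."""
    if podman_major() >= 2 and ceph_version == (15, 2, 0):
        raise RuntimeError('ceph 15.2.0 does not support podman >= 2; '
                           'use a newer target image or an older podman')

# check_podman_compat((15, 2, 0))
</code></pre>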
Orchestrator - Bug #48275 (Duplicate): cephadm: get_last_local_ceph_image returns "<none>:<none>"
https://tracker.ceph.com/issues/48275
2020-11-18T13:17:23Z
Sebastian Wagner
<pre>
node2:~ # cephadm ceph-volume inventory
Inferring fsid 83c06a6e-298b-11eb-9c2a-525400501d50
Using recent ceph image <none>:<none>
Non-zero exit code 125 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint stat -e CONTAINER_IMAGE=<none>:<none> -e NODE_NAME=node2 <none>:<none> -c %u %g /var/lib/ceph
stat:stderr Error: invalid reference format
Traceback (most recent call last):
File "/usr/sbin/cephadm", line 6043, in <module>
r = args.func()
File "/usr/sbin/cephadm", line 1299, in _infer_fsid
return func()
File "/usr/sbin/cephadm", line 1358, in _infer_image
return func()
File "/usr/sbin/cephadm", line 3558, in command_ceph_volume
make_log_dir(args.fsid)
File "/usr/sbin/cephadm", line 1453, in make_log_dir
uid, gid = extract_uid_gid()
File "/usr/sbin/cephadm", line 2061, in extract_uid_gid
raise RuntimeError('uid/gid not found')
RuntimeError: uid/gid not found
</pre>
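<p>A minimal sketch of the filtering get_last_local_ceph_image would need so that dangling images are skipped; the podman invocation here is illustrative and not the exact one cephadm uses:</p>
<pre><code class="python">
import subprocess

def last_local_ceph_image():
    """Pick the newest locally cached image, skipping dangling <none>:<none> entries."""
    out = subprocess.check_output(
        ['podman', 'images', '--format', '{{.Repository}}:{{.Tag}}']).decode()
    for line in out.splitlines():       # newest images are listed first
        if '<none>' in line:
            continue                    # dangling image: not usable as an image spec
        return line
    return None

# print(last_local_ceph_image())
</code></pre>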
Orchestrator - Bug #47340 (Duplicate): _list_devices: 'NoneType' object has no attribute 'get'
https://tracker.ceph.com/issues/47340
2020-09-07T16:01:49Z
Sebastian Wagner
<p><a class="external" href="https://pulpito.ceph.com/swagner-2020-09-07_12:17:11-rados:cephadm-wip-swagner2-testing-2020-09-07-1101-distro-basic-smithi/5415754/">https://pulpito.ceph.com/swagner-2020-09-07_12:17:11-rados:cephadm-wip-swagner2-testing-2020-09-07-1101-distro-basic-smithi/5415754/</a></p>
<pre>
2020-09-07T12:59:11.859 INFO:teuthology.orchestra.run.smithi150.stderr:Error EINVAL: Traceback (most recent call last):
2020-09-07T12:59:11.860 INFO:teuthology.orchestra.run.smithi150.stderr: File "/usr/share/ceph/mgr/mgr_module.py", line 1177, in _handle_command
2020-09-07T12:59:11.860 INFO:teuthology.orchestra.run.smithi150.stderr: return self.handle_command(inbuf, cmd)
2020-09-07T12:59:11.860 INFO:teuthology.orchestra.run.smithi150.stderr: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 141, in handle_command
2020-09-07T12:59:11.861 INFO:teuthology.orchestra.run.smithi150.stderr: return dispatch[cmd['prefix']].call(self, cmd, inbuf)
2020-09-07T12:59:11.861 INFO:teuthology.orchestra.run.smithi150.stderr: File "/usr/share/ceph/mgr/mgr_module.py", line 318, in call
2020-09-07T12:59:11.861 INFO:teuthology.orchestra.run.smithi150.stderr: return self.func(mgr, **kwargs)
2020-09-07T12:59:11.861 INFO:teuthology.orchestra.run.smithi150.stderr: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 103, in <lambda>
2020-09-07T12:59:11.861 INFO:teuthology.orchestra.run.smithi150.stderr: wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
2020-09-07T12:59:11.862 INFO:teuthology.orchestra.run.smithi150.stderr: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 92, in wrapper
2020-09-07T12:59:11.862 INFO:teuthology.orchestra.run.smithi150.stderr: return func(*args, **kwargs)
2020-09-07T12:59:11.862 INFO:teuthology.orchestra.run.smithi150.stderr: File "/usr/share/ceph/mgr/orchestrator/module.py", line 421, in _list_devices
2020-09-07T12:59:11.862 INFO:teuthology.orchestra.run.smithi150.stderr: if d.lsm_data.get('ledSupport', None):
2020-09-07T12:59:11.862 INFO:teuthology.orchestra.run.smithi150.stderr:AttributeError: 'NoneType' object has no attribute 'get'
</pre>
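<p>A minimal sketch of the guard the failing line in <code>_list_devices</code> needs, since <code>lsm_data</code> can legitimately be None when no libstoragemgmt data was gathered (the attribute name comes from the traceback; the helper itself is hypothetical):</p>
<pre><code class="python">
def led_support(device):
    """Return the lsm 'ledSupport' field, tolerating devices whose lsm_data is None."""
    lsm_data = getattr(device, 'lsm_data', None) or {}
    return lsm_data.get('ledSupport')
</code></pre>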
Orchestrator - Feature #45905 (Duplicate): cephadm: errors in serve() should create a HEALTH warning
https://tracker.ceph.com/issues/45905
2020-06-05T09:50:18Z
Sebastian Wagner
<p>Otherwise users need to search the mgr log for hints manually.</p>
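<p>A minimal sketch, assuming the standard MgrModule <code>set_health_checks()</code> API; the check name and the <code>serve_once()</code> helper are made up for illustration:</p>
<pre><code class="python">
import time
import traceback

def serve_forever(module):
    """Run the serve loop and surface exceptions as a cluster health warning."""
    while module.run:
        try:
            module.serve_once()           # hypothetical single serve() iteration
            module.set_health_checks({})  # clear this module's checks on recovery
        except Exception as e:
            module.set_health_checks({
                'CEPHADM_SERVE_EXCEPTION': {      # illustrative check name
                    'severity': 'warning',
                    'summary': f'cephadm serve loop failed: {e}',
                    'count': 1,
                    'detail': traceback.format_exc().splitlines(),
                },
            })
            time.sleep(10)                # back off before retrying
</code></pre>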
Orchestrator - Documentation #45564 (Duplicate): cephadm: document workaround for accessing the a...
https://tracker.ceph.com/issues/45564
2020-05-15T09:35:17Z
Sebastian Wagner
<pre>
$ ceph daemon mgr.ceph03 config show
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
</pre>
<pre>
$ ceph --admin-daemon /var/run/ceph/mgr.ceph03 config show
No such file or directory
</pre>
<p>Probably, users have to run</p>
<pre>
cephadm enter mgr.ceph03
</pre>
<p>to access the admin socket.</p>
Orchestrator - Bug #45258 (Duplicate): cephadm: iSCSIServiceSpec: user/password should be mandato...
https://tracker.ceph.com/issues/45258
2020-04-24T11:59:15Z
Sebastian Wagner
<p>Some arguments in iSCSIServiceSpec should be mandatory, like user/password/port, or at least cephadm should generate default values for them.</p>
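<p>A minimal sketch of the validation (or default generation) this asks for; the field names api_user/api_password/api_port are assumptions about the spec layout:</p>
<pre><code class="python">
import secrets

def validate_iscsi_spec(spec: dict) -> dict:
    """Require credentials in an iSCSI service spec, or fill in generated defaults."""
    spec.setdefault('api_port', 5000)            # assumed default port
    if not spec.get('api_user'):
        spec['api_user'] = 'admin'
    if not spec.get('api_password'):
        spec['api_password'] = secrets.token_urlsafe(16)  # generate instead of failing
    return spec

print(validate_iscsi_spec({'service_type': 'iscsi', 'service_id': 'igw'}))
</code></pre>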
Orchestrator - Bug #45197 (Duplicate): cephadm: rgw: failed to bind address 0.0.0.0:80
https://tracker.ceph.com/issues/45197
2020-04-23T09:01:45Z
Sebastian Wagner
<p>Despite running as root, RGW still cannot bind to port 80.</p>
<pre>
Apr 22 22:36:04 node02 systemd[1]: Starting Ceph rgw.myorg.us-east-1.node02.wzdozc for xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx...
Apr 22 22:36:04 node02 podman[3306]: Error: no container with name or ID ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx-rgw.myorg.us-east-1.node02.wzdozc found: no such container
Apr 22 22:36:04 node02 systemd[1]: Started Ceph rgw.myorg.us-east-1.node02.wzdozc for xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx.
Apr 22 22:36:04 node02 podman[3316]: 2020-04-22 22:36:04.900573616 +0300 +03 m=+0.195480600 container create 2df051c99b6a1a2b069576796b7a283d195da289ebace699674e60e5671beb52 (image=docker.io/ceph/ceph:v15, name=ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx-rgw.myorg.us-east-1.node02.wzdozc)
Apr 22 22:36:04 node02 systemd[1]: Started libpod-conmon-2df051c99b6a1a2b069576796b7a283d195da289ebace699674e60e5671beb52.scope.
Apr 22 22:36:05 node02 systemd[1]: Started libcontainer container 2df051c99b6a1a2b069576796b7a283d195da289ebace699674e60e5671beb52.
Apr 22 22:36:05 node02 bash[1571]: debug 2020-04-22T19:36:05.146+0000 7fee9ce70700 1 mon.node02@1(peon).osd e55 e55: 4 total, 4 up, 4 in
Apr 22 22:36:05 node02 bash[1576]: debug 2020-04-22T19:36:05.389+0000 7f135643a700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.1/rpm/el8/BUILD/ceph-15.2.1/src/cls/queue/cls_queue_src.cc:54: ERROR: queue_read_head: failed to decode queue start: buffer::end_of_buffer
Apr 22 22:36:05 node02 bash[1561]: debug 2020-04-22T19:36:05.462+0000 7f10ef94a700 0 log_channel(cluster) log [DBG] : pgmap v616: 129 pgs: 1 creating+activating, 12 creating+peering, 7 unknown, 109 active+clean; 1.9 KiB data, 29 MiB used, 20 GiB / 24 GiB avail
Apr 22 22:36:05 node02 bash[1576]: debug 2020-04-22T19:36:05.521+0000 7f135643a700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.1/rpm/el8/BUILD/ceph-15.2.1/src/cls/queue/cls_queue_src.cc:54: ERROR: queue_read_head: failed to decode queue start: buffer::end_of_buffer
Apr 22 22:36:05 node02 bash[1576]: debug 2020-04-22T19:36:05.541+0000 7f135643a700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.1/rpm/el8/BUILD/ceph-15.2.1/src/cls/queue/cls_queue_src.cc:54: ERROR: queue_read_head: failed to decode queue start: buffer::end_of_buffer
Apr 22 22:36:05 node02 bash[1576]: debug 2020-04-22T19:36:05.626+0000 7f135643a700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.1/rpm/el8/BUILD/ceph-15.2.1/src/cls/queue/cls_queue_src.cc:54: ERROR: queue_read_head: failed to decode queue start: buffer::end_of_buffer
[.....]
Apr 22 22:36:06 node02 bash[1571]: audit 2020-04-22T19:36:05.125075+0000 mon.node01 (mon.0) 82 : audit [INF] from='client.? 192.168.100.101:0/3331055036' entity='client.rgw.myorg.us-east-1.node01.dgfdkv' cmd='[{"prefix": "osd pool set", "pool": "us-east-1.rgw.meta", "var": "pg_num_min", "val": "8"}]': finished
Apr 22 22:36:06 node02 bash[1571]: cluster 2020-04-22T19:36:05.125132+0000 mon.node01 (mon.0) 83 : cluster [DBG] osdmap e55: 4 total, 4 up, 4 in
Apr 22 22:36:06 node02 bash[1571]: cluster 2020-04-22T19:36:05.463711+0000 mgr.node02.qbhwjb (mgr.44101) 604 : cluster [DBG] pgmap v616: 129 pgs: 1 creating+activating, 12 creating+peering, 7 unknown, 109 active+clean; 1.9 KiB data, 29 MiB used, 20 GiB / 24 GiB avail
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.337+0000 7fb93538f240 0 set uid:gid to 167:167 (ceph:ceph)
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.337+0000 7fb93538f240 0 ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable), process radosgw, pid 1
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.337+0000 7fb93538f240 0 framework: beast
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.337+0000 7fb93538f240 0 framework conf key: port, val: 80
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.337+0000 7fb93538f240 1 radosgw_Main not setting numa affinity
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.819+0000 7fb93538f240 0 framework: beast
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.819+0000 7fb93538f240 0 framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.819+0000 7fb93538f240 0 framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
Apr 22 22:36:06 node02 bash[3314]: debug 2020-04-22T19:36:06.819+0000 7fb93538f240 0 starting handler: beast
Apr 22 22:36:06 node02 podman[3316]: 2020-04-22 22:36:06.890791974 +0300 +03 m=+2.185699058 container died 2df051c99b6a1a2b069576796b7a283d195da289ebace699674e60e5671beb52 (image=docker.io/ceph/ceph:v15, name=ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx-rgw.myorg.us-east-1.node02.wzdozc)
Apr 22 22:36:06 node02 systemd[1]: libpod-2df051c99b6a1a2b069576796b7a283d195da289ebace699674e60e5671beb52.scope: Consumed 512ms CPU time
Apr 22 22:36:07 node02 podman[3316]: 2020-04-22 22:36:07.033069665 +0300 +03 m=+2.327976749 container remove 2df051c99b6a1a2b069576796b7a283d195da289ebace699674e60e5671beb52 (image=docker.io/ceph/ceph:v15, name=ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx-rgw.myorg.us-east-1.node02.wzdozc)
Apr 22 22:36:07 node02 systemd[1]: ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx@rgw.myorg.us-east-1.node02.wzdozc.service: Main process exited, code=exited, status=13/n/a
Apr 22 22:36:07 node02 systemd[1]: ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx@rgw.myorg.us-east-1.node02.wzdozc.service: Failed with result 'exit-code'.
Apr 22 22:36:07 node02 bash[1571]: debug 2020-04-22T19:36:07.341+0000 7fee9ce70700 0 mon.node02@1(peon) e3 handle_command mon_command({"prefix": "osd pool set", "pool": "us-east-1.rgw.meta", "var": "pg_num", "val": "8"} v 0) v1
Apr 22 22:36:07 node02 bash[1571]: debug 2020-04-22T19:36:07.341+0000 7fee9ce70700 0 log_channel(audit) log [INF] : from='mgr.44101 192.168.100.102:0/2789319733' entity='mgr.node02.qbhwjb' cmd=[{"prefix": "osd pool set", "pool": "us-east-1.rgw.meta", "var": "pg_num", "val": "8"}]: dispatch
Apr 22 22:36:07 node02 bash[1561]: debug 2020-04-22T19:36:07.464+0000 7f10ef94a700 0 log_channel(cluster) log [DBG] : pgmap v617: 129 pgs: 1 creating+activating, 12 creating+peering, 116 active+clean; 2.2 KiB data, 30 MiB used, 20 GiB / 24 GiB avail; 0 B/s rd, 381 B/s wr, 1 op/s
Apr 22 22:36:07 node02 firewalld[956]: WARNING: AllowZoneDrifting is enabled. This is considered an insecure configuration option. It will be removed in a future release. Please consider disabling it now.
[....]
cmd=[{"prefix": "osd pool set", "pool": "us-east-1.rgw.meta", "var": "pg_num_actual", "val": "30"}]: dispatch
Apr 22 22:36:19 node02 bash[3695]: debug 2020-04-22T19:36:19.533+0000 7f7e36959240 0 framework: beast
Apr 22 22:36:19 node02 bash[3695]: debug 2020-04-22T19:36:19.533+0000 7f7e36959240 0 framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
Apr 22 22:36:19 node02 bash[3695]: debug 2020-04-22T19:36:19.533+0000 7f7e36959240 0 framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
Apr 22 22:36:19 node02 bash[3695]: debug 2020-04-22T19:36:19.533+0000 7f7e36959240 0 starting handler: beast
Apr 22 22:36:19 node02 bash[3695]: debug 2020-04-22T19:36:19.552+0000 7f7e36959240 -1 failed to bind address 0.0.0.0:80: Permission denied
Apr 22 22:36:19 node02 bash[3695]: debug 2020-04-22T19:36:19.553+0000 7f7e36959240 -1 ERROR: failed initializing frontend
Apr 22 22:36:19 node02 systemd[1]: libpod-473588aa4eb799f976544fdbee5ca1068346f7c69c802cc037b3d7e933cee1f9.scope: Consumed 522ms CPU time
Apr 22 22:36:19 node02 podman[3697]: 2020-04-22 22:36:19.61003749 +0300 +03 m=+2.235166237 container died 473588aa4eb799f976544fdbee5ca1068346f7c69c802cc037b3d7e933cee1f9 (image=docker.io/ceph/ceph:v15, name=ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx-rgw.myorg.us-east-1.node02.wzdozc)
Apr 22 22:36:19 node02 podman[3697]: 2020-04-22 22:36:19.841846867 +0300 +03 m=+2.466975714 container remove 473588aa4eb799f976544fdbee5ca1068346f7c69c802cc037b3d7e933cee1f9 (image=docker.io/ceph/ceph:v15, name=ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx-rgw.myorg.us-east-1.node02.wzdozc)
Apr 22 22:36:19 node02 systemd[1]: ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx@rgw.myorg.us-east-1.node02.wzdozc.service: Main process exited, code=exited, status=13/n/a
Apr 22 22:36:19 node02 systemd[1]: ceph-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx@rgw.myorg.us-east-1.node02.wzdozc.service: Failed with result 'exit-code'.
</pre>
<p>Might relate to <a class="external" href="https://github.com/rook/rook/issues/5106">https://github.com/rook/rook/issues/5106</a></p>
Orchestrator - Feature #39057 (Duplicate): orchestrator_cli should check minimum device size
https://tracker.ceph.com/issues/39057
2019-04-01T08:07:47Z
Sebastian Wagner
<p>Otherwise we might get an error like this:</p>
<pre>
2019-04-01 08:01:11.465649 I | --> RuntimeError: Unable to use device 4.00 GB /dev/vdb, LVs would be smaller than 5GB
</pre>
<pre>
[root@kubic-1 /]# k get CephCluster -o yaml
apiVersion: v1
items:
- apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata: ...
spec:
cephVersion:
allowUnsupported: true
image: 192.168.122.1:443/ceph/ceph:latest
dashboard:
enabled: true
dataDirHostPath: /var/lib/rook
mon:
allowMultiplePerNode: true
count: 3
preferredCount: 0
network:
hostNetwork: false
rbdMirroring:
workers: 0
storage:
config:
databaseSizeMB: "1024"
journalSizeMB: "1024"
osdsPerDevice: "1"
directories:
- config: null
path: /var/lib/rook
nodes:
- config: null
devices:
- FullPath: ""
config: null
name: vdb
name: kubic-1
resources: {}
useAllDevices: false
status:
state: Updating
kind: List
metadata:
resourceVersion: ""
selfLink: ""
</pre>
<pre>
[root@kubic-1 /]# ceph orchestrator device ls
Host kubic-1:
Device Path Type Size Rotates Available Model
vdb hdd 5120M False False
vda hdd 24.0G False False
Host kubic-2:
Device Path Type Size Rotates Available Model
vdb hdd 5120M False False
vda hdd 24.0G False False
</pre>