Bug #44270
Status: Closed (Can't reproduce)
Under certain circumstances, "ceph orch apply" returns success even when no OSDs are created
Description
On a single-node cluster, the "cephadm bootstrap" command deploys 1 MGR and 1 MON.
On very recent versions of master, if one runs the following "ceph orch apply" command immediately after "cephadm bootstrap" finishes:
echo '{"testing_dg_admin": {"host_pattern": "admin*", "data_devices": {"all": true}}}' | ceph orch apply -i -
the command will complete with status code 0, yet no OSDs get created!
(Note: this was taken from a system where the host was called "admin.octopus_test1.com". To reproduce, change "admin" to the short hostname of the host.)
If I insert a "sleep 60" between "cephadm bootstrap" and "ceph orch apply", the OSDs get created according to the drive groups JSON provided.
Note: this behavior was introduced quite recently.
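A more robust alternative to a fixed "sleep 60" is to poll the orchestrator until the device inventory is populated. A minimal sketch, assuming `ceph orch device ls --format json` returns a list of per-host entries each carrying a `devices` array (key names may differ between releases):

```python
import json
import subprocess
import time

def wait_for_inventory(timeout=120, interval=5):
    """Poll 'ceph orch device ls' until at least one device is reported."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        proc = subprocess.run(
            ["ceph", "orch", "device", "ls", "--format", "json"],
            capture_output=True, text=True,
        )
        if proc.returncode == 0:
            try:
                hosts = json.loads(proc.stdout)
            except json.JSONDecodeError:
                hosts = []
            # The "devices" key is an assumption about the JSON layout.
            if any(h.get("devices") for h in hosts):
                return True
        time.sleep(interval)
    return False

if __name__ == "__main__":
    if not wait_for_inventory():
        raise SystemExit("device inventory still empty; not applying drive group")
```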
Taken from the workaround:
Upon closer examination, we can see that the following commands succeed without causing any OSDs to be deployed:
`echo {\"testing_dg_node2\": {\"host_pattern\": \"node2*\", \"data_devices\": {\"all\": true}}} | ceph orch osd create -i -`
This is because the orchestrator only knows about the drives on node1:
```
admin:~ # ceph orch device ls
HOST   PATH      TYPE  SIZE   DEVICE  AVAIL  REJECT REASONS
node1  /dev/vdb  hdd   8192M  259451  True
node1  /dev/vdc  hdd   8192M  652460  True
node1  /dev/vda  hdd   42.0G          False  locked
```
Yet "cephadm ceph-volume inventory" sees the drives when run on node2:
```
node2:~ # cephadm ceph-volume inventory
INFO:cephadm:Inferring fsid a581fad8-5ccb-11ea-966f-525400bb7fa5
INFO:cephadm:/usr/bin/podman:stdout
INFO:cephadm:/usr/bin/podman:stdout Device Path  Size      rotates  available  Model name
INFO:cephadm:/usr/bin/podman:stdout /dev/vdb     8.00 GB   True     True
INFO:cephadm:/usr/bin/podman:stdout /dev/vdc     8.00 GB   True     True
INFO:cephadm:/usr/bin/podman:stdout /dev/vda     42.00 GB  True     False

Device Path  Size      rotates  available  Model name
/dev/vdb     8.00 GB   True     True
/dev/vdc     8.00 GB   True     True
/dev/vda     42.00 GB  True     False
```
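The mismatch above can be detected mechanically by comparing the hosts the orchestrator manages against the hosts it actually holds an inventory for. A hedged sketch, assuming `--format json` output shapes with a `hostname` key for `ceph orch host ls` entries and a `name` key for `ceph orch device ls` entries (both key names are assumptions and vary by release):

```python
import json
import subprocess

def ceph_json(*args):
    """Run a ceph CLI command and parse its JSON output."""
    proc = subprocess.run(["ceph", *args, "--format", "json"],
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)

managed = {h["hostname"] for h in ceph_json("orch", "host", "ls")}
inventoried = {h["name"] for h in ceph_json("orch", "device", "ls")}

# Hosts the orchestrator manages but has no device inventory for yet:
for host in sorted(managed - inventoried):
    print(f"{host}: managed, but no device inventory yet")
```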
Updated by Sebastian Wagner about 4 years ago
we might need some more in-depth validation of drive groups here.
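A minimal sketch of what such validation could look like: before accepting a spec, check that its host_pattern matches at least one host with a populated inventory. host_pattern is a glob-style pattern, so `fnmatch` stands in here for whatever matching the mgr actually does; the function and dict layout below are illustrative, not the real mgr code:

```python
from fnmatch import fnmatch

def validate_drive_groups(drive_groups, inventory_hosts):
    """drive_groups: {name: spec} as in the JSON above.
    inventory_hosts: hostnames with a populated device inventory."""
    errors = []
    for name, spec in drive_groups.items():
        pattern = spec.get("host_pattern", "*")
        if not any(fnmatch(h, pattern) for h in inventory_hosts):
            errors.append(f"drive group '{name}': host_pattern '{pattern}' "
                          f"matches no host with inventory")
    return errors

# With the spec from this report, before node2's inventory has arrived:
dgs = {"testing_dg_node2": {"host_pattern": "node2*",
                            "data_devices": {"all": True}}}
print(validate_drive_groups(dgs, ["node1"]))
# -> ["drive group 'testing_dg_node2': host_pattern 'node2*' matches no host with inventory"]
```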
Updated by Sage Weil about 4 years ago
- Status changed from New to Triaged
I bet the problem is that the drive inventory isn't populated yet immediately after bootstrap.
Updated by Nathan Cutler about 4 years ago
For ceph-salt, I have a workaround here:
Updated by Sebastian Wagner about 4 years ago
- Related to Feature #44414: bubble up errors during 'apply' phase to 'cluster warnings' added
Updated by Sebastian Wagner about 4 years ago
new workaround: https://github.com/ceph/ceph-salt/pull/109
Updated by Sebastian Wagner about 4 years ago
- Description updated (diff)
- Priority changed from Normal to High
Which means we have to track which nodes have been scanned, and bail out if we don't have the inventory yet?
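A sketch of that idea, with illustrative names throughout: the mgr records a timestamp per host whenever its inventory is refreshed, and the apply path bails out loudly (rather than silently succeeding) while any matched host is still unscanned:

```python
import time
from fnmatch import fnmatch

class ScanTracker:
    """Illustrative: remember which hosts have reported an inventory."""
    def __init__(self):
        self._last_scan = {}  # hostname -> timestamp of last inventory refresh

    def record_scan(self, host):
        self._last_scan[host] = time.time()

    def unscanned(self, hosts):
        return [h for h in hosts if h not in self._last_scan]

def apply_drive_group(spec, all_hosts, tracker):
    matched = [h for h in all_hosts if fnmatch(h, spec["host_pattern"])]
    missing = tracker.unscanned(matched)
    if missing:
        # Bail out instead of returning success with zero OSDs.
        raise RuntimeError("no device inventory yet for: " + ", ".join(missing))
    # ... proceed with normal drive group placement ...
```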
Updated by Sebastian Wagner almost 4 years ago
- Related to Bug #44824: cephadm: adding osd device is not idempotent added
Updated by Juan Miguel Olmo Martínez about 3 years ago
- Assignee set to Juan Miguel Olmo Martínez
Updated by Sebastian Wagner almost 3 years ago
- Assignee deleted (Juan Miguel Olmo Martínez)
Updated by Sebastian Wagner over 2 years ago
- Subject changed from Under certain circumstances, "ceph orch osd create" returns success even when no OSDs are created to Under certain circumstances, "ceph orch apply" returns success even when no OSDs are created
- Description updated (diff)
Updated by Sebastian Wagner over 2 years ago
- Status changed from Triaged to Can't reproduce
OK, let me close this for now as "can't reproduce", and let's wait until it pops up again.