Bug #44270 (closed)

Under certain circumstances, "ceph orch apply" returns success even when no OSDs are created

Added by Nathan Cutler about 4 years ago. Updated over 2 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: cephadm/osd
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On a single-node cluster, the "cephadm bootstrap" command deploys 1 MGR and 1 MON.

On very recent versions of master, if one runs the following "ceph orch apply" command very soon after "cephadm bootstrap" finishes

echo '{"testing_dg_admin": {"host_pattern": "admin*", "data_devices": {"all": true}}}' | ceph orch apply -i -

the command will complete with status code 0, yet no OSDs get created!

(Note: this was taken from a system where the host was called "admin.octopus_test1.com". To reproduce, change "admin" to the short hostname of the host.)

If I insert a "sleep 60" between "cephadm bootstrap" and "ceph orch apply", the OSDs get created according to the drive group JSON provided.
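
A more robust variant of that workaround (a sketch only, assuming the short hostname is "admin" as in the example above) is to poll "ceph orch device ls" instead of sleeping for a fixed time, and to submit the drive group only once the host's devices are visible to the orchestrator:

# Hypothetical wrapper around the reproducer above; only the "ceph orch" commands are taken from this report.
host=admin
until ceph orch device ls | grep -q "^${host} "; do
    echo "waiting for the device inventory of ${host} to be populated..."
    sleep 10
done
echo '{"testing_dg_admin": {"host_pattern": "admin*", "data_devices": {"all": true}}}' | ceph orch apply -i -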

Note: this behavior was introduced quite recently.

Taken from the workaround (the ceph-salt PR linked in #8 below):

Upon closer examination, we can see that the following command succeeds without causing any OSDs to be deployed:

`echo {\"testing_dg_node2\": {\"host_pattern\": \"node2*\", \"data_devices\": {\"all\": true}}} | ceph orch osd create -i -`

This is because the orchestrator only knows about the drives on node1:

admin:~ # ceph orch device ls
HOST   PATH      TYPE   SIZE  DEVICE  AVAIL  REJECT REASONS  
node1  /dev/vdb  hdd   8192M  259451  True                   
node1  /dev/vdc  hdd   8192M  652460  True                   
node1  /dev/vda  hdd   42.0G          False  locked 


Yet "cephadm ceph-volume inventory" sees the drives when run on node2:
node2:~ # cephadm ceph-volume inventory
INFO:cephadm:Inferring fsid a581fad8-5ccb-11ea-966f-525400bb7fa5
INFO:cephadm:/usr/bin/podman:stdout 
INFO:cephadm:/usr/bin/podman:stdout Device Path               Size         rotates available Model name
INFO:cephadm:/usr/bin/podman:stdout /dev/vdb                  8.00 GB      True    True      
INFO:cephadm:/usr/bin/podman:stdout /dev/vdc                  8.00 GB      True    True      
INFO:cephadm:/usr/bin/podman:stdout /dev/vda                  42.00 GB     True    False     

Device Path               Size         rotates available Model name
/dev/vdb                  8.00 GB      True    True      
/dev/vdc                  8.00 GB      True    True      
/dev/vda                  42.00 GB     True    False
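
To tell whether the gap is in the orchestrator's cached inventory or in the host scan itself, one can force a rescan and compare (a sketch; the --refresh flag is assumed to be available in this build):

# Ask the orchestrator to rescan the device inventory instead of serving its cached view.
ceph orch device ls --refresh
# After the rescan, node2's /dev/vdb and /dev/vdc should show up here as well;
# if they still do not, the problem is in the host scan rather than in ceph-volume.
ceph orch device ls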

Related issues 2 (0 open, 2 closed)

Related to Orchestrator - Feature #44414: bubble up errors during 'apply' phase to 'cluster warnings' (Resolved; assignee: Melissa Li)

Related to Orchestrator - Bug #44824: cephadm: adding osd device is not idempotent (Resolved)

Actions #1

Updated by Nathan Cutler about 4 years ago

  • Description updated (diff)
Actions #2

Updated by Nathan Cutler about 4 years ago

  • Description updated (diff)
Actions #3

Updated by Sebastian Wagner about 4 years ago

we might need some more in-depth validation of drive groups here.
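
Until such validation exists in the orchestrator, a client-side pre-check along these lines could catch the node2 case from the description (a sketch only; the host prefix and drive group JSON are taken from the example above, the rest is hypothetical):

# Refuse to apply a drive group whose host_pattern matches no host for which
# the orchestrator already reports available devices.
prefix=node2
if ceph orch device ls | grep "^${prefix}" | grep -qw True; then
    echo '{"testing_dg_node2": {"host_pattern": "node2*", "data_devices": {"all": true}}}' | ceph orch apply -i -
else
    echo "orchestrator has no available devices matching ${prefix}* yet; not applying" >&2
fi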

Actions #4

Updated by Nathan Cutler about 4 years ago

  • Description updated (diff)
Actions #5

Updated by Nathan Cutler about 4 years ago

  • Description updated (diff)
Actions #6

Updated by Nathan Cutler about 4 years ago

  • Description updated (diff)
Actions #7

Updated by Sage Weil about 4 years ago

  • Status changed from New to Triaged

I bet the problem is that the drive inventory isn't populated yet immediately after bootstrap.

Actions #8

Updated by Nathan Cutler about 4 years ago

For ceph-salt, I have a workaround here:

https://github.com/ceph/ceph-salt/pull/99

Actions #9

Updated by Joshua Schmid about 4 years ago

  • Assignee set to Joshua Schmid
Actions #10

Updated by Sebastian Wagner about 4 years ago

  • Related to Feature #44414: bubble up errors during 'apply' phase to 'cluster warnings' added
Actions #12

Updated by Sebastian Wagner about 4 years ago

  • Description updated (diff)
  • Priority changed from Normal to High

Which means we have to track which nodes are scanned, and bail out if we don't have the inventory yet?
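
In CLI terms, the kind of check described here might look roughly like the following (a sketch; cephadm would presumably do this internally rather than via the shell):

# Bail out if any host known to the orchestrator has no device inventory yet.
for h in $(ceph orch host ls | awk 'NR > 1 {print $1}'); do
    if ! ceph orch device ls | grep -q "^${h} "; then
        echo "host ${h} has not been scanned yet; refusing to apply drive groups" >&2
        exit 1
    fi
done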

Actions #13

Updated by Sebastian Wagner almost 4 years ago

  • Related to Bug #44824: cephadm: adding osd device is not idempotent added
Actions #14

Updated by Sebastian Wagner over 3 years ago

  • Assignee deleted (Joshua Schmid)
Actions #15

Updated by Juan Miguel Olmo Martínez about 3 years ago

  • Assignee set to Juan Miguel Olmo Martínez
Actions #16

Updated by Sebastian Wagner almost 3 years ago

  • Category set to cephadm/osd
Actions #17

Updated by Sebastian Wagner almost 3 years ago

  • Assignee deleted (Juan Miguel Olmo Martínez)
Actions #18

Updated by Sebastian Wagner over 2 years ago

  • Subject changed from Under certain circumstances, "ceph orch osd create" returns success even when no OSDs are created to Under certain circumstances, "ceph orch apply" returns success even when no OSDs are created
  • Description updated (diff)
Actions #19

Updated by Sebastian Wagner over 2 years ago

  • Status changed from Triaged to Can't reproduce

OK, let me close this for now as "can't reproduce", and let's wait until this pops up again.
