Bug #44270
Status: Closed (Can't reproduce)
Under certain circumstances, "ceph orch apply" returns success even when no OSDs are created
Description
On a single-node cluster, the "cephadm bootstrap" command deploys 1 MGR and 1 MON.
On very recent versions of master, if one runs the following "ceph orch apply" command immediately after "cephadm bootstrap" finishes:
echo '{"testing_dg_admin": {"host_pattern": "admin*", "data_devices": {"all": true}}}' | ceph orch apply -i -
the command will complete with status code 0, yet no OSDs get created!
(Note: this was taken from a system where the host was called "admin.octopus_test1.com". To reproduce, change "admin" to the short hostname of the host.)
If I insert a "sleep 60" between "cephadm bootstrap" and "ceph orch apply", the OSDs get created according to the drive groups JSON provided.
Note: this behavior was introduced quite recently.
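A more robust alternative to a fixed "sleep 60" is to poll the orchestrator until the device inventory is populated. A minimal sketch, assuming `ceph orch device ls --format json` returns a list of per-host entries each carrying a `devices` array (key names may differ between releases):

```python
import json
import subprocess
import time

def wait_for_inventory(timeout=120, interval=5):
    """Poll 'ceph orch device ls' until at least one device is reported."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        proc = subprocess.run(
            ["ceph", "orch", "device", "ls", "--format", "json"],
            capture_output=True, text=True,
        )
        if proc.returncode == 0:
            try:
                hosts = json.loads(proc.stdout)
            except json.JSONDecodeError:
                hosts = []
            # The "devices" key is an assumption about the JSON layout.
            if any(h.get("devices") for h in hosts):
                return True
        time.sleep(interval)
    return False

if __name__ == "__main__":
    if not wait_for_inventory():
        raise SystemExit("device inventory still empty; not applying drive group")
```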
Taken from the workaround:
Upon closer examination, we can see that the following commands succeed without causing any OSDs to be deployed:
`echo {\"testing_dg_node2\": {\"host_pattern\": \"node2*\", \"data_devices\": {\"all\": true}}} | ceph orch osd create -i -`
This is because the orchestrator only knows about the drives on node1:
```
admin:~ # ceph orch device ls
HOST   PATH      TYPE  SIZE   DEVICE  AVAIL  REJECT REASONS
node1  /dev/vdb  hdd   8192M  259451  True
node1  /dev/vdc  hdd   8192M  652460  True
node1  /dev/vda  hdd   42.0G          False  locked
```
Yet "cephadm ceph-volume inventory" sees the drives when run on node2:
```
node2:~ # cephadm ceph-volume inventory
INFO:cephadm:Inferring fsid a581fad8-5ccb-11ea-966f-525400bb7fa5
INFO:cephadm:/usr/bin/podman:stdout
INFO:cephadm:/usr/bin/podman:stdout Device Path  Size      rotates  available  Model name
INFO:cephadm:/usr/bin/podman:stdout /dev/vdb     8.00 GB   True     True
INFO:cephadm:/usr/bin/podman:stdout /dev/vdc     8.00 GB   True     True
INFO:cephadm:/usr/bin/podman:stdout /dev/vda     42.00 GB  True     False

Device Path  Size      rotates  available  Model name
/dev/vdb     8.00 GB   True     True
/dev/vdc     8.00 GB   True     True
/dev/vda     42.00 GB  True     False
```
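The mismatch above can be detected mechanically by comparing the hosts the orchestrator manages against the hosts it actually holds an inventory for. A hedged sketch, assuming `--format json` output shapes with a `hostname` key for `ceph orch host ls` entries and a `name` key for `ceph orch device ls` entries (both key names are assumptions and vary by release):

```python
import json
import subprocess

def ceph_json(*args):
    """Run a ceph CLI command and parse its JSON output."""
    proc = subprocess.run(["ceph", *args, "--format", "json"],
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)

managed = {h["hostname"] for h in ceph_json("orch", "host", "ls")}
inventoried = {h["name"] for h in ceph_json("orch", "device", "ls")}

# Hosts the orchestrator manages but has no device inventory for yet:
for host in sorted(managed - inventoried):
    print(f"{host}: managed, but no device inventory yet")
```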
Updated by Sebastian Wagner about 4 years ago
we might need some more in-depth validation of drive groups here.
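A minimal sketch of what such validation could look like: before accepting a spec, check that its host_pattern matches at least one host with a populated inventory. host_pattern is a glob-style pattern, so `fnmatch` stands in here for whatever matching the mgr actually does; the function and dict layout below are illustrative, not the real mgr code:

```python
from fnmatch import fnmatch

def validate_drive_groups(drive_groups, inventory_hosts):
    """drive_groups: {name: spec} as in the JSON above.
    inventory_hosts: hostnames with a populated device inventory."""
    errors = []
    for name, spec in drive_groups.items():
        pattern = spec.get("host_pattern", "*")
        if not any(fnmatch(h, pattern) for h in inventory_hosts):
            errors.append(f"drive group '{name}': host_pattern '{pattern}' "
                          f"matches no host with inventory")
    return errors

# With the spec from this report, before node2's inventory has arrived:
dgs = {"testing_dg_node2": {"host_pattern": "node2*",
                            "data_devices": {"all": True}}}
print(validate_drive_groups(dgs, ["node1"]))
# -> ["drive group 'testing_dg_node2': host_pattern 'node2*' matches no host with inventory"]
```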
Updated by Sage Weil about 4 years ago
- Status changed from New to Triaged
I bet the problem is that the drive inventory isn't populated yet immediately after bootstrap.
Updated by Nathan Cutler about 4 years ago
For ceph-salt, I have a workaround here:
Updated by Sebastian Wagner about 4 years ago
- Related to Feature #44414: bubble up errors during 'apply' phase to 'cluster warnings' added
Updated by Sebastian Wagner about 4 years ago
new workaround: https://github.com/ceph/ceph-salt/pull/109
Updated by Sebastian Wagner about 4 years ago
- Description updated (diff)
- Priority changed from Normal to High
Which means we have to track which nodes have been scanned, and bail out if we don't have the inventory yet?
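A sketch of that idea, with illustrative names throughout: the mgr records a timestamp per host whenever its inventory is refreshed, and the apply path bails out loudly (rather than silently succeeding) while any matched host is still unscanned:

```python
import time
from fnmatch import fnmatch

class ScanTracker:
    """Illustrative: remember which hosts have reported an inventory."""
    def __init__(self):
        self._last_scan = {}  # hostname -> timestamp of last inventory refresh

    def record_scan(self, host):
        self._last_scan[host] = time.time()

    def unscanned(self, hosts):
        return [h for h in hosts if h not in self._last_scan]

def apply_drive_group(spec, all_hosts, tracker):
    matched = [h for h in all_hosts if fnmatch(h, spec["host_pattern"])]
    missing = tracker.unscanned(matched)
    if missing:
        # Bail out instead of returning success with zero OSDs.
        raise RuntimeError("no device inventory yet for: " + ", ".join(missing))
    # ... proceed with normal drive group placement ...
```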
Updated by Sebastian Wagner almost 4 years ago
- Related to Bug #44824: cephadm: adding osd device is not idempotent added
Updated by Juan Miguel Olmo Martínez about 3 years ago
- Assignee set to Juan Miguel Olmo Martínez
Updated by Sebastian Wagner almost 3 years ago
- Assignee deleted (Juan Miguel Olmo Martínez)
Updated by Sebastian Wagner over 2 years ago
- Subject changed from Under certain circumstances, "ceph orch osd create" returns success even when no OSDs are created to Under certain circumstances, "ceph orch apply" returns success even when no OSDs are created
- Description updated (diff)
Updated by Sebastian Wagner over 2 years ago
- Status changed from Triaged to Can't reproduce
OK, let me close this for now as "can't reproduce", and let's wait until it pops up again.