Bug #52693: orchestrator incorrectly evaluates osd spec filter

Added by Chris K over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: orchestrator
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The orchestrator is internally refusing to apply a spec to certain SSDs, apparently because it determines they are rotational.

This seems similar to Bug #52301 (https://tracker.ceph.com/issues/52301); however, the various device display commands describe the devices correctly.

I have five nodes with identical inventories. After applying the following spec, four of the nodes filled out their OSDs as expected. Node 5 and all of its OSDs were omitted entirely because one of its SSDs is being identified as a rotational disk.

The OSD spec assigns one SSD to handle the db for all the spinning drives and another SSD to handle the wal for those same drives. The SSDs are identical, so I can't select them by model number or type; instead, the spec uses limit: 1 to restrict each role to a single SSD.

A secondary problem this causes is that the OSD apply spec can't be deleted or modified, because it can't complete until all of the OSDs it has already created are manually deleted. I've already redeployed the cluster twice in an effort to overcome the issue; it seems that's not a workaround!

osdspec.yaml

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
  limit: 1
wal_devices:
  rotational: 0
  limit: 1
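
For what it's worth, here is a minimal sketch, in Python, of how I understand the rotational/limit filtering above is meant to behave. This is not the actual cephadm matcher; the names and structure are my own assumptions, purely to illustrate the intent of the spec.

filter_sketch.py (illustrative only, not the real cephadm code)

# Hypothetical sketch of drive-group filtering; NOT the real cephadm code.
from dataclasses import dataclass

@dataclass
class Device:
    path: str
    rotational: bool  # as reported by the inventory

def select(devices, want_rotational, limit=None):
    """Keep devices whose rotational flag matches the spec section."""
    matched = []
    for dev in devices:
        if dev.rotational != want_rotational:
            continue  # cf. "Not all filter did match the disk"
        if limit is not None and len(matched) >= limit:
            break     # cf. "Refuse to add ... due to limit policy of <1>"
        matched.append(dev)
    return matched

inventory = [
    Device("/dev/sdf", rotational=False),  # SSD: should satisfy db_devices
    Device("/dev/sdi", rotational=False),  # SSD: should satisfy wal_devices
    Device("/dev/sdj", rotational=True),   # HDD: should satisfy data_devices
]
print([d.path for d in select(inventory, want_rotational=False, limit=1)])
# expected: ['/dev/sdf']; on the problem node the SSDs instead fail the
# rotational comparison ("0 != 1") and nothing is selected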

Good node: cephadm ceph-volume inventory

Device Path               Size         rotates available Model name
/dev/sda                  931.51 GB    True    False     ST91000640SS
/dev/sdb                  931.51 GB    True    False     ST91000640SS
/dev/sdc                  931.51 GB    True    False     ST91000640SS
/dev/sdd                  931.51 GB    True    False     ST91000640SS
/dev/sde                  931.51 GB    True    False     ST91000640SS
/dev/sdf                  186.31 GB    False   False     HUSSL4020BSS600
/dev/sdg                  419.19 GB    True    False     ST9450404SS
/dev/sdh                  419.19 GB    True    False     ST9450404SS
/dev/sdi                  186.31 GB    False   False     HUSSL4020BSS600
/dev/sdj                  136.73 GB    True    False     ST9146852SS
/dev/sdk                  136.73 GB    True    False     ST9146852SS
/dev/sdl                  136.73 GB    True    False     ST9146852SS
/dev/sdn                  136.73 GB    True    False     ST9146852SS
/dev/sdo                  136.73 GB    True    False     ST9146852SS
/dev/sdp                  136.73 GB    True    False     ST9146852SS
/dev/sdq                  136.73 GB    True    False     ST9146852SS
/dev/sdr                  136.73 GB    True    False     ST9146852SS
/dev/sds                  136.73 GB    True    False     ST9146852SS
/dev/sdt                  136.73 GB    True    False     ST9146852SS
/dev/sdu                  136.73 GB    True    False     ST9146852SS
/dev/sdv                  136.73 GB    True    False     ST9146852SS
/dev/sdw                  136.73 GB    True    False     ST9146852SS
/dev/sdx                  136.73 GB    True    False     ST9146852SS
/dev/sdy                  136.12 GB    True    False     VIRTUAL DISK

Problem node: cephadm ceph-volume inventory

Device Path               Size         rotates available Model name
/dev/sdr                  186.31 GB    False   True      HUSSL4020BSS600
/dev/sds                  186.31 GB    False   True      HUSSL4020BSS600
/dev/sdaa                 232.89 GB    True    False     FUJITSU MHZ2250B
/dev/sdc                  136.73 GB    True    False     ST9146852SS
/dev/sdd                  136.73 GB    True    False     ST9146852SS
/dev/sde                  136.73 GB    True    False     ST9146852SS
/dev/sdf                  136.73 GB    True    False     ST9146852SS
/dev/sdg                  136.73 GB    True    False     ST9146852SS
/dev/sdh                  136.73 GB    True    False     ST9146852SS
/dev/sdi                  136.73 GB    True    False     ST9146852SS
/dev/sdk                  136.73 GB    True    False     ST9146852SS
/dev/sdl                  136.73 GB    True    False     ST9146852SS
/dev/sdm                  136.73 GB    True    False     ST9146852SS
/dev/sdn                  136.73 GB    True    False     ST9146852SS
/dev/sdo                  931.51 GB    True    False     ST91000640SS
/dev/sdp                  931.51 GB    True    False     ST91000640SS
/dev/sdq                  419.19 GB    True    False     ST9450404SS
/dev/sdt                  419.19 GB    True    False     ST9450404SS
/dev/sdu                  136.73 GB    True    False     ST9146852SS
/dev/sdv                  931.51 GB    True    False     ST91000640SS
/dev/sdw                  931.51 GB    True    False     ST91000640SS
/dev/sdx                  931.51 GB    True    False     ST91000640SS
/dev/sdy                  931.51 GB    True    False     ST91000640SS
/dev/sdz                  931.51 GB    True    False     ST91000640SS
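
One way to cross-check the kernel's own view of these disks, independent of ceph-volume: /sys/block/<dev>/queue/rotational reads 0 for SSDs and 1 for spinning disks. A small script (mine, not part of ceph) to print it for every block device:

check_rotational.py

import pathlib

# Print the kernel's rotational flag for every block device:
# 0 = SSD, 1 = spinning disk. Independent of ceph-volume/cephadm.
for dev in sorted(pathlib.Path("/sys/block").iterdir()):
    flag = dev / "queue" / "rotational"
    if flag.exists():  # some virtual devices may lack a queue dir
        print(f"{dev.name}: rotational={flag.read_text().strip()}")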

Log excerpt from the ceph mgr container

I can provide a dump of the cephadm log from the mgr container if needed; it's rather repetitive to my untrained eyes.

debug 2021-09-21T18:58:09.883+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Processing disk /dev/sdg
debug 2021-09-21T18:58:09.883+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Found matching disk: /dev/sdg
debug 2021-09-21T18:58:09.883+0000 7f8ceef39700  0 log_channel(cephadm) log [INF] : Refuse to add /dev/sdg due to limit policy of <1>
debug 2021-09-21T18:58:09.883+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Ignoring disk /dev/sdg. Limit reached
debug 2021-09-21T19:59:47.946+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Processing disk /dev/sdr
debug 2021-09-21T19:59:47.946+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Found matching disk: /dev/sdr
debug 2021-09-21T19:59:47.946+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : 0 != 1
debug 2021-09-21T19:59:47.946+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Ignoring disk /dev/sdr. Not all filter did match the disk
debug 2021-09-21T19:59:47.946+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Processing disk /dev/sds
debug 2021-09-21T19:59:47.946+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Found matching disk: /dev/sds
debug 2021-09-21T19:59:47.946+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : 0 != 1
debug 2021-09-21T19:59:47.946+0000 7f8ceef39700  0 log_channel(cephadm) log [DBG] : Ignoring disk /dev/sds. Not all filter did match the disk
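
The "0 != 1" line suggests the filter is being fed a rotational value of 1 for these SSDs. To see exactly which value the orchestrator consumes, it may help to dump the raw inventory JSON and inspect the rotational field. The field names below (path, sys_api.rotational) follow my reading of ceph-volume inventory --format json output and should be treated as assumptions; adjust if your version's layout differs.

dump_rotational.py (field names are assumptions)

import json, subprocess

# Dump the rotational value from the raw inventory JSON that the
# orchestrator consumes. Field names (path, sys_api.rotational) are my
# assumption about the ceph-volume JSON layout; adjust if they differ.
out = subprocess.run(
    ["cephadm", "ceph-volume", "inventory", "--format", "json"],
    capture_output=True, text=True, check=True,
).stdout
out = out[out.index("["):]  # cephadm may prepend container log lines

for dev in json.loads(out):
    sys_api = dev.get("sys_api", {})
    # repr() shows whether the value is the string "1" or the int 1,
    # which could matter if the filter compares values strictly
    print(dev.get("path"), "rotational =", repr(sys_api.get("rotational")))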

Misc details

  • Ubuntu 20.04.3 (freshly installed for this purpose; nothing else on board)
  • kernel 5.4.0-81-generic
  • all daemons using
    ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
  • non-production system
  • no firewalls yet in place
  • cluster network on separate VLAN, same interfaces

Related issues (1 open, 0 closed)

Related to ceph-volume - Bug #52301: Wrong device type detected/reported by orch device ls (New)

#1

Updated by Chris K over 2 years ago

Yikes, sorry about the amazing formatting... I didn't intend to make reading it more difficult! Rather the opposite!

#2

Updated by Loïc Dachary over 2 years ago

  • Target version deleted (v16.2.6)
#3

Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #52301: Wrong device type detected/reported by orch device ls added
#4

Updated by Sebastian Wagner over 2 years ago

  • Description updated (diff)