Bug #52693

Updated by Sebastian Wagner over 2 years ago

The orchestrator is internally skipping SSDs when applying a spec, apparently determining that they are rotational.

This seems similar to Bug #52301 (https://tracker.ceph.com/issues/52301); however, the various device display commands describe the devices correctly.

I have five nodes with identical inventories. After applying the following spec, four of the nodes filled out their OSDs as expected. Node 5 and all of its OSDs were omitted entirely because one of its SSDs is being identified as a rotational disk.

The OSD spec assigns one SSD to hold the db for all of the spinning drives and another SSD to hold the wal for those same drives. The SSDs are identical, so I can't distinguish them by model number or type; instead, the spec uses limit: 1 to restrict each role to a single SSD.

A secondary problem this causes is that the OSD apply spec can't be deleted or modified, because it can't finish its work until all of the OSDs it has already created are manually deleted. I've already redeployed the cluster twice trying to get past this, so that isn't a workaround.
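
(For what it's worth, I believe the service can at least be paused by adding unmanaged: true to the spec shown in the next section and re-applying it, so cephadm stops acting on it while this is sorted out; I haven't verified that this unblocks deleting or editing the spec.)

<pre>
# assumption: after adding "unmanaged: true" to osdspec.yaml (shown below),
# re-applying it should stop cephadm from creating any further OSDs for this service
ceph orch apply -i osdspec.yaml
</pre>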

 h1. osdspec.yaml 

 <pre><code class="yaml"> 
 service_type: osd 
 service_id: osd_spec_default 
 placement: 
   host_pattern: '*' 
 data_devices: 
   rotational: 1 
 db_devices: 
   rotational: 0 
   limit: 1 
 wal_devices: 
   rotational: 0 
   limit: 1 
 </code></pre> 
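
For reference, I believe the way the spec resolves on each host can be previewed without creating anything by using the orchestrator's dry-run mode (osdspec.yaml is just what I named the file locally):

<pre>
# preview which devices the spec would claim on each host, without deploying OSDs
ceph orch apply -i osdspec.yaml --dry-run
</pre>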


 h1. example good node: cephadm ceph-volume inventory  

 <pre> 
 Device Path                 Size           rotates available Model name 
 /dev/sda                    931.51 GB      True      False       ST91000640SS 
 /dev/sdb                    931.51 GB      True      False       ST91000640SS 
 /dev/sdc                    931.51 GB      True      False       ST91000640SS 
 /dev/sdd                    931.51 GB      True      False       ST91000640SS 
 /dev/sde                    931.51 GB      True      False       ST91000640SS 
 /dev/sdf                    186.31 GB      False     False       HUSSL4020BSS600 
 /dev/sdg                    419.19 GB      True      False       ST9450404SS 
 /dev/sdh                    419.19 GB      True      False       ST9450404SS 
 /dev/sdi                    186.31 GB      False     False       HUSSL4020BSS600 
 /dev/sdj                    136.73 GB      True      False       ST9146852SS 
 /dev/sdk                    136.73 GB      True      False       ST9146852SS 
 /dev/sdl                    136.73 GB      True      False       ST9146852SS 
 /dev/sdn                    136.73 GB      True      False       ST9146852SS 
 /dev/sdo                    136.73 GB      True      False       ST9146852SS 
 /dev/sdp                    136.73 GB      True      False       ST9146852SS 
 /dev/sdq                    136.73 GB      True      False       ST9146852SS 
 /dev/sdr                    136.73 GB      True      False       ST9146852SS 
 /dev/sds                    136.73 GB      True      False       ST9146852SS 
 /dev/sdt                    136.73 GB      True      False       ST9146852SS 
 /dev/sdu                    136.73 GB      True      False       ST9146852SS 
 /dev/sdv                    136.73 GB      True      False       ST9146852SS 
 /dev/sdw                    136.73 GB      True      False       ST9146852SS 
 /dev/sdx                    136.73 GB      True      False       ST9146852SS 
 /dev/sdy                    136.12 GB      True      False       VIRTUAL DISK 
 </pre> 

h1. problem node: cephadm ceph-volume inventory

 
 <pre> 
 Device Path                 Size           rotates available Model name 
 /dev/sdr                    186.31 GB      False     True        HUSSL4020BSS600 
 /dev/sds                    186.31 GB      False     True        HUSSL4020BSS600 
 /dev/sdaa                   232.89 GB      True      False       FUJITSU MHZ2250B 
 /dev/sdc                    136.73 GB      True      False       ST9146852SS 
 /dev/sdd                    136.73 GB      True      False       ST9146852SS 
 /dev/sde                    136.73 GB      True      False       ST9146852SS 
 /dev/sdf                    136.73 GB      True      False       ST9146852SS 
 /dev/sdg                    136.73 GB      True      False       ST9146852SS 
 /dev/sdh                    136.73 GB      True      False       ST9146852SS 
 /dev/sdi                    136.73 GB      True      False       ST9146852SS 
 /dev/sdk                    136.73 GB      True      False       ST9146852SS 
 /dev/sdl                    136.73 GB      True      False       ST9146852SS 
 /dev/sdm                    136.73 GB      True      False       ST9146852SS 
 /dev/sdn                    136.73 GB      True      False       ST9146852SS 
 /dev/sdo                    931.51 GB      True      False       ST91000640SS 
 /dev/sdp                    931.51 GB      True      False       ST91000640SS 
 /dev/sdq                    419.19 GB      True      False       ST9450404SS 
 /dev/sdt                    419.19 GB      True      False       ST9450404SS 
 /dev/sdu                    136.73 GB      True      False       ST9146852SS 
 /dev/sdv                    931.51 GB      True      False       ST91000640SS 
 /dev/sdw                    931.51 GB      True      False       ST91000640SS 
 /dev/sdx                    931.51 GB      True      False       ST91000640SS 
 /dev/sdy                    931.51 GB      True      False       ST91000640SS 
 /dev/sdz                    931.51 GB      True      False       ST91000640SS 
 </pre> 
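
Note that ceph-volume itself reports both HUSSL4020BSS600 devices as non-rotating here. To rule out the kernel flip-flopping on the flag, the rotational bit can also be checked directly on the problem node (device names as they appear above):

<pre>
# what the kernel reports for the two SSDs on the problem node
lsblk -d -o NAME,ROTA,MODEL /dev/sdr /dev/sds
cat /sys/block/sdr/queue/rotational /sys/block/sds/queue/rotational
</pre>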


h1. log excerpts from the ceph mgr container

 I can provide a dump of the cephadm log from the mgr container if needed; it's rather repetitive to my untrained eyes. 
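
If a fuller dump is useful, I believe it can be pulled from the cluster log after raising the cephadm module's log level, e.g.:

<pre>
# raise the cephadm cluster log level, then show recent entries
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph log last 1000 debug cephadm
</pre>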

 <pre> 
 debug 2021-09-21T18:58:09.883+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Processing disk /dev/sdg 
 debug 2021-09-21T18:58:09.883+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Found matching disk: /dev/sdg 
 debug 2021-09-21T18:58:09.883+0000 7f8ceef39700    0 log_channel(cephadm) log [INF] : Refuse to add /dev/sdg due to limit policy of <1> 
 debug 2021-09-21T18:58:09.883+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Ignoring disk /dev/sdg. Limit reached 
 </pre> 


 <pre> 
 debug 2021-09-21T19:59:47.946+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Processing disk /dev/sdr 
 debug 2021-09-21T19:59:47.946+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Found matching disk: /dev/sdr 
 debug 2021-09-21T19:59:47.946+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : 0 != 1 
 debug 2021-09-21T19:59:47.946+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Ignoring disk /dev/sdr. Not all filter did match the disk 
 debug 2021-09-21T19:59:47.946+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Processing disk /dev/sds 
 debug 2021-09-21T19:59:47.946+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Found matching disk: /dev/sds 
 debug 2021-09-21T19:59:47.946+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : 0 != 1 
 debug 2021-09-21T19:59:47.946+0000 7f8ceef39700    0 log_channel(cephadm) log [DBG] : Ignoring disk /dev/sds. Not all filter did match the disk 
</pre>
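
I read the 0 != 1 lines above as the rotational filter comparison failing for /dev/sdr and /dev/sds, which is what makes me suspect the orchestrator is treating these SSDs as rotational internally. What the orchestrator has cached for the host can be compared against the inventory output with something like the following (the hostname is a placeholder, and I'm assuming the JSON output includes the rotational field):

<pre>
# dump the orchestrator's cached device data for the problem host
ceph orch device ls node5 --format json-pretty | grep -i rotational
</pre>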


 h1. misc details 

 
* Ubuntu 20.04.3 (freshly installed for this purpose; nothing else on board)
 * kernel 5.4.0-81-generic 
 * all daemons using <pre>ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)</pre> 
 * non-production system 
 * no firewalls yet in place 
 * cluster network on separate VLAN, same interfaces 
 
