Bug #58306

empty disk rejected with 'Insufficient space (<5GB)'

Added by Lukasz Engel about 1 year ago. Updated 3 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
100%

Source:
Tags:
Backport:
quincy,pacific
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a working cluster (17.2.5) with disks connected to an Adaptec 8805 on one of my hosts.
Each disk is exposed by the controller as a single device.
I am trying to add a new OSD, but ceph doesn't detect the new HDD (HGST 8TB). The disk is visible and usable in the system,
but "ceph orch device list" does not show the device.
When I manually run "ceph-volume inventory /dev/sdX" I get:

====== Device report /dev/sdm ======

     path                      /dev/sdm
     ceph device               False
     lsm data                  {}
     available                 False
     rejected reasons          Insufficient space (<5GB)
     device id                 HGST_HUH721008AL5200_JEGA5Z9X

(The disk was used before, but it is empty: it was first zapped and then fully zeroed.)

OS is Rocky 8 (latest packages).
I successfully deployed the cluster earlier this year (July) and had no problems adding OSDs.
I don't remember for sure, but I probably did a minor ceph upgrade (within the quincy release) since installation.

I attach the log (with debug log level enabled) for the ceph-volume command.

ceph-volume.log - log for "ceph-volume --log-level debug inventory /dev/sdm" (56.5 KB) Lukasz Engel, 12/17/2022 10:27 PM

History

#1 Updated by Guillaume Abrioux about 1 year ago

  • Status changed from New to In Progress
  • Assignee set to Guillaume Abrioux
  • Backport set to quincy,pacific

#2 Updated by Rongqi Sun about 1 year ago

try:
sudo dmsetup remove ceph--528450f2--9e30--486d--a450--2085a196e586-osd--block--cf91100a--9e4b--40d6--93fb--864028cd0218
sudo wipefs -a /dev/nvme1n1

#3 Updated by Lukasz Engel about 1 year ago

Sorry for the long response time, I have been busy...

I debugged through the Python code and found the reason:
All volumes and raw disks exposed by my controller have the removable=1 attribute
(even multi-disk volumes have this set).
The function get_block_devs_sysfs in ceph-volume/util/disk rejects such disks.
Digging through the ceph history, I found that get_block_devs_sysfs (using sysfs) appeared in version 17.2.4; earlier, lsblk was used. This is why I was able to build the cluster on an earlier version of ceph (17.2.1 or .2, as I remember), but after upgrading to 17.2.5 I am no longer able to add/replace disks.

I am not sure what the "removable" flag exactly means / should mean. If I have a hot-swap backplane in the server (a typical situation, also mine), are disks in this backplane "removable" or not? Technically they are removable (even online), but from a practical point of view they are "permanent" storage, not "temporary/external" (something like a USB stick or CD/DVD).
The other RAID/HBA controllers I have in my hardware don't set the "removable" flag,
so I suppose it should be the same for Adaptec (aacraid).
Currently I don't see any way to switch this in the controller setup or the kernel module (I will check this further).
I haven't tried other kernel versions yet (to see if this behavior differs); I plan to do so.

Regarding ceph: the message shown when attempting to use a removable disk is very misleading.
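
For anyone who wants to check whether their own controller shows the same behaviour, the removable attribute can be read directly from sysfs. The snippet below is only a minimal sketch and is not the ceph-volume code: the function name read_removable_flags and its sysfs_block parameter are made up for illustration, and it assumes the standard Linux layout where each block device has a /sys/block/<name>/removable file containing 0 or 1.

```python
from pathlib import Path


def read_removable_flags(sysfs_block="/sys/block"):
    """Return {device_name: is_removable} for block devices under sysfs_block.

    Hypothetical helper for illustration only; it does not reproduce the
    filtering logic of ceph-volume's get_block_devs_sysfs.
    """
    root = Path(sysfs_block)
    flags = {}
    if not root.is_dir():
        # Not a Linux host, or sysfs is not mounted at this path.
        return flags
    for dev_dir in sorted(root.iterdir()):
        try:
            # The kernel writes "1" for removable devices and "0" otherwise.
            value = (dev_dir / "removable").read_text().strip()
        except OSError:
            # Entry has no readable "removable" attribute; skip it.
            continue
        flags[dev_dir.name] = value == "1"
    return flags
```

Calling read_removable_flags() on the affected host and looking up the entry for the disk in question (sdm in the report above) should return True if the controller marks it as removable.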

#4 Updated by Konstantin Shalygin 3 months ago

  • Status changed from In Progress to Duplicate
  • Assignee deleted (Guillaume Abrioux)
  • % Done changed from 0 to 100
