Bug #63858
ceph-bluestore-tool bluefs-bdev-expand doesn't adjust OSD free space when NCB mode is in use
Description
We are using Rook v1.10 with Ceph v17.2.6 on a hand-built Kubernetes cluster in AWS (not using EKS). We needed to increase the storage capacity of this cluster, so I grew the EBS volumes attached to our nodes from 50GiB to 150GiB each. After deleting the pod for osd.0, the `expand-bluefs` container ran and logged two errors about reading the OSD label, but seemingly did not fail: the pod continued to start up and the OSD became available. The "using 100 GiB" in the log below is clearly wrong, since it isn't possible to have 100GiB in use on a volume that was only 50GiB. As you can see in the `ceph osd df` output, the total size of the OSD was updated correctly, but the miscalculated space in use shows up as well.
# Logs from expand-bluefs container
inferring bluefs devices from bluestore path
1 : device size 0x2580000000 : using 0x1902b80000(100 GiB)
Expanding DB/WAL...
1 : expanding from 0xc80000000 to 0x2580000000
2023-12-19T16:20:26.575+0000 7f95c1d9b880 -1 bluestore(/var/lib/ceph/osd/ceph-0) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-0: (21) Is a directory
2023-12-19T16:20:26.575+0000 7f95c1d9b880 -1 bluestore(/var/lib/ceph/osd/ceph-0) unable to read label for /var/lib/ceph/osd/ceph-0: (21) Is a directory
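For reference, I believe this is essentially what the `expand-bluefs` container runs under the hood; it can also be done by hand with `ceph-bluestore-tool` while the OSD daemon is stopped. This is only a sketch: the data path is taken from the log above, and the `block` device path is the usual ceph-volume symlink, which may differ in other setups.
# Manual equivalent of the expand-bluefs step (run with the OSD stopped)
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0
# Inspect the on-disk label afterwards (the device path is an assumption for this setup)
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block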
# Before resize:
bash-4.4$ ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 nvme 0.04880 1.00000 50 GiB 38 MiB 7.9 MiB 0 B 31 MiB 50 GiB 0.08 1.05 57 up
0 nvme 0.04880 1.00000 50 GiB 35 MiB 9.9 MiB 0 B 25 MiB 50 GiB 0.07 0.96 62 up
1 nvme 0.04880 1.00000 50 GiB 37 MiB 11 MiB 0 B 26 MiB 50 GiB 0.07 1.00 60 up
4 nvme 0.04880 1.00000 50 GiB 37 MiB 11 MiB 0 B 26 MiB 50 GiB 0.07 1.00 64 up
TOTAL 200 GiB 147 MiB 39 MiB 0 B 107 MiB 200 GiB 0.07
MIN/MAX VAR: 0.96/1.05 STDDEV: 0.00
# After resize:
bash-4.4$ ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 nvme 0.04880 1.00000 50 GiB 39 MiB 7.9 MiB 0 B 31 MiB 50 GiB 0.08 0.00 57 up
0 nvme 0.04880 1.00000 150 GiB 100 GiB 10 MiB 12 KiB 6.2 MiB 50 GiB 66.68 2.00 62 up
1 nvme 0.04880 1.00000 50 GiB 37 MiB 11 MiB 0 B 26 MiB 50 GiB 0.07 0.00 60 up
4 nvme 0.04880 1.00000 50 GiB 37 MiB 11 MiB 0 B 26 MiB 50 GiB 0.07 0.00 64 up
TOTAL 300 GiB 100 GiB 40 MiB 13 KiB 89 MiB 200 GiB 33.38
MIN/MAX VAR: 0.00/2.00 STDDEV: 33.30
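For what it's worth, converting the hex values from the `expand-bluefs` log above to GiB (plain shell arithmetic, nothing Ceph-specific) shows that the reported usage is almost exactly the new device size minus the 50 GiB the device had before the expansion, which is what makes me think the free space is simply not being adjusted:
# Values copied from the expand-bluefs log above
echo $(( 0x2580000000 / 1024**3 ))                  # 150 -> new device size in GiB
echo $(( 0xc80000000 / 1024**3 ))                   # 50  -> old device size in GiB
echo $(( 0x1902b80000 / 1024**3 ))                  # 100 -> the "using" value in GiB
echo $(( (0x2580000000 - 0xc80000000) / 1024**3 ))  # 100 -> new minus old, matching RAW USE for osd.0 above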
This issue is very repeatable for us. It first happened on a larger cluster similar to this one, where the jump in reported used space pushed the OSD into the `nearfull` state. I have seen this reported against Rook as well; in that case the user got past it by creating new OSDs (as I have done), but the underlying problem remains and is a serious one for us.
Also, while this is being investigated, is there some sort of workaround that we could use short of creating new OSDs?