Bug #52406
cephfs_metadata pool got full after upgrade from Nautilus to Pacific 16.2.5
Description
Hi
I have the following setup on my Ceph cluster:
cephfs_metadata pool - uses a CRUSH rule that targets only SSD devices (not used by any other pool), with replica size 3
cephfs_data pool - uses a CRUSH rule that targets only HDD devices, erasure coded (EC)
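For reference, such a layout is created roughly as follows (the rule, profile and filesystem names, the EC k/m values and the failure domain are examples rather than my exact values; the PG counts match the ceph df output further down):

# replicated CRUSH rule restricted to the ssd device class
ceph osd crush rule create-replicated ssd-only default host ssd

# EC profile restricted to the hdd device class
ceph osd erasure-code-profile set ec-hdd k=4 m=2 crush-device-class=hdd crush-failure-domain=host

# metadata pool: 3x replicated on SSDs only
ceph osd pool create cephfs_metadata 128 128 replicated ssd-only
ceph osd pool set cephfs_metadata size 3

# data pool: erasure coded on HDDs, with overwrites enabled for CephFS
ceph osd pool create cephfs_data 8192 8192 erasure ec-hdd
ceph osd pool set cephfs_data allow_ec_overwrites true

# --force is needed because the default data pool is erasure coded
ceph fs new cephfs cephfs_metadata cephfs_data --force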
SSD utilization before the upgrade was about 1%. After the upgrade, SSD utilization started rising by about 15% per day:
ID  CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA   OMAP    META    AVAIL  %USE  VAR  PGS STATUS
119 ssd   0.10918 1.00000  112 GiB 58 GiB  57 GiB 381 MiB 518 MiB 54 GiB 51.92 0.85 128 up
100 ssd   0.10918 1.00000  112 GiB 58 GiB  57 GiB 334 MiB 519 MiB 54 GiB 51.88 0.85 128 up
 82 ssd   0.10918 1.00000  112 GiB 58 GiB  57 GiB 405 MiB 494 MiB 54 GiB 51.92 0.85 128 up
CLASS SIZE    AVAIL   USED    RAW USED %RAW USED
hdd   192 TiB 75 TiB  118 TiB 118 TiB  61.18
ssd   335 GiB 161 GiB 174 GiB 174 GiB  51.90
TOTAL 192 TiB 75 TiB  118 TiB 118 TiB  61.17

--- POOLS ---
POOL                  ID PGS  STORED  OBJECTS USED    %USED MAX AVAIL
cephfs_data            1 8192 50 TiB  16.32M  87 TiB  61.25 33 TiB
cephfs_metadata        2  128 1.3 GiB 1.99k   4.0 GiB 2.66  48 GiB
device_health_metrics  3    1 89 MiB  438     177 MiB 0     28 TiB
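The listings above are the standard ceph osd df and ceph df output; nothing cluster-specific is needed to reproduce them:

ceph osd df   # per-OSD RAW USE / DATA / OMAP / META split (first listing)
ceph df       # per-class raw usage and per-pool STORED / USED (second listing)
rados df      # per-pool object counts and space, as a cross-check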
This continued until the metadata pool got full. Note that ceph df attributed only about 4 GiB of usage to cephfs_metadata, while the SSD device class showed about 174 GiB of raw usage. I checked the CRUSH map and CRUSH rules, and everything was fine; there was no misconfiguration. We added new SSDs, and their utilization immediately rose as well. I tried draining the OSDs on the SSD drives one by one and recreating each OSD, but it didn't help.
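By "drain and recreate" I mean roughly the standard per-OSD cycle (the OSD id and device path are examples; non-containerized OSDs assumed):

ceph osd out 119                           # stop placing data on the OSD and let it drain
# wait for backfill/recovery to finish
ceph osd safe-to-destroy 119               # confirm it can be removed without data loss
systemctl stop ceph-osd@119
ceph osd purge 119 --yes-i-really-mean-it  # remove it from the CRUSH and OSD maps
ceph-volume lvm zap --destroy /dev/sdX     # wipe the SSD
ceph-volume lvm create --data /dev/sdX     # recreate the OSD on the same device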
After restarting the cluster, all PGs of the metadata pool were reported as unknown.
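For anyone reproducing this, the unknown PGs show up in the usual status output:

ceph health detail   # lists the inactive/unknown PGs and the affected pool
ceph pg stat         # overall PG state summary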
Because the data on the cluster wasn't production data, I decided to recreate CephFS.
I removed CephFS and all pools from the cluster, but the OSDs remained utilized anyway (I waited for hours):
# ceph -s
  cluster:
    id:     aac4b123-8351-4442-a07c-e2c62f15591b
    health: HEALTH_WARN
            noout flag(s) set
            3 nearfull osd(s)

  services:
    mon: 3 daemons, quorum cache2-mon2,cache2-mon3,cache2-mon1 (age 20s)
    mgr: cache2-mon3(active, since 52s), standbys: cache2-mon1, cache2-mon2
    osd: 399 osds: 399 up (since 34m), 399 in (since 73m)
         flags noout

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   54 TiB used, 135 TiB / 188 TiB avail
    pgs:
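For completeness, removing the filesystem and the pools was just the standard sequence, roughly as follows (the filesystem name cephfs is an assumption; pool deletion has to be allowed explicitly):

ceph fs fail cephfs
ceph fs rm cephfs --yes-i-really-mean-it
ceph config set mon mon_allow_pool_delete true
ceph osd pool rm cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
ceph osd pool rm cephfs_data cephfs_data --yes-i-really-really-mean-it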
So I purged all the OSDs and created them again, then filled CephFS with the same data it held before I destroyed it, and everything now looks normal - utilization is as reasonable as it was before the upgrade.
There is definitely something wrong with upgrading from the latest Nautilus to Pacific, and I was lucky that the data wasn't production data.