Bug #52406
cephfs_metadata pool got full after upgrade from Nautilus to Pacific 16.2.5
Description
Hi
I have the following setup on my Ceph cluster:
cephfs_metadata pool - using a CRUSH rule that selects only SSD devices (not used by any other pool), with replica size 3
cephfs_data pool - using a CRUSH rule to use only HDD devices, EC
SSD utilization before the upgrade was about 1%. After the upgrade, SSD utilization started rising by about 15% per day:
ID  CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA   OMAP    META    AVAIL  %USE  VAR  PGS STATUS
119 ssd   0.10918 1.00000  112 GiB 58 GiB  57 GiB 381 MiB 518 MiB 54 GiB 51.92 0.85 128 up
100 ssd   0.10918 1.00000  112 GiB 58 GiB  57 GiB 334 MiB 519 MiB 54 GiB 51.88 0.85 128 up
82  ssd   0.10918 1.00000  112 GiB 58 GiB  57 GiB 405 MiB 494 MiB 54 GiB 51.92 0.85 128 up
CLASS SIZE    AVAIL   USED    RAW USED %RAW USED
hdd   192 TiB 75 TiB  118 TiB 118 TiB  61.18
ssd   335 GiB 161 GiB 174 GiB 174 GiB  51.90
TOTAL 192 TiB 75 TiB  118 TiB 118 TiB  61.17

--- POOLS ---
POOL                  ID PGS  STORED  OBJECTS USED    %USED MAX AVAIL
cephfs_data           1  8192 50 TiB  16.32M  87 TiB  61.25 33 TiB
cephfs_metadata       2  128  1.3 GiB 1.99k   4.0 GiB 2.66  48 GiB
device_health_metrics 3  1    89 MiB  438     177 MiB 0     28 TiB
until the metadata pool got full. I checked the CRUSH map and CRUSH rules and everything was fine; there was no misconfiguration. We added new SSDs, and even their utilization rose immediately. I tried draining the OSDs on the SSD drives one by one and recreating them, but it didn't help.
After restarting the cluster, all PGs on the metadata pool went to unknown.
Because the data on the cluster wasn't production data, I decided to recreate the CephFS.
I removed the CephFS and all pools from the cluster, but the OSDs remained utilized anyway (I waited an hour):
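The CRUSH checks described above can be reproduced with commands along these lines (a sketch; the pool name matches this cluster's setup):

```shell
# Which CRUSH rule is the metadata pool actually using?
ceph osd pool get cephfs_metadata crush_rule

# Dump the rule definitions to verify the device-class restriction (ssd vs. hdd)
ceph osd crush rule dump

# Confirm the metadata PGs really map only to the SSD OSDs
ceph pg ls-by-pool cephfs_metadata
```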
# ceph -s
  cluster:
    id:     aac4b123-8351-4442-a07c-e2c62f15591b
    health: HEALTH_WARN
            noout flag(s) set
            3 nearfull osd(s)

  services:
    mon: 3 daemons, quorum cache2-mon2,cache2-mon3,cache2-mon1 (age 20s)
    mgr: cache2-mon3(active, since 52s), standbys: cache2-mon1, cache2-mon2
    osd: 399 osds: 399 up (since 34m), 399 in (since 73m)
         flags noout

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   54 TiB used, 135 TiB / 188 TiB avail
    pgs:
So I purged all OSDs and created them again, then filled the CephFS with the same data as before destroying it, and everything looks normal: utilization is reasonable, as it was before the upgrade.
There is definitely something wrong with upgrading from the latest Nautilus to Pacific, and I was lucky that the data wasn't production.
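For reference, purging and recreating an OSD can be done roughly like this (a sketch for a single OSD id; /dev/sdX is a placeholder for the actual backing device):

```shell
# Take the OSD out so its PGs migrate away, then wait for recovery to finish
ceph osd out 119
ceph -s   # wait until recovery/backfill is done

# Stop the daemon and purge the OSD from the cluster map
systemctl stop ceph-osd@119
ceph osd purge 119 --yes-i-really-mean-it

# Wipe the device and create a fresh OSD on it
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --data /dev/sdX
```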
Updated by Denis Polom over 2 years ago
Utilization of the OSDs on the SSD (metadata) drives after recreating the OSDs and filling CephFS with the same data again:
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE VAR  PGS STATUS
41 ssd   0.12003 1.00000  112 GiB 1.5 GiB 475 MiB 128 MiB 929 MiB 110 GiB 1.34 0.07 127 up
9  ssd   0.12003 1.00000  233 GiB 1.1 GiB 501 MiB 115 MiB 555 MiB 232 GiB 0.49 0.02 127 up
10 ssd   0.12003 1.00000  233 GiB 819 MiB 462 MiB 101 MiB 256 MiB 232 GiB 0.34 0.02 130 up
14 ssd   0.12003 1.00000  112 GiB 1.4 GiB 496 MiB 117 MiB 849 MiB 110 GiB 1.28 0.06 127 up
82 ssd   0.12003 1.00000  233 GiB 1.2 GiB 444 MiB 106 MiB 719 MiB 232 GiB 0.53 0.03 129 up
0  ssd   0.12003 1.00000  112 GiB 1.6 GiB 462 MiB 148 MiB 1.0 GiB 110 GiB 1.47 0.07 128 up
Updated by Dan van der Ster over 2 years ago
Same as https://tracker.ceph.com/issues/52260 ?
Updated by Xiubo Li over 2 years ago
- Status changed from New to Need More Info
- Assignee set to Xiubo Li
Did you see anything suspicious in the MDS logs, such as mdlog->trim() never being called?
There are two similar trackers, both seen under heavy load: https://tracker.ceph.com/issues/40002 and https://tracker.ceph.com/issues/52280.
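One way to check whether the MDS journal is being trimmed is sketched below (mds.<name> is a placeholder for the actual daemon name; remember to revert the debug level afterwards):

```shell
# Raise MDS debug verbosity at runtime so mdlog activity is logged
ceph config set mds debug_mds 10

# Look for mdlog/trim activity in the MDS log
grep -i "trim" /var/log/ceph/ceph-mds.*.log

# Watch the journal counters; a steadily growing segment count
# suggests trimming is not keeping up
ceph daemon mds.<name> perf dump mds_log
```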