Bug #23994
mds: OSD space is not reclaimed until MDS is restarted
Description
With my Luminous test cluster on Ubuntu I ran into a situation where I filled up an OSD by putting files on CephFS, and deleting that data did not free the OSD space. Only restarting the MDS helped.
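For context, the fill-and-delete sequence was essentially of this shape (a minimal sketch; the monitor address, mount point, file sizes and counts are illustrative, not the exact commands I ran):

    # kernel-client mount (monitor address and secret file are placeholders)
    mount -t ceph ceph1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

    # write junk files until a write fails with "no space left on device"
    for i in $(seq 1 10000); do
        dd if=/dev/zero of=/mnt/cephfs/junk.$i bs=1M count=100 || break
    done

    # delete everything again -- this is the step that did not free OSD space
    rm -r /mnt/cephfs/*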
After filling the CephFS mount up to the point of "no space left on device", I got:
# ceph status -w
  cluster:
    id:     f59e49e7-5539-42fd-8706-9b937a517c5a
    health: HEALTH_ERR
            1 full osd(s)
            7 pool(s) full
            Degraded data redundancy: 1094/1391724 objects degraded (0.079%), 25 pgs degraded
            Degraded data redundancy (low space): 26 pgs recovery_toofull
            mons ceph2,ceph3 are low on available space

  services:
    mon: 3 daemons, quorum ceph2,ceph3,ceph1
    mgr: ceph1(active), standbys: ceph3, ceph2
    mds: testfs-1/1/1 up {0=ceph3=up:active}, 2 up:standby
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   7 pools, 188 pgs
    objects: 453k objects, 5763 MB
    usage:   24897 MB used, 5616 MB / 30513 MB avail
    pgs:     1094/1391724 objects degraded (0.079%)
             162 active+clean
             25  active+recovery_toofull+degraded
             1   active+recovery_toofull
# ceph df
GLOBAL:
    SIZE       AVAIL     RAW USED     %RAW USED
    30513M     5616M     24897M       81.59
POOLS:
    NAME                    ID     USED       %USED      MAX AVAIL     OBJECTS
    .rgw.root               1      1446       100.00     0             5
    default.rgw.control     2      0          0          0             8
    default.rgw.meta        3      0          0          0             0
    default.rgw.log         4      0          0          0             207
    mytest                  5      180M       100.00     0             14
    cephfs_data             6      5522M      100.00     0             463177
    cephfs_metadata         7      62153k     100.00     0             497
# ceph health detail
HEALTH_ERR 1 full osd(s); 7 pool(s) full; Degraded data redundancy: 1094/1391724 objects degraded (0.079%), 25 pgs degraded; Degraded data redundancy (low space): 26 pgs recovery_toofull; mons ceph2,ceph3 are low on available space
OSD_FULL 1 full osd(s)
    osd.2 is full
POOL_FULL 7 pool(s) full
    pool '.rgw.root' is full (no space)
    pool 'default.rgw.control' is full (no space)
    pool 'default.rgw.meta' is full (no space)
    pool 'default.rgw.log' is full (no space)
    pool 'mytest' is full (no space)
    pool 'cephfs_data' is full (no space)
    pool 'cephfs_metadata' is full (no space)
PG_DEGRADED Degraded data redundancy: 1094/1391724 objects degraded (0.079%), 25 pgs degraded
    pg 1.0 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 1.5 is active+recovery_toofull+degraded, acting [2,0,1]
    pg 1.7 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 2.0 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 2.1 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 2.3 is active+recovery_toofull+degraded, acting [1,0,2]
    pg 2.4 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 2.5 is active+recovery_toofull+degraded, acting [2,0,1]
    pg 2.6 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 2.7 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 5.0 is active+recovery_toofull+degraded, acting [2,0,1]
    pg 5.1 is active+recovery_toofull+degraded, acting [1,0,2]
    pg 5.2 is active+recovery_toofull+degraded, acting [1,0,2]
    pg 5.3 is active+recovery_toofull+degraded, acting [0,1,2]
    pg 5.4 is active+recovery_toofull+degraded, acting [2,0,1]
    pg 5.5 is active+recovery_toofull+degraded, acting [0,1,2]
    pg 5.6 is active+recovery_toofull+degraded, acting [0,1,2]
    pg 7.0 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 7.2 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 7.3 is active+recovery_toofull+degraded, acting [0,1,2]
    pg 7.f is active+recovery_toofull+degraded, acting [0,2,1]
    pg 7.10 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 7.11 is active+recovery_toofull+degraded, acting [2,1,0]
    pg 7.12 is active+recovery_toofull+degraded, acting [1,0,2]
    pg 7.13 is active+recovery_toofull+degraded, acting [0,1,2]
PG_DEGRADED_FULL Degraded data redundancy (low space): 26 pgs recovery_toofull
    pg 1.0 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 1.5 is active+recovery_toofull+degraded, acting [2,0,1]
    pg 1.7 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 2.0 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 2.1 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 2.3 is active+recovery_toofull+degraded, acting [1,0,2]
    pg 2.4 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 2.5 is active+recovery_toofull+degraded, acting [2,0,1]
    pg 2.6 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 2.7 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 5.0 is active+recovery_toofull+degraded, acting [2,0,1]
    pg 5.1 is active+recovery_toofull+degraded, acting [1,0,2]
    pg 5.2 is active+recovery_toofull+degraded, acting [1,0,2]
    pg 5.3 is active+recovery_toofull+degraded, acting [0,1,2]
    pg 5.4 is active+recovery_toofull+degraded, acting [2,0,1]
    pg 5.5 is active+recovery_toofull+degraded, acting [0,1,2]
    pg 5.6 is active+recovery_toofull+degraded, acting [0,1,2]
    pg 7.0 is active+recovery_toofull+degraded, acting [1,2,0]
    pg 7.2 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 7.3 is active+recovery_toofull+degraded, acting [0,1,2]
    pg 7.4 is active+recovery_toofull, acting [0,1,2]
    pg 7.f is active+recovery_toofull+degraded, acting [0,2,1]
    pg 7.10 is active+recovery_toofull+degraded, acting [0,2,1]
    pg 7.11 is active+recovery_toofull+degraded, acting [2,1,0]
    pg 7.12 is active+recovery_toofull+degraded, acting [1,0,2]
    pg 7.13 is active+recovery_toofull+degraded, acting [0,1,2]
MON_DISK_LOW mons ceph2,ceph3 are low on available space
    mon.ceph2 has 16% avail
    mon.ceph3 has 27% avail
Doing `rm -r` on the CephFS mount, deleting all the data, did not change that situation.
Only after restarting the MDS service was the OSD no longer full, and I got:
# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    30513M     21507M     9006M        29.52
POOLS:
    NAME                    ID     USED       %USED     MAX AVAIL     OBJECTS
    .rgw.root               1      1446       0         5140M         5
    default.rgw.control     2      0          0         5140M         8
    default.rgw.meta        3      0          0         5140M         0
    default.rgw.log         4      0          0         5140M         207
    mytest                  5      180M       0         5140M         14
    cephfs_data             6      206M       3.86      5140M         461526
    cephfs_metadata         7      58136k     0.58      5140M         496
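For completeness, the restart that freed the space was just a plain restart of the active MDS daemon (assuming the usual systemd unit naming of a packaged install; adjust the daemon id, here the one on host ceph3, to your deployment):

    # restart the active MDS (ceph3 held the active rank in the status above)
    systemctl restart ceph-mds@ceph3

    # watch the usage drop afterwards
    ceph df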
Updated by John Spray almost 6 years ago
- Project changed from Ceph to CephFS
What client (kernel or fuse), and what version of the client?
Updated by Niklas Hambuechen almost 6 years ago
This was on the kernel client. I tried Ubuntu's 4.13.0-39-generic and 4.15.0-15-generic kernels.
With the fuse client of Ceph v12.2.4 on the same machine, I don't seem to experience this problem; the `ceph df` usage drops swiftly after creating and deleting files via the fuse mount.
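For reference, the two mounts I compared were along these lines (monitor address, mount points and secret file are illustrative):

    # kernel client -- shows the problem
    mount -t ceph ceph1:6789:/ /mnt/cephfs-kernel -o name=admin,secretfile=/etc/ceph/admin.secret

    # fuse client from v12.2.4 -- space is reclaimed promptly here
    ceph-fuse -m ceph1:6789 /mnt/cephfs-fuse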
Updated by Patrick Donnelly almost 6 years ago
- Subject changed from CephFS: OSD space is not reclaimed until MDS is restarted to mds: OSD space is not reclaimed until MDS is restarted
- Target version set to v14.0.0
- Source set to Community (user)
- Backport set to mimic,luminous
- Component(FS) Client, MDS, kceph added
Updated by Zheng Yan almost 6 years ago
Please try again and dump the MDS cache (`ceph daemon mds.xxx dump cache /tmp/cachedump.x`).
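Concretely, something along these lines on the host running the active MDS (the daemon name is a placeholder for whatever the active daemon is called, e.g. the one on ceph3 above; the perf-counter check is an extra suggestion, assuming the stray counters exposed by Luminous, to spot deleted-but-unpurged inodes):

    # dump the MDS cache to a file on the MDS host
    ceph daemon mds.ceph3 dump cache /tmp/cachedump.1

    # stray counters hint at deleted files whose objects were never purged
    ceph daemon mds.ceph3 perf dump | grep -i stray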
Updated by Patrick Donnelly almost 6 years ago
- Status changed from New to Need More Info
Updated by Patrick Donnelly about 5 years ago
- Target version changed from v14.0.0 to v15.0.0