Bug #21551
Ceph FS not recovering space on Luminous
Description
I was running a test on a Ceph file system in which I created and deleted about 45,000 files in a loop and took a snapshot every hour. When the file system went over 60% full, a cron job deleted snapshots until usage was back under 60%. The test ran for several days, until I noticed the file system had hung: it had completely filled one of the OSDs, and several other OSDs were close to full. I added 6 more OSDs to the cluster to get out of the full condition. Once I could access the file system again, I verified there were no snapshots left and removed all files from the Ceph file system, but the space is not being recovered. I rebooted all nodes and the space still did not come back. It has now been stuck in this state for several days.
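For reference, the workload looked roughly like the sketch below. The paths, file sizes, snapshot names and the exact cleanup logic are placeholders reconstructed from memory, not the actual test scripts.

#!/bin/bash
# Workload loop: repeatedly create and then delete ~45,000 files under the
# CephFS mount. Paths and file size are illustrative placeholders.
FS=/cephfs
DIR=$FS/testdir
mkdir -p "$DIR"

while true; do
    for i in $(seq 1 45000); do
        dd if=/dev/zero of="$DIR/file.$i" bs=1M count=1 status=none
    done
    find "$DIR" -type f -name 'file.*' -delete
done

The hourly snapshot and the over-60% cleanup ran from cron, roughly:

# Hourly: take a snapshot by creating a directory under .snap.
mkdir "/cephfs/.snap/snap-$(date +%Y%m%d%H%M)"

# Hourly: while the file system is over 60% full, remove the oldest snapshot.
while [ "$(df --output=pcent /cephfs | tail -1 | tr -dc '0-9')" -gt 60 ]; do
    oldest=$(ls -1 /cephfs/.snap | head -1)
    [ -n "$oldest" ] || break
    rmdir "/cephfs/.snap/$oldest"
done

(CephFS snapshots are created and removed via mkdir/rmdir in the hidden .snap directory, which is what the cron jobs above rely on.)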
ls -la /cephfs/
total 4
drwxr-xr-x  1 root root    0 Sep 25 17:38 .
drwxr-xr-x 23 root root 4096 Sep  5 16:41 ..

du -a /cephfs/
0   /cephfs/

du -a /cephfs/.snap
0   /cephfs/.snap

ls -la /cephfs/.snap
total 0
drwxr-xr-x 1 root root 0 Dec 31  1969 .
drwxr-xr-x 1 root root 0 Sep 25 17:38 ..

df /cephfs/
Filesystem                                          1K-blocks       Used  Available Use% Mounted on
10.14.2.11:6789,10.14.2.12:6789,10.14.2.13:6789:/  1481248768 1006370816  474877952  68% /cephfs

grep ceph /proc/mounts
10.14.2.11:6789,10.14.2.12:6789,10.14.2.13:6789:/ /cephfs ceph rw,noatime,name=cephfs,secret=<hidden>,rbytes,acl 0 0

ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED     OBJECTS
    1412G     452G      959G         67.94         725k
POOLS:
    NAME                ID     QUOTA OBJECTS     QUOTA BYTES     USED     %USED     MAX AVAIL     OBJECTS     DIRTY      READ       WRITE      RAW USED
    cephfs_data         1      N/A               N/A             285G     51.11     272G          642994      627k       23664k     35531k     855G
    cephfs_metadata     2      N/A               N/A             125M      0.05     272G          100401      100401     1974k      15320k     377M

ceph -s
  cluster:
    id:     85a91bbe-b287-11e4-889f-001517987704
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ede-c1-mon01,ede-c1-mon02,ede-c1-mon03
    mgr: ede-c1-mon01(active), standbys: ede-c1-mon03, ede-c1-mon02
    mds: cephfs-1/1/1 up {0=ede-c1-mon01=up:active}, 1 up:standby-replay, 1 up:standby
    osd: 24 osds: 24 up, 24 in
  data:
    pools:   2 pools, 1280 pgs
    objects: 725k objects, 285 GB
    usage:   959 GB used, 452 GB / 1412 GB avail
    pgs:     1280 active+clean
  io:
    client:   852 B/s rd, 2 op/s rd, 0 op/s wr

ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

ceph fs status
cephfs - 1 clients
======
+------+----------------+--------------+-------------+-------+-------+
| Rank |     State      |     MDS      |   Activity  |  dns  |  inos |
+------+----------------+--------------+-------------+-------+-------+
|  0   |     active     | ede-c1-mon01 | Reqs: 0 /s  | 17.7k | 16.3k |
| 0-s  | standby-replay | ede-c1-mon02 | Evts: 0 /s  |   0   |   0   |
+------+----------------+--------------+-------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  132M |  293G |
|   cephfs_data   |   data   |  306G |  293G |
+-----------------+----------+-------+-------+
+--------------+
|  Standby MDS |
+--------------+
| ede-c1-mon03 |
+--------------+
MDS version: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

ceph -v
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

OS: Ubuntu 16.04

kernel:
uname -a
Linux ede-c1-adm01 4.13.0-041300-generic #201709031731 SMP Sun Sep 3 21:33:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux