Bug #21551

Ceph FS not recovering space on Luminous

Added by Eric Eastman about 4 years ago. Updated almost 4 years ago.

Severity: 3 - minor


I was running a test on a Ceph file system where I created and deleted about 45,000 files in a loop, taking a snapshot every hour. Once the file system went over 60% full, a cron job deleted snapshots until usage was back under 60%. This test ran for several days, until I noticed the file system had hung: it had completely filled one of the OSDs, and several other OSDs were close to full. I added 6 more OSDs to the cluster to get out of the full condition. Once I could access the file system again, I verified there were no snapshots left and removed all files from the Ceph file system, but I cannot get the space to recover. I rebooted all nodes, and the space still does not recover. It has now been stuck in this state for several days.
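For context, CephFS snapshots are managed through the hidden `.snap` directory at the file system root: `mkdir` in `.snap` creates a snapshot and `rmdir` removes one. The cron job described above could be sketched roughly like this (mount point, snapshot naming, and the 60% threshold are taken from the description; everything else is illustrative):

```shell
#!/bin/sh
# Sketch of the described snapshot-pruning cron job, assuming a CephFS
# mount at /cephfs and timestamped snapshot names (illustrative only).

# Take an hourly snapshot: CephFS creates snapshots via mkdir in .snap.
mkdir "/cephfs/.snap/snap-$(date +%Y%m%d-%H%M)"

# Extract the current usage percentage, e.g. " 68%" -> 68.
usage=$(df --output=pcent /cephfs | tail -1 | tr -dc '0-9')

# Delete the oldest snapshots until usage drops back under 60%.
while [ "$usage" -gt 60 ]; do
    # Timestamped names sort chronologically, so head -1 is the oldest.
    oldest=$(ls /cephfs/.snap | head -1)
    [ -z "$oldest" ] && break
    rmdir "/cephfs/.snap/$oldest"
    usage=$(df --output=pcent /cephfs | tail -1 | tr -dc '0-9')
done
```

Note that `rmdir` on a snapshot only unlinks it; the actual space is reclaimed asynchronously by the MDS purge queue, which is what this report turns out to hinge on.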

ls -la /cephfs/
total 4
drwxr-xr-x  1 root root    0 Sep 25 17:38 .
drwxr-xr-x 23 root root 4096 Sep  5 16:41 ..

du -a /cephfs/
0    /cephfs/

du -a /cephfs/.snap
0    /cephfs/.snap

ls -la /cephfs/.snap
total 0
drwxr-xr-x 1 root root 0 Dec 31  1969 .
drwxr-xr-x 1 root root 0 Sep 25 17:38 ..

df /cephfs/
Filesystem                                         1K-blocks       Used Available Use% Mounted on
                                                  1481248768 1006370816 474877952  68% /cephfs

grep ceph /proc/mounts
/cephfs ceph rw,noatime,name=cephfs,secret=&lt;hidden&gt;,rbytes,acl 0 0

ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED     OBJECTS
    1412G     452G      959G         67.94         725k
POOLS:
    NAME                ID     QUOTA OBJECTS     QUOTA BYTES     USED     %USED     MAX AVAIL     OBJECTS     DIRTY      READ       WRITE      RAW USED 
    cephfs_data         1      N/A               N/A             285G     51.11          272G      642994       627k     23664k     35531k         855G 
    cephfs_metadata     2      N/A               N/A             125M      0.05          272G      100401     100401      1974k     15320k         377M 

ceph -s
  cluster:
    id:     85a91bbe-b287-11e4-889f-001517987704
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ede-c1-mon01,ede-c1-mon02,ede-c1-mon03
    mgr: ede-c1-mon01(active), standbys: ede-c1-mon03, ede-c1-mon02
    mds: cephfs-1/1/1 up  {0=ede-c1-mon01=up:active}, 1 up:standby-replay, 1 up:standby
    osd: 24 osds: 24 up, 24 in

  data:
    pools:   2 pools, 1280 pgs
    objects: 725k objects, 285 GB
    usage:   959 GB used, 452 GB / 1412 GB avail
    pgs:     1280 active+clean

  io:
    client:   852 B/s rd, 2 op/s rd, 0 op/s wr

ceph fs ls      
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

ceph fs status
cephfs - 1 clients
| Rank |     State      |     MDS      |    Activity   |  dns  |  inos |
|  0   |     active     | ede-c1-mon01 | Reqs:    0 /s | 17.7k | 16.3k |
| 0-s  | standby-replay | ede-c1-mon02 | Evts:    0 /s |    0  |    0  |
|       Pool      |   type   |  used | avail |
| cephfs_metadata | metadata |  132M |  293G |
|   cephfs_data   |   data   |  306G |  293G |

| Standby MDS  |
| ede-c1-mon03 |
MDS version: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

ceph -v
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

OS: Ubuntu 16.04
kernel: uname -a
Linux ede-c1-adm01 4.13.0-041300-generic #201709031731 SMP Sun Sep 3 21:33:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Related issues

Related to CephFS - Bug #19593: purge queue and standby replay mds Resolved 04/12/2017


#1 Updated by Patrick Donnelly about 4 years ago

Snapshots are not considered stable (especially with multiple active metadata servers). There are proposed fixes in the works:

If you have found a new bug, that's certainly useful. If you're willing: retry with those patches and if you still have a problem, please report back.

#2 Updated by Zheng Yan about 4 years ago

Could you please run 'ceph daemon mds.ede-c1-mon01 dump cache /tmp/cachedump' and upload the cachedump? Also, please set debug_mds=10, restart the MDS, let it run for a few minutes, and upload the MDS log.

#3 Updated by Eric Eastman about 4 years ago

The command 'ceph daemon mds.ede-c1-mon01 dump cache /tmp/cachedump' did not give any output, so I ran
ceph daemon mds.ede-c1-mon01 dump cache > cachedump
which created an 83 MB file that I bzip2'd and put on our ftp server.

I set debug_mds=10 in the ceph.conf file, restarted the MDS process, and captured about 7 minutes of runtime, which created a 207 MB file that I also bzip2'd.

The files are at:

On Patrick's comment: I am running a single active MDS, with the second one as a standby-replay via the option:

mds_standby_replay = true

I am more than happy to retry with the patches if they will help on a single-MDS system. Please let me know whether I should apply these patches to 12.2.0, to master, or to something else.

Let me know if you need anything else off the current system.

#4 Updated by Zheng Yan almost 4 years ago

There are lots of "mds.0.purge_queue _consume: not readable right now" messages in the log. It looks like the purge queue stayed in a non-readable state.

Please set debug_mds=5 and debug_journaler=10, restart the MDS, let it run a few minutes, and upload the MDS log.
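In ceph.conf, those debug settings would look roughly like this (a sketch of the requested change; the same values can also be injected at runtime with `ceph tell mds.* injectargs` without a restart, though a restart is wanted here to capture the journal replay from the start):

```ini
[mds]
    debug mds = 5
    debug journaler = 10
```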

#5 Updated by Eric Eastman almost 4 years ago

#6 Updated by Zheng Yan almost 4 years ago

2017-09-26 09:16:41.000627 7f58662b4700 10 mds.0.journaler.pq(rw) _prefetch
2017-09-26 09:16:41.012367 7f58662b4700 10 mds.0.journaler.pq(rw) _finish_read got 1850138846~3743522
2017-09-26 09:16:41.012375 7f58662b4700 10 mds.0.journaler.pq(rw) _assimilate_prefetch 1850138846~3743522
2017-09-26 09:16:41.012376 7f58662b4700 10 mds.0.journaler.pq(rw) _assimilate_prefetch gap of 4194304 from received_pos 1853882368 to first prefetched buffer 1858076672
2017-09-26 09:16:41.012378 7f58662b4700 10 mds.0.journaler.pq(rw) _assimilate_prefetch read_buf now 1850138846~3743522, read pointers 1850138846/1853882368/1895825408
2017-09-26 09:16:41.012416 7f58662b4700 -1 mds.0.journaler.pq(rw) _decode error from assimilate_prefetch

It looks like the purge queue journal is corrupted. When was the filesystem created? I know of a bug (introduced while developing Luminous) that can cause this corruption, but it was already fixed in ceph version 12.2.0.

Please upload objects 500.00000000 and 500.000001b9; I will help you recover it.
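The requested objects live in the metadata pool (`cephfs_metadata`, per the `ceph df` output above) and can be exported with the `rados` CLI, roughly like this:

```shell
# Export the two purge-queue journal objects from the metadata pool
# (rados get <object> <outfile>). Requires admin credentials on the cluster.
rados -p cephfs_metadata get 500.00000000 ./500.00000000
rados -p cephfs_metadata get 500.000001b9 ./500.000001b9
```

The `500.*` prefix is the purge queue's journal; object names are the journal inode number followed by the object offset in hex, which is why 500.000001b9 (around offset 441 * 4 MB) matches the ~1.85 GB read positions in the journaler log above.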

#7 Updated by Zheng Yan almost 4 years ago

  • Related to Bug #19593: purge queue and standby replay mds added

#8 Updated by Eric Eastman almost 4 years ago

This file system was created with Ceph v12.2.0. The cluster was cleanly installed with Ceph v12.2.0 and was never upgraded.

I uploaded the two objects from the pool cephfs_metadata and put them at:

This is a test cluster. I can recreate the file system and data easily, so please do not waste time recovering it unless it helps you analyze the issue.

#9 Updated by Zheng Yan almost 4 years ago

OK, it's likely caused by Bug #19593. Please don't enable standby-replay for now.
