Bug #21551
Ceph FS not recovering space on Luminous
Status: Open
% Done: 0%
Description
I was running a test on a Ceph file system in which I was creating and deleting about 45,000 files in a loop, taking a snapshot every hour. When the file system got over 60% full, a cron job deleted snapshots until usage was back under 60%. This test ran for several days, until I noticed the file system was hung: it had completely filled one of the OSDs, and several other OSDs were close to full. I added 6 more OSDs to the cluster to get out of the full condition. Once I could access the file system again, I verified there were no snapshots left and removed all files from the Ceph file system, but I cannot get the space to recover. I rebooted all nodes and the space still does not recover. It has now been stuck in this state for several days.
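For reference, the hourly snapshot-and-prune job described above could be sketched roughly as follows. The /cephfs path and the 60% threshold are from this report; the script itself, including the snapshot naming and the usage_pct helper, is an illustrative assumption, not the actual cron job.

```shell
#!/bin/sh
# Sketch of the test's hourly cron job (assumed layout, not the real script).
FS=/cephfs
THRESHOLD=60

# Extract the Use% column for a mount point from POSIX `df -P` output.
usage_pct() {
    df -P "$1" | awk 'NR==2 {gsub(/%/, "", $5); print $5}'
}

# Take a timestamped snapshot, then prune the oldest snapshots
# while the file system is over the threshold.
snapshot_and_prune() {
    mkdir "$FS/.snap/snap-$(date +%Y%m%d%H%M)"
    while [ "$(usage_pct "$FS")" -gt "$THRESHOLD" ]; do
        oldest=$(ls "$FS/.snap" | sort | head -n 1)
        [ -n "$oldest" ] || break
        rmdir "$FS/.snap/$oldest"
    done
}
```

CephFS exposes snapshot creation and deletion as mkdir/rmdir inside the hidden .snap directory, which is why no special tooling is needed here.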
ls -la /cephfs/
total 4
drwxr-xr-x  1 root root    0 Sep 25 17:38 .
drwxr-xr-x 23 root root 4096 Sep  5 16:41 ..

du -a /cephfs/
0       /cephfs/

du -a /cephfs/.snap
0       /cephfs/.snap

ls -la /cephfs/.snap
total 0
drwxr-xr-x 1 root root 0 Dec 31  1969 .
drwxr-xr-x 1 root root 0 Sep 25 17:38 ..

df /cephfs/
Filesystem                                         1K-blocks       Used  Available Use% Mounted on
10.14.2.11:6789,10.14.2.12:6789,10.14.2.13:6789:/ 1481248768 1006370816  474877952  68% /cephfs

grep ceph /proc/mounts
10.14.2.11:6789,10.14.2.12:6789,10.14.2.13:6789:/ /cephfs ceph rw,noatime,name=cephfs,secret=<hidden>,rbytes,acl 0 0

ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED     OBJECTS
    1412G     452G      959G         67.94         725k
POOLS:
    NAME                ID     QUOTA OBJECTS     QUOTA BYTES     USED     %USED     MAX AVAIL     OBJECTS     DIRTY      READ       WRITE      RAW USED
    cephfs_data         1      N/A               N/A             285G     51.11     272G          642994      627k       23664k     35531k     855G
    cephfs_metadata     2      N/A               N/A             125M     0.05      272G          100401      100401     1974k      15320k     377M

ceph -s
  cluster:
    id:     85a91bbe-b287-11e4-889f-001517987704
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ede-c1-mon01,ede-c1-mon02,ede-c1-mon03
    mgr: ede-c1-mon01(active), standbys: ede-c1-mon03, ede-c1-mon02
    mds: cephfs-1/1/1 up {0=ede-c1-mon01=up:active}, 1 up:standby-replay, 1 up:standby
    osd: 24 osds: 24 up, 24 in
  data:
    pools:   2 pools, 1280 pgs
    objects: 725k objects, 285 GB
    usage:   959 GB used, 452 GB / 1412 GB avail
    pgs:     1280 active+clean
  io:
    client:   852 B/s rd, 2 op/s rd, 0 op/s wr

ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

ceph fs status
cephfs - 1 clients
======
+------+----------------+--------------+---------------+-------+-------+
| Rank |     State      |     MDS      |    Activity   |  dns  |  inos |
+------+----------------+--------------+---------------+-------+-------+
|  0   |     active     | ede-c1-mon01 | Reqs:    0 /s | 17.7k | 16.3k |
| 0-s  | standby-replay | ede-c1-mon02 | Evts:    0 /s |   0   |   0   |
+------+----------------+--------------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  132M |  293G |
|   cephfs_data   |   data   |  306G |  293G |
+-----------------+----------+-------+-------+
+--------------+
|  Standby MDS |
+--------------+
| ede-c1-mon03 |
+--------------+
MDS version: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

ceph -v
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

OS: Ubuntu 16.04

uname -a
Linux ede-c1-adm01 4.13.0-041300-generic #201709031731 SMP Sun Sep 3 21:33:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
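As a sanity check against the `ceph df` numbers above (285G used in cephfs_data while the mounted file system is empty), the data pool can be queried directly with rados. This is a hedged diagnostic sketch, not something that was run here; the pool name is from this cluster and the helper name is illustrative.

```shell
# Count the objects left in a RADOS pool. If the file system were truly
# drained (files deleted and purge queue processed), the object count in
# cephfs_data should trend toward zero.
count_pool_objects() {
    rados -p "$1" ls | wc -l
}

# Example (assumes admin keyring access on this cluster):
# count_pool_objects cephfs_data    # `ceph df detail` above reports 642994 objects
```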
Updated by Patrick Donnelly over 6 years ago
Snapshots are not considered stable (especially with multiple active metadata servers). There are proposed fixes in the works:
https://github.com/ceph/ceph/pull/16779
If you have found a new bug, that's certainly useful. If you're willing: retry with those patches and if you still have a problem, please report back.
Updated by Zheng Yan over 6 years ago
Could you please run 'ceph daemon mds.ede-c1-mon01 dump cache /tmp/cachedump' and upload the cache dump? Also, please set debug_mds=10, restart the MDS, let it run for a few minutes, and upload the MDS log.
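The debug level can also be raised at runtime through the admin socket rather than by editing ceph.conf and restarting; a sketch, where the admin-socket command form is an assumption for this Luminous cluster and the helper name is illustrative:

```shell
# Raise MDS debug logging via the admin socket on the MDS host,
# without a daemon restart.
set_mds_debug() {
    ceph daemon "mds.$1" config set debug_mds "$2"
}

# set_mds_debug ede-c1-mon01 10
# ...reproduce for a few minutes, then collect
#    /var/log/ceph/ceph-mds.ede-c1-mon01.log
# set_mds_debug ede-c1-mon01 1/5    # restore roughly the default level
```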
Updated by Eric Eastman over 6 years ago
The command 'ceph daemon mds.ede-c1-mon01 dump cache /tmp/cachedump' did not give any output, so I ran
ceph daemon mds.ede-c1-mon01 dump cache > cachedump
which created an 83 MB file that I compressed with bzip2 and put on our FTP server.
I set debug_mds=10 in the ceph.conf file, restarted the MDS process, and captured about 7 minutes of the run, which created a 207 MB log file that I also compressed with bzip2.
The files are at:
ftp://ftp.keepertech.com/outgoing/eric/ceph_logs/cachedump.bz2
ftp://ftp.keepertech.com/outgoing/eric/ceph_logs/ceph-mds.ede-c1-mon01.log-debug10.bz2
On Patrick's comment: I am running a single active MDS, with the second one as a standby-replay, using the option:
mds_standby_replay = true
I am more than happy to retry with the patches if they will help on a single-MDS system. Please let me know whether I should apply these patches to 12.2.0, to master, or to something else.
Let me know if you need anything else off the current system.
Updated by Zheng Yan over 6 years ago
There are lots of "mds.0.purge_queue _consume: not readable right now" messages in the log. It looks like the purge queue stayed in a non-readable state.
Please set debug_mds=5 and debug_journaler=10, restart the MDS, let it run for a few minutes, and upload the MDS log.
Updated by Eric Eastman over 6 years ago
I uploaded the new MDS run, captured with
debug_mds=5
debug_journaler=10
to:
Updated by Zheng Yan over 6 years ago
2017-09-26 09:16:41.000627 7f58662b4700 10 mds.0.journaler.pq(rw) _prefetch
2017-09-26 09:16:41.012367 7f58662b4700 10 mds.0.journaler.pq(rw) _finish_read got 1850138846~3743522
2017-09-26 09:16:41.012375 7f58662b4700 10 mds.0.journaler.pq(rw) _assimilate_prefetch 1850138846~3743522
2017-09-26 09:16:41.012376 7f58662b4700 10 mds.0.journaler.pq(rw) _assimilate_prefetch gap of 4194304 from received_pos 1853882368 to first prefetched buffer 1858076672
2017-09-26 09:16:41.012378 7f58662b4700 10 mds.0.journaler.pq(rw) _assimilate_prefetch read_buf now 1850138846~3743522, read pointers 1850138846/1853882368/1895825408
2017-09-26 09:16:41.012416 7f58662b4700 -1 mds.0.journaler.pq(rw) _decode error from assimilate_prefetch
It looks like the purge queue journal is corrupted. When was the file system created? I know of a bug (introduced while developing Luminous) that can cause this corruption, but it was already fixed in ceph version 12.2.0.
Please upload objects 500.00000000 and 500.000001b9, and I will help you recover them.
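Extracting those objects from the metadata pool can be done with `rados get`; a sketch, where the object names are from the request above and the local file naming and helper are assumptions:

```shell
# Fetch a raw RADOS object from the metadata pool into a local file
# named after the object.
fetch_metadata_object() {
    rados -p cephfs_metadata get "$1" "$1.dat"
}

# fetch_metadata_object 500.00000000
# fetch_metadata_object 500.000001b9
# bzip2 500.00000000.dat 500.000001b9.dat    # then upload the .bz2 files
```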
Updated by Zheng Yan over 6 years ago
- Related to Bug #19593: purge queue and standby replay mds added
Updated by Eric Eastman over 6 years ago
This file system was created with Ceph v12.2.0. The cluster was cleanly installed with Ceph v12.2.0 and was never upgraded.
I extracted the two objects from the cephfs_metadata pool and uploaded them to:
ftp://ftp.keepertech.com/outgoing/eric/ceph_logs/500.00000000.dat.bz2
ftp://ftp.keepertech.com/outgoing/eric/ceph_logs/500.000001b9.dat.bz2
This is a test cluster. I can recreate the file system and data easily, so please do not waste time recovering it unless it helps you analyze the issue.
Updated by Zheng Yan over 6 years ago
OK, it is likely caused by http://tracker.ceph.com/issues/19593. Please do not enable standby-replay for now.
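On this cluster, following that advice would mean dropping the mds_standby_replay option Eric quoted earlier; a hedged ceph.conf sketch (the daemon name is from this report, the exact section layout is an assumption):

```ini
; ceph.conf on the MDS hosts: remove the option or set it to false
[mds]
mds_standby_replay = false

; then restart the standby daemon(s), e.g.:
;   systemctl restart ceph-mds@ede-c1-mon02
```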