Bug #22249 (closed)

Need to restart MDS to release cephfs space

Added by junming rao over 6 years ago. Updated about 5 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Using 'ceph df' shows the cluster usage as 238 TB (2 copies), but 'du -sh' at the client mount point reports only 115 TB. After restarting the MDS server, the space is gradually released. The MDS log contains many entries like the following:
2017-11-27 10:44:55.787815 7fa238e28700 1 -- 10.106.130.211:6800/424036496 --> 100.103.129.138:0/1784919364 -- client_session(renewcaps seq 859720) v2 -- ?+0 0x7fa2495e7500 con 0x7fa249627e00
2017-11-27 10:44:55.787827 7fa238e28700 1 -- 10.106.130.211:6800/424036496 --> 10.106.1.21:0/4260081559 -- client_session(renewcaps seq 1134004) v2 -- ?+0 0x7fa250739680 con 0x7fa25075b780
2017-11-27 10:44:55.787838 7fa238e28700 1 -- 10.106.130.211:6800/424036496 --> 100.103.131.5:0/3856597629 -- client_session(renewcaps seq 862837) v2 -- ?+0 0x7fa25073b600 con 0x7fa25075c380
2017-11-27 10:44:55.787849 7fa238e28700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.41:0/2035000660 -- client_session(renewcaps seq 102944) v2 -- ?+0 0x7fa25073e0c0 con 0x7fa250758480
2017-11-27 10:44:55.787862 7fa238e28700 1 -- 10.106.130.211:6800/424036496 --> 100.103.128.137:0/2332806151 -- client_session(renewcaps seq 859719) v2 -- ?+0 0x7fa25073aac0 con 0x7fa25075b600
2017-11-27 10:44:55.787880 7fa238e28700 1 -- 10.106.130.211:6800/424036496 --> 10.106.9.206:0/4207974913 -- client_session(renewcaps seq 900640) v2 -- ?+0 0x7fa25073c140 con 0x7fa25075bd80
2017-11-27 10:44:55.787901 7fa238e28700 1 -- 10.106.130.211:6800/424036496 --> 100.103.129.136:0/1248023682 -- client_session(renewcaps seq 859722) v2 -- ?+0 0x7fa25073b3c0 con 0x7fa25075d880
2017-11-27 10:44:55.787975 7fa238e28700 1 mds.0.330 cluster recovered.
2017-11-27 10:44:55.788025 7fa238e28700 1 -- 10.106.130.211:6800/424036496 <== mon.0 10.106.130.211:6789/0 32837 ==== mdsbeacon(4502697/MDS-Server1 up:active seq 32825 v333) v7 ==== 136+0+0 (638660309 0 0) 0x7fa24c3c9380 con 0x7fa249620600
2017-11-27 10:44:55.788049 7fa238e28700 1 -- 10.106.130.211:6800/424036496 <== client.154556 10.150.0.159:0/602082883 5 ==== client_request(client.154556:84307977 getattr pAsLsXsFs #1 2017-11-27 10:44:42.906927) v3 ==== 122+0+0 (1560076961 0 0) 0x7fa254ac23c0 con 0x7fa249625580
2017-11-27 10:44:55.794065 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.21:6828/48875 -- osd_op(mds.0.330:35059 1.fb63e6e5 100006761bf.00000000 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa254ac6880 con 0x7fa249622700
2017-11-27 10:44:55.794153 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.16:6826/1044724 -- osd_op(mds.0.330:35060 1.a1e5a26c 100006760ed.00000000 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa254ac6e00 con 0x7fa250a1b480
2017-11-27 10:44:55.794186 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.16:6835/1045946 -- osd_op(mds.0.330:35061 1.69cd29a5 100006760ed.00000001 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa254ac6b40 con 0x7fa250a1b000
2017-11-27 10:44:55.794222 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.12:6831/836678 -- osd_op(mds.0.330:35062 1.e0db34d9 100006760ed.00000002 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa254ac7640 con 0x7fa250990a80
2017-11-27 10:44:55.794255 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.21:6832/49623 -- osd_op(mds.0.330:35063 1.5d92f497 100006760ed.00000003 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa254ac7380 con 0x7fa250a1f680
2017-11-27 10:44:55.794577 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.13:6802/1782512 -- osd_op(mds.0.330:35064 1.19721a15 100006760ed.00000004 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa254ac70c0 con 0x7fa250a1f500
2017-11-27 10:44:55.794856 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.17:6836/1096760 -- osd_op(mds.0.330:35065 1.f59ebf18 100006760ed.00000005 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa254ac7bc0 con 0x7fa250997200
2017-11-27 10:44:55.795136 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.1.12:6816/87646 -- osd_op(mds.0.330:35066 1.42db5fcd 100006760ed.00000006 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa254ac7900 con 0x7fa249623300
2017-11-27 10:44:55.795210 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.1.11:6800/5195 -- osd_op(mds.0.330:35067 1.e47844a2 100006760ed.00000007 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa250b40b00 con 0x7fa249621e00
2017-11-27 10:44:55.795232 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.12:6831/836678 -- osd_op(mds.0.330:35068 1.1e81f49b 100006760ed.00000008 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa250b40840 con 0x7fa250990a80
2017-11-27 10:44:55.795253 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.21:6832/49623 -- osd_op(mds.0.330:35069 1.2f6cbe2 100006760ed.00000009 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa250b40580 con 0x7fa250a1f680
2017-11-27 10:44:55.795289 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.1.12:6812/87508 -- osd_op(mds.0.330:35070 1.130777ec 100006529d4.00000000 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa250b402c0 con 0x7fa250a1ce00
2017-11-27 10:44:55.795313 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.13:6830/1783998 -- osd_op(mds.0.330:35071 1.658d7b51 100006529d4.00000001 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa250b40000 con 0x7fa250e82700
2017-11-27 10:44:55.795333 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.15:6836/3716843 -- osd_op(mds.0.330:35072 1.9181682c 100006529d4.00000002 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa250b41b80 con 0x7fa251d83300
2017-11-27 10:44:55.795360 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.14:6830/1971463 -- osd_op(mds.0.330:35073 1.da21d75c 100006529d4.00000003 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa250b418c0 con 0x7fa250e81200
2017-11-27 10:44:55.795391 7fa234c1e700 1 -- 10.106.130.211:6800/424036496 --> 10.150.0.12:6831/836678 -- osd_op(mds.0.330:35074 1.c8404d3e 100006529d4.00000004 [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e90810) v7 -- ?+0 0x7fa250b41600 con 0x7fa250990a80
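
For reference, a minimal sketch of the space-accounting check described above, run from a client and from the MDS host (the mount point /mnt/cephfs is a placeholder; the daemon name MDS-Server1 is taken from the log above, and exact stray counter names vary by Ceph release):

    # Cluster-side view of raw usage (counts every replica)
    ceph df

    # Client-side view of file data as seen through the mount point
    du -sh /mnt/cephfs

    # On the MDS host, inspect stray/purge activity; counter names
    # such as num_strays differ between releases, so grep loosely
    ceph daemon mds.MDS-Server1 perf dump | grep -i stray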


Files

ceph-mds.log (674 KB) ceph-mds.log MDS log junming rao, 11/27/2017 07:52 AM
ceph-mds.tar.gz (40.1 KB) ceph-mds.tar.gz capture some log before mds restart junming rao, 11/29/2017 01:53 PM
ceph-mds.zip (760 KB) ceph-mds.zip MDS log junming rao, 11/30/2017 09:58 AM
dump-mds-log001.rar (374 KB) dump-mds-log001.rar MDS Cache log 001 junming rao, 01/24/2018 09:06 AM
dump-mds-log002.rar (358 KB) dump-mds-log002.rar MDS Cache log 002 junming rao, 01/24/2018 09:07 AM
dump-mds-log003.rar (435 KB) dump-mds-log003.rar MDS Cache log 003 junming rao, 01/24/2018 09:07 AM
#1

Updated by Zheng Yan over 6 years ago

It seems the log was generated by a pre-luminous MDS. Which version of Ceph are you using?

#2

Updated by Patrick Donnelly over 6 years ago

  • Status changed from New to Need More Info
#3

Updated by junming rao over 6 years ago

Zheng Yan wrote:

It seems the log was generated by a pre-luminous MDS. Which version of Ceph are you using?

OS Version: CentOS 7.2
Ceph Version: 10.2.6

#4

Updated by Zheng Yan over 6 years ago

I can't find any clue in the log. The next time it happens, please set debug_mds=10 and capture some logs before restarting the MDS.
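
A minimal sketch of raising the debug level without a restart, assuming admin-socket access on the MDS host (the daemon name MDS-Server1 is taken from the log above):

    # Raise MDS debugging to level 10 via the admin socket
    ceph daemon mds.MDS-Server1 config set debug_mds 10

    # Alternatively, inject the setting through the monitors
    ceph tell mds.MDS-Server1 injectargs '--debug_mds=10'

    # Level 10 is verbose; lower it again once the logs are captured
    ceph daemon mds.MDS-Server1 config set debug_mds 1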

#5

Updated by junming rao over 6 years ago

Zheng Yan wrote:

I can't find any clue in the log. The next time it happens, please set debug_mds=10 and capture some logs before restarting the MDS.

I captured some logs before the MDS restart; see the attached ceph-mds.tar.gz.

#6

Updated by Zheng Yan over 6 years ago

Still no clue in the log. Do you still have this issue after restarting the MDS?

#7

Updated by junming rao over 6 years ago

Zheng Yan wrote:

Still no clue in the log. Do you still have this issue after restarting the MDS?

Hi Zheng Yan:
The issue still exists after restarting the MDS. The log file contains information from before and after the MDS restart.

#8

Updated by Patrick Donnelly over 6 years ago

  • Assignee set to Zheng Yan
#9

Updated by Zheng Yan over 6 years ago

It seems you have multiple clients mounting CephFS. Do you use the kernel client or ceph-fuse? Try executing "echo 3 > /proc/sys/vm/drop_caches" on the machines that have CephFS mounted, and check whether it makes CephFS release the space.
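
A minimal sketch of the suggested check, to be run as root on each machine with a CephFS mount (note that this flushes VFS caches for the whole host, not only for CephFS):

    # Flush dirty pages first so no cached writes are lost
    sync

    # Drop pagecache, dentries and inodes system-wide
    echo 3 > /proc/sys/vm/drop_caches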

#10

Updated by junming rao over 6 years ago

Zheng Yan wrote:

It seems you have multiple clients mounting CephFS. Do you use the kernel client or ceph-fuse? Try executing "echo 3 > /proc/sys/vm/drop_caches" on the machines that have CephFS mounted, and check whether it makes CephFS release the space.

I used the ceph-fuse client, and multiple clients mount CephFS. If I execute "echo 3 > /proc/sys/vm/drop_caches" to free the pagecache, dentries, and inodes, I'm worried it will affect the applications running on the system.

#11

Updated by Zheng Yan over 6 years ago

Please try remounting all CephFS mounts with the ceph-fuse option --client_try_dentry_invalidate=false.

In addition, please create an MDS cache dump and upload it (ceph daemon mds.xx dump cache /tmp/dump.0).
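
A minimal sketch of both steps, assuming the placeholder mount point /mnt/cephfs on the clients and the daemon name MDS-Server1 from the log above:

    # On each client: remount with dentry invalidation disabled
    umount /mnt/cephfs
    ceph-fuse --client_try_dentry_invalidate=false /mnt/cephfs

    # On the MDS host: dump the cache to a file for upload
    ceph daemon mds.MDS-Server1 dump cache /tmp/dump.0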

#12

Updated by junming rao over 6 years ago

Zheng Yan wrote:

Please try remounting all CephFS mounts with the ceph-fuse option --client_try_dentry_invalidate=false.

In addition, please create an MDS cache dump and upload it (ceph daemon mds.xx dump cache /tmp/dump.0).

Can I change it online with ceph daemon xx.asok config set '--client_try_dentry_invalidate=false'?
Thanks.

#13

Updated by Zheng Yan over 6 years ago

junming rao wrote:

Zheng Yan wrote:

Please try remounting all CephFS mounts with the ceph-fuse option --client_try_dentry_invalidate=false.

In addition, please create an MDS cache dump and upload it (ceph daemon mds.xx dump cache /tmp/dump.0).

Can I change it online with ceph daemon xx.asok config set '--client_try_dentry_invalidate=false'?
Thanks.

No, changing it online has no effect.
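
Since the option only takes effect when the client starts, a persistent alternative (a sketch, assuming the standard /etc/ceph/ceph.conf layout on each client host) is to set it in the client section and then remount:

    # /etc/ceph/ceph.conf on each client host
    [client]
        client_try_dentry_invalidate = false

After adding this, unmount and remount each ceph-fuse client so the setting is picked up.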

#14

Updated by junming rao about 6 years ago

Zheng Yan wrote:

junming rao wrote:

Zheng Yan wrote:

Please try remounting all CephFS mounts with the ceph-fuse option --client_try_dentry_invalidate=false.

In addition, please create an MDS cache dump and upload it (ceph daemon mds.xx dump cache /tmp/dump.0).

Can I change it online with ceph daemon xx.asok config set '--client_try_dentry_invalidate=false'?
Thanks.

No, changing it online has no effect.

Hi Zheng Yan:
We have collected the MDS cache logs with the command ceph daemon mds.xx dump cache /tmp/dump.0; see the attached dump-mds-log files.
Thanks.

#15

Updated by Zheng Yan about 6 years ago

The logs show that the client held caps on stray inodes, which is the root cause of the issue.

Did you try the --client_try_dentry_invalidate=false option?
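
To check which clients are still holding caps (a sketch; the fields shown in the output vary by release), the MDS admin socket can list the active sessions:

    # List client sessions on the MDS, including cap counts where available
    ceph daemon mds.MDS-Server1 session ls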

#16

Updated by junming rao about 6 years ago

Zheng Yan wrote:

The logs show that the client held caps on stray inodes, which is the root cause of the issue.

Did you try the --client_try_dentry_invalidate=false option?

Hi Zheng Yan:
The clients have been configured with that parameter, and CephFS has been remounted.

#17

Updated by Patrick Donnelly about 5 years ago

  • Assignee deleted (Zheng Yan)
#18

Updated by Patrick Donnelly about 5 years ago

  • Status changed from Need More Info to Can't reproduce