Support #16528

Stuck with CephFS with 1M files in one dir

Added by elder one almost 8 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Tags: cephfs

Description

I'm pretty much stuck with CephFS (Jewel 10.2.2): an unsuccessful bonnie++ run left behind 1 million 0-byte files in one directory.
I can't list or delete the files or the directory, at least not in a reasonable time (I waited 4 hours).
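
For illustration, a minimal sketch of a streaming delete, assuming Python 3 is available on the client. The function name `remove_entries` and the `report_every` parameter are hypothetical. It unlinks entries as they are returned instead of building or sorting a full listing first, but it still depends on the MDS answering readdir, so it may not help while readdir requests stay blocked.

```python
import os


def remove_entries(directory, report_every=10000):
    """Unlink regular files in `directory` one at a time.

    Entries are consumed lazily, so no full listing is held in memory.
    Returns the number of files removed. Subdirectories are left alone.
    """
    removed = 0
    # os.scandir yields entries as the kernel hands back readdir batches.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                os.unlink(entry.path)
                removed += 1
                if removed % report_every == 0:
                    print(f"removed {removed} files")
    return removed
```

It would be called as `remove_entries("/mnt/cephfs/<dir>")`, where the directory name is a placeholder for the bonnie++ directory.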

Ceph kernel client with 4.4.14 kernel on Ubuntu 14.04.

2 metadata servers, 1 active, another in standby-replay mode.

From the active MDS log when I try to access the directory:

2016-06-29 19:05:39.053345 7fcc198da700  1 heartbeat_map reset_timeout 'MDSRank' had timed out after 10
2016-06-29 19:05:57.472549 7fcc198da700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 33.047606 secs
2016-06-29 19:05:57.472563 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 33.047606 seconds old, received at 2016-06-29 19:05:24.424893: client_request(client.28058182:3 readdir #100000021c2 2016-06-29 19:05:24.420352) currently acquired locks
2016-06-29 19:06:25.457003 7fcc198da700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 61.032040 secs
2016-06-29 19:06:25.457022 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 61.032040 seconds old, received at 2016-06-29 19:05:24.424893: client_request(client.28058182:3 readdir #100000021c2 2016-06-29 19:05:24.420352) currently acquired locks
2016-06-29 19:08:02.719494 7fcc198da700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 35.652664 secs
2016-06-29 19:08:02.719520 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 35.652664 seconds old, received at 2016-06-29 19:07:27.066769: client_request(client.28058182:6 readdir #100000021c2 00000ef1d6BKMay 2016-06-29 19:07:27.054945) currently acquired locks
2016-06-29 19:08:30.702883 7fcc198da700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 63.636058 secs
2016-06-29 19:08:30.702899 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 63.636058 seconds old, received at 2016-06-29 19:07:27.066769: client_request(client.28058182:6 readdir #100000021c2 00000ef1d6BKMay 2016-06-29 19:07:27.054945) currently acquired locks
2016-06-29 19:10:43.184162 7fcc198da700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 35.598239 secs
2016-06-29 19:10:43.184177 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 35.598239 seconds old, received at 2016-06-29 19:10:07.585870: client_request(client.28058182:9 readdir #100000021c2 0000022f71S 2016-06-29 19:10:07.574338) currently acquired locks
2016-06-29 19:11:10.606474 7fcc198da700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 63.020538 secs
2016-06-29 19:11:10.606503 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 63.020538 seconds old, received at 2016-06-29 19:10:07.585870: client_request(client.28058182:9 readdir #100000021c2 0000022f71S 2016-06-29 19:10:07.574338) currently acquired locks
2016-06-29 19:13:43.266474 7fcc198da700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 35.753201 secs
2016-06-29 19:13:43.266489 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 35.753201 seconds old, received at 2016-06-29 19:13:07.513218: client_request(client.28058182:13 readdir #100000021c2 0000012468KDbTAmV 2016-06-29 19:13:07.502135) currently acquired locks
2016-06-29 19:14:11.108257 7fcc198da700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 63.594955 secs
2016-06-29 19:14:11.108278 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 63.594955 seconds old, received at 2016-06-29 19:13:07.513218: client_request(client.28058182:13 readdir #100000021c2 0000012468KDbTAmV 2016-06-29 19:13:07.502135) currently acquired locks
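
To make the log above easier to read, here is a small sketch that pulls the request id, age and readdir arguments out of the "slow request" lines. The regular expression is written only against the lines shown here and is not an official log format. The field called `resume` is an assumption: it is the optional token after the inode number, which looks like a bonnie++ file name and appears to be the entry the readdir continues from.

```python
import re

# Pattern derived from the log lines in this ticket only.
SLOW_REQUEST = re.compile(
    r"slow request (?P<age>\d+\.\d+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"client_request\((?P<client>client\.\d+):(?P<tid>\d+) "
    r"(?P<op>\w+) (?P<inode>#[0-9a-f]+)"
    r"(?: (?P<resume>\S+))? "
    r"(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\) "
    r"currently (?P<state>.+)$"
)


def parse_slow_requests(lines):
    """Return one dict per 'slow request' line; other lines are skipped."""
    records = []
    for line in lines:
        match = SLOW_REQUEST.search(line)
        if match:
            record = match.groupdict()
            record["age"] = float(record["age"])
            record["tid"] = int(record["tid"])
            records.append(record)
    return records


def longest_wait_per_request(records):
    """Map (client, tid) to the largest age reported for that request."""
    worst = {}
    for record in records:
        key = (record["client"], record["tid"])
        worst[key] = max(worst.get(key, 0.0), record["age"])
    return worst


# Three lines copied verbatim from the MDS log above.
SAMPLE = [
    "2016-06-29 19:05:39.053345 7fcc198da700  1 heartbeat_map reset_timeout 'MDSRank' had timed out after 10",
    "2016-06-29 19:05:57.472563 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 33.047606 seconds old, received at 2016-06-29 19:05:24.424893: client_request(client.28058182:3 readdir #100000021c2 2016-06-29 19:05:24.420352) currently acquired locks",
    "2016-06-29 19:08:30.702899 7fcc198da700  0 log_channel(cluster) log [WRN] : slow request 63.636058 seconds old, received at 2016-06-29 19:07:27.066769: client_request(client.28058182:6 readdir #100000021c2 00000ef1d6BKMay 2016-06-29 19:07:27.054945) currently acquired locks",
]
```

Run over the full log, this shows four distinct readdir requests from the same client (ids 3, 6, 9 and 13), each reported as blocked for more than 60 seconds and each with a different resume token, which suggests the listing does advance, only very slowly.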

Only one client has the filesystem mounted, as follows:
192.168.30.71,192.168.30.72,192.168.30.73:/ on /mnt/cephfs type ceph (name=admin,rsize=2097152,wsize=2097152,readdir_max_entries=10240,readdir_max_bytes=2097152,key=client.admin)

Output of rados df:

pool name                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
cephfs_data                0      1022482            0            0            0         9938     14623544      3655424     25191882
cephfs_metadata        35953           31            0            0            0        28177     67305532       147345      6187783
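
As a quick sanity check on these numbers, the sketch below parses the two pool rows and compares the object count with the file count. The column names in `COLUMNS` are labels chosen here to match the header of this particular output, not an official schema.

```python
# Labels for the ten columns in the rados df output shown above.
COLUMNS = ["pool", "kb", "objects", "clones", "degraded", "unfound",
           "rd", "rd_kb", "wr", "wr_kb"]

RADOS_DF = """
pool name                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
cephfs_data                0      1022482            0            0            0         9938     14623544      3655424     25191882
cephfs_metadata        35953           31            0            0            0        28177     67305532       147345      6187783
"""


def parse_rados_df_rows(text):
    """Return {pool_name: {column: int}} for rows with ten numeric-looking fields."""
    pools = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # The header splits into more than ten tokens and is skipped here.
        if len(fields) != len(COLUMNS) or not fields[1].isdigit():
            continue
        pools[fields[0]] = dict(zip(COLUMNS[1:], map(int, fields[1:])))
    return pools


pools = parse_rados_df_rows(RADOS_DF)
data = pools["cephfs_data"]
print(data["objects"], "objects using", data["kb"], "KB in cephfs_data")
```

The data pool holds 1022482 objects but 0 KB, which is consistent with roughly 1 million empty files left by bonnie++.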

Ceph cluster status is OK


Files

ceph.conf (2.53 KB), elder one, 06/29/2016 04:59 PM