Bug #19438

closed

ceph mds error "No space left on device"

Added by william sheng about 7 years ago. Updated almost 6 years ago.

Status: Won't Fix
Priority: High
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

While testing the MDS cluster with the bash script below, I created a test directory under the CephFS mount path and started creating 100000 small files in it. Shortly after the script had written about 10000 files, the cluster reported the error "No space left on device". From searching on Google, it appears the problem can be resolved by setting the directory split (fragmentation) parameters.
Operation:
cat /etc/ceph/ceph.conf
osd crush location hook = /usr/bin/calamari-crush-location
public_network = 10.0.0.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pgp_num = 256
osd_pool_default_pg_num = 256
osd_crush_chooseleaf_type = 1
mon_clock_drift_allowed = 2
mon_clock_drift_warn_backoff = 30
mon_osd_full_ratio = .85
mon_osd_nearfull_ratio = .85
debug rgw = 20
#rgw_override_bucket_index_max_shards = 0
mds_bal_frag = true
#mon_pg_warn_max_per_osd = 0

command:

ceph fs set dbleader allow_dirfrags true --yes-i-really-mean-it

++++++++++++++++++++ script ++++++++++++++++++++

#!/bin/bash
# Create many small files in the current directory (run from a directory
# under the CephFS mount) and print the elapsed time every 1000 files.
START=$(date +'%s')
for i in {1..400000}; do
    if (( i % 1000 == 0 )); then
        END=$(date +'%s')
        echo "$i, $((END - START))"
    fi
    # one 200 KB file per iteration
    dd if=/dev/zero of=./new-$i-new1 bs=200k count=1
done
END=$(date +'%s')
echo "$START, $END, $((END - START))"

Actions #1

Updated by John Spray about 7 years ago

Hmm, so fragmentation is enabled but apparently isn't happening? Is your ceph.conf with "mds_bal_frag = true" present on all the MDS servers?
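
One way to check that, as a rough sketch assuming the default admin socket setup and using mds.<id> as a placeholder for each daemon's name, is to ask the running daemons what they actually loaded:

# run on each MDS host
ceph daemon mds.<id> config get mds_bal_frag
# or, equivalently, via the full config dump
ceph daemon mds.<id> config show | grep mds_bal_frag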

The code in jewel would sometimes take a few seconds before fragmenting a directory, but that should usually still be fast enough.

If this is just a test system, it would be interesting to install the 12.0.0 release and see whether the issue still occurs.

Otherwise, try setting "debug mds = 10" and look for log messages starting "mds.0.bal" for clues about whether it's trying to split fragments.
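
For example (a sketch only; mds.0 and the default log path are assumptions based on a standard deployment):

# raise MDS debug logging without a restart (or put "debug mds = 10" in ceph.conf and restart the MDS)
ceph tell mds.0 injectargs '--debug-mds 10'
# then look for balancer/fragmentation messages in the MDS log
grep 'mds.0.bal' /var/log/ceph/ceph-mds.*.log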

Actions #2

Updated by John Spray about 7 years ago

  • Category changed from 90 to Correctness/Safety
  • Target version deleted (v10.2.7)
Actions #3

Updated by berlin sun about 7 years ago

I have hit this same "No space left on device" MDS error on version 10.2.6.
To make it go away I had to add mds_bal_frag = true to ceph.conf and run ceph fs set <mufs> allow_dirfrags true for it to take effect. But I don't understand why this is still necessary on a recent jewel release.
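
For reference, a minimal sketch of those two steps (with <mufs> standing in for the filesystem name, and the --yes-i-really-mean-it flag as shown in the description above):

# in ceph.conf on the MDS hosts
mds_bal_frag = true
# then enable directory fragmentation on the filesystem
ceph fs set <mufs> allow_dirfrags true --yes-i-really-mean-it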

Actions #4

Updated by Patrick Donnelly almost 6 years ago

  • Status changed from New to Won't Fix

dirfrags are not stable on jewel. Closing this.
