Bug #19438

closed

ceph mds error "No space left on device"

Added by william sheng about 7 years ago. Updated almost 6 years ago.

Status: Won't Fix
Priority: High
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

While testing the MDS cluster with the bash script below, I created a test directory under the CephFS mount path and started creating 100000 small files in it. Shortly after the script had written about 10000 files, the cluster reported the error "No space left on device". From searching on Google, it appears the problem can be resolved by setting the directory split (fragmentation) parameters.
Operation:
cat /etc/ceph/ceph.conf
osd crush location hook = /usr/bin/calamari-crush-location
public_network = 10.0.0.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pgp_num = 256
osd_pool_default_pg_num = 256
osd_crush_chooseleaf_type = 1
mon_clock_drift_allowed = 2
mon_clock_drift_warn_backoff = 30
mon_osd_full_ratio = .85
mon_osd_nearfull_ratio = .85
debug rgw = 20
#rgw_override_bucket_index_max_shards = 0
mds_bal_frag = true
#mon_pg_warn_max_per_osd = 0

command:

ceph fs set dbleader allow_dirfrags true --yes-i-really-mean-it

++++++++++++++++++++ script ++++++++++++++++++++

#!/bin/bash
# Create many small files in the current directory (run from a directory
# under the CephFS mount) and print the elapsed time every 1000 files.
START=$(date +'%s')
for i in {1..400000}; do
    if (( i % 1000 == 0 )); then
        END=$(date +'%s')
        echo "$i, $((END - START))"
    fi
    # one 200 KB file per iteration
    dd if=/dev/zero of=./new-$i-new1 bs=200k count=1
done
END=$(date +'%s')
echo "$START, $END, $((END - START))"

Actions #1

Updated by John Spray about 7 years ago

Hmm, so fragmentation is enabled but apparently isn't happening? Is your ceph.conf with "mds_bal_frag = true" present on all the MDS servers?
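
One way to check that, as a rough sketch assuming the default admin socket setup and using mds.<id> as a placeholder for each daemon's name, is to ask the running daemons what they actually loaded:

# run on each MDS host
ceph daemon mds.<id> config get mds_bal_frag
# or, equivalently, via the full config dump
ceph daemon mds.<id> config show | grep mds_bal_frag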

The code in jewel would sometimes take a few seconds before fragmenting a directory, but that should usually still be fast enough.

If this is just a test system, it would be interesting to install the 12.0.0 release and see whether the issue still occurs.

Otherwise, try setting "debug mds = 10" and look for log messages starting "mds.0.bal" for clues about whether it's trying to split fragments.
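
For example (a sketch only; mds.0 and the default log path are assumptions based on a standard deployment):

# raise MDS debug logging without a restart (or put "debug mds = 10" in ceph.conf and restart the MDS)
ceph tell mds.0 injectargs '--debug-mds 10'
# then look for balancer/fragmentation messages in the MDS log
grep 'mds.0.bal' /var/log/ceph/ceph-mds.*.log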

Actions #2

Updated by John Spray about 7 years ago

  • Category changed from 90 to Correctness/Safety
  • Target version deleted (v10.2.7)
Actions #3

Updated by berlin sun about 7 years ago

I have hit this same "No space left on device" MDS error on version 10.2.6.
To make it go away I had to add mds_bal_frag = true to ceph.conf and run ceph fs set <mufs> allow_dirfrags true for it to take effect. But I don't understand why this is still necessary on a recent jewel release.
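
For reference, a minimal sketch of those two steps (with <mufs> standing in for the filesystem name, and the --yes-i-really-mean-it flag as shown in the description above):

# in ceph.conf on the MDS hosts
mds_bal_frag = true
# then enable directory fragmentation on the filesystem
ceph fs set <mufs> allow_dirfrags true --yes-i-really-mean-it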

Actions #4

Updated by Patrick Donnelly almost 6 years ago

  • Status changed from New to Won't Fix

dirfrags are not stable on jewel. Closing this.
