
Bug #49597

mds: mds goes to 'replay' state after setting 'osd_failsafe_ratio' to less than the size of data written.

Added by Kotresh Hiremath Ravishankar about 1 month ago. Updated about 1 month ago.

Status:
New
Priority:
High
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps to reproduce on a vstart cluster:

1. Set the following in ../src/vstart.sh:

   1. Disable client_check_pool_perm:
      [global]
      client check pool perm = false
   2. Set the bluestore block size to 10 GiB:
      bluestore block size = 10737418240
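As a quick sanity check on the configured value, the bluestore block size above is exactly 10 GiB:

```python
# Sketch: verify the bluestore block size from step 1 is exactly 10 GiB.
GIB = 1024 ** 3
bluestore_block_size = 10737418240  # value from the config above

assert bluestore_block_size == 10 * GIB
print(bluestore_block_size / GIB, "GiB")  # 10.0 GiB
```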

2. Start vstart cluster as below

#env MDS=1 OSD=1 MON=1 ../src/vstart.sh -d -b -n --without-dashboard

3. Create a subvolume and write a ~5 GB file:

#bin/ceph fs subvolume create a sub_0
#subvol_path=$(bin/ceph fs subvolume getpath a sub_0 2>/dev/null)
#bin/ceph-fuse -c ./ceph.conf /mnt
#dd if=/dev/urandom of=/mnt$subvol_path/5GB_file-1 status=progress bs=1M count=5000
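The dd invocation above writes 5000 blocks of 1 MiB each; a quick conversion shows this is about 4.9 GiB, which lines up with the DATA column reported by `ceph osd df` in step 6:

```python
# Sketch: size written by `dd bs=1M count=5000` in step 3.
MIB = 1024 ** 2
GIB = 1024 ** 3

written_bytes = 5000 * MIB
written_gib = written_bytes / GIB  # ~4.88 GiB

print(f"{written_gib:.2f} GiB")  # ~4.88 GiB, shown as 4.9 GiB by `ceph osd df`
```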

4. Set osd ratios as below:

#bin/ceph osd set-full-ratio 0.2
#bin/ceph osd set-nearfull-ratio 0.16
#bin/ceph osd set-backfillfull-ratio 0.18

5. Removing the subvolume should return ENOSPC:

#bin/ceph fs subvolume rm a sub_0
DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH
2021-03-04T15:24:02.674+0530 7fa9ec672700 -1 WARNING: all dangerous and experimental features are enabled.
2021-03-04T15:24:02.698+0530 7fa9ec672700 -1 WARNING: all dangerous and experimental features are enabled.
Error ENOSPC: error in setxattr

6. Output of 'osd df'

#bin/ceph osd df
DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH
2021-03-04T15:25:43.136+0530 7f619257c700 -1 WARNING: all dangerous and experimental features are enabled.
2021-03-04T15:25:43.152+0530 7f619257c700 -1 WARNING: all dangerous and experimental features are enabled.
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.01070 1.00000 11 GiB 5.9 GiB 4.9 GiB 0 B 27 MiB 5.1 GiB 53.48 1.00 192 up
TOTAL 11 GiB 5.9 GiB 4.9 GiB 0 B 27 MiB 5.1 GiB 53.48
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
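Plugging the `osd df` figures above into the ratios set in step 4 shows why the subvolume removal in step 5 is rejected: raw usage is roughly 53%, well past every threshold. A minimal sketch, using the values from the output above:

```python
# Sketch: compare raw OSD usage (step 6 output) with the ratios set in step 4.
size_gib = 11.0      # SIZE column from `ceph osd df`
raw_use_gib = 5.9    # RAW USE column

use = raw_use_gib / size_gib  # ~0.536 (reported as 53.48%; rounding differs)

full_ratio = 0.2
backfillfull_ratio = 0.18
nearfull_ratio = 0.16

# Usage exceeds every threshold, so the subvolume rm in step 5
# correctly fails with ENOSPC.
assert use > full_ratio > backfillfull_ratio > nearfull_ratio
print(f"raw use ~{use:.1%}")
```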

7. Edit ./ceph.conf to set the osd failsafe full ratio to 0.5 (this can also be set at run time, but ceph.conf takes precedence):

osd failsafe full ratio = .5
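This is the key condition from the bug title: the new failsafe ratio (0.5) is below the OSD's actual raw usage (~53.5% from step 6). The assumed mechanism is that after the restart the OSD failsafe check trips and blocks writes, leaving the MDS unable to finish journal replay; a sketch of the arithmetic:

```python
# Sketch: the failing condition set up in step 7.
# Usage values copied from the `ceph osd df` output in step 6.
use = 5.9 / 11.0                 # ~0.536 raw usage
osd_failsafe_full_ratio = 0.5    # value written to ceph.conf above

# Usage already exceeds the failsafe threshold, so on restart the OSD
# refuses writes (assumed mechanism; the MDS then hangs in 'replay').
assert use > osd_failsafe_full_ratio
```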

8. Stop and start the same cluster.

#../src/stop.sh
#env MDS=1 OSD=1 MON=1 ../src/vstart.sh -d -b --without-dashboard

9. Check that the MDS is stuck in the `replay` state:

#bin/ceph fs status
DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH
2021-03-04T15:30:27.122+0530 7f2515951700 -1 WARNING: all dangerous and experimental features are enabled.
2021-03-04T15:30:27.138+0530 7f2515951700 -1 WARNING: all dangerous and experimental features are enabled.
a - 0 clients
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 replay a 0 0 0 0
POOL TYPE USED AVAIL
cephfs.a.meta metadata 188k 0
cephfs.a.data data 5000M 0
MDS version: ceph version Development (no_version) quincy (dev)

10. Now all mgr commands hang, waiting for the cephfs mount.

logs_except_osd.tar.gz - mds-mon-mgr-logs (345 KB) Kotresh Hiremath Ravishankar, 03/04/2021 01:56 PM

osd_last10k.tar.gz (109 KB) Kotresh Hiremath Ravishankar, 03/04/2021 02:02 PM

History

#3 Updated by Patrick Donnelly about 1 month ago

  • Priority changed from Normal to High
  • Target version set to v17.0.0
  • Source set to Development
