
Bug #49597

mds: mds goes to 'replay' state after setting 'osd_failsafe_ratio' to less than the size of data written.

Added by Kotresh Hiremath Ravishankar about 1 month ago. Updated about 1 month ago.

Status:
New
Priority:
High
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps to reproduce on a vstart cluster:

1. Set the following in ../src/vstart.sh:

   1. Disable client_check_pool_perm:
      [global]
      client check pool perm = false
   2. Set the bluestore block size to 10 GiB:
      bluestore block size = 10737418240
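As a quick sanity check on the configured value, the bluestore block size above is exactly 10 GiB:

```python
# Sketch: verify the bluestore block size from step 1 is exactly 10 GiB.
GIB = 1024 ** 3
bluestore_block_size = 10737418240  # value from the config above

assert bluestore_block_size == 10 * GIB
print(bluestore_block_size / GIB, "GiB")  # 10.0 GiB
```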

2. Start vstart cluster as below

#env MDS=1 OSD=1 MON=1 ../src/vstart.sh -d -b -n --without-dashboard

3. Create a subvolume and write a ~5 GB file:

#bin/ceph fs subvolume create a sub_0
#subvol_path=$(bin/ceph fs subvolume getpath a sub_0 2>/dev/null)
#bin/ceph-fuse -c ./ceph.conf /mnt
#dd if=/dev/urandom of=/mnt$subvol_path/5GB_file-1 status=progress bs=1M count=5000
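The dd invocation above writes 5000 blocks of 1 MiB each; a quick conversion shows this is about 4.9 GiB, which lines up with the DATA column reported by `ceph osd df` in step 6:

```python
# Sketch: size written by `dd bs=1M count=5000` in step 3.
MIB = 1024 ** 2
GIB = 1024 ** 3

written_bytes = 5000 * MIB
written_gib = written_bytes / GIB  # ~4.88 GiB

print(f"{written_gib:.2f} GiB")  # ~4.88 GiB, shown as 4.9 GiB by `ceph osd df`
```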

4. Set osd ratios as below:

#bin/ceph osd set-full-ratio 0.2
#bin/ceph osd set-nearfull-ratio 0.16
#bin/ceph osd set-backfillfull-ratio 0.18

5. Removing the subvolume should return ENOSPC:

#bin/ceph fs subvolume rm a sub_0
DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH
2021-03-04T15:24:02.674+0530 7fa9ec672700 -1 WARNING: all dangerous and experimental features are enabled.
2021-03-04T15:24:02.698+0530 7fa9ec672700 -1 WARNING: all dangerous and experimental features are enabled.
Error ENOSPC: error in setxattr

6. Output of 'osd df'

#bin/ceph osd df
DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH
2021-03-04T15:25:43.136+0530 7f619257c700 -1 WARNING: all dangerous and experimental features are enabled.
2021-03-04T15:25:43.152+0530 7f619257c700 -1 WARNING: all dangerous and experimental features are enabled.
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.01070 1.00000 11 GiB 5.9 GiB 4.9 GiB 0 B 27 MiB 5.1 GiB 53.48 1.00 192 up
TOTAL 11 GiB 5.9 GiB 4.9 GiB 0 B 27 MiB 5.1 GiB 53.48
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
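Plugging the `osd df` figures above into the ratios set in step 4 shows why the subvolume removal in step 5 is rejected: raw usage is roughly 53%, well past every threshold. A minimal sketch, using the values from the output above:

```python
# Sketch: compare raw OSD usage (step 6 output) with the ratios set in step 4.
size_gib = 11.0      # SIZE column from `ceph osd df`
raw_use_gib = 5.9    # RAW USE column

use = raw_use_gib / size_gib  # ~0.536 (reported as 53.48%; rounding differs)

full_ratio = 0.2
backfillfull_ratio = 0.18
nearfull_ratio = 0.16

# Usage exceeds every threshold, so the subvolume rm in step 5
# correctly fails with ENOSPC.
assert use > full_ratio > backfillfull_ratio > nearfull_ratio
print(f"raw use ~{use:.1%}")
```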

7. Edit ./ceph.conf to set the osd failsafe full ratio to 0.5 (this can also be set at run time, but ceph.conf takes precedence):

osd failsafe full ratio = .5
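This is the key condition from the bug title: the new failsafe ratio (0.5) is below the OSD's actual raw usage (~53.5% from step 6). The assumed mechanism is that after the restart the OSD failsafe check trips and blocks writes, leaving the MDS unable to finish journal replay; a sketch of the arithmetic:

```python
# Sketch: the failing condition set up in step 7.
# Usage values copied from the `ceph osd df` output in step 6.
use = 5.9 / 11.0                 # ~0.536 raw usage
osd_failsafe_full_ratio = 0.5    # value written to ceph.conf above

# Usage already exceeds the failsafe threshold, so on restart the OSD
# refuses writes (assumed mechanism; the MDS then hangs in 'replay').
assert use > osd_failsafe_full_ratio
```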

8. Stop and start the same cluster.

#../src/stop.sh
#env MDS=1 OSD=1 MON=1 ../src/vstart.sh -d -b --without-dashboard

9. Check that the MDS is stuck in the `replay` state:

#bin/ceph fs status
DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH
2021-03-04T15:30:27.122+0530 7f2515951700 -1 WARNING: all dangerous and experimental features are enabled.
2021-03-04T15:30:27.138+0530 7f2515951700 -1 WARNING: all dangerous and experimental features are enabled.
a - 0 clients
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 replay a 0 0 0 0
POOL TYPE USED AVAIL
cephfs.a.meta metadata 188k 0
cephfs.a.data data 5000M 0
MDS version: ceph version Development (no_version) quincy (dev)

10. Now all mgr commands hang, waiting for the cephfs mount.

logs_except_osd.tar.gz - mds-mon-mgr-logs (345 KB) Kotresh Hiremath Ravishankar, 03/04/2021 01:56 PM

osd_last10k.tar.gz (109 KB) Kotresh Hiremath Ravishankar, 03/04/2021 02:02 PM

History

#3 Updated by Patrick Donnelly about 1 month ago

  • Priority changed from Normal to High
  • Target version set to v17.0.0
  • Source set to Development
