Bug #44023
open
MDS continuously crashing on v14.2.7
Added by Michael Sudnick over 4 years ago.
Updated about 4 years ago.
Description
I have max_mds set to 2, though I have tried fiddling with the values since hitting the crash. ceph status shows the MDS daemons cycling through reconnect and rejoin, then crashing and looping. I've attached a log which I think captures the relevant crash from one of my MDS daemons. The crashes happen with or without the keyring entries in ceph.conf; I added those while debugging to get past an earlier issue that suggested an auth problem.
I have tried resetting the MDS map, to no avail. I have also tried failing the filesystem and then setting it joinable again, without success.
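For anyone retracing these steps, the recovery attempts above correspond to commands along these lines (a sketch; substitute your own filesystem name, and note that ceph fs reset is destructive and deliberately requires the confirmation flag):

```
# Mark all ranks of the filesystem failed and stop client I/O
ceph fs fail cephfs

# Allow standby MDS daemons to be assigned ranks again
ceph fs set cephfs joinable true

# Reset the MDS map (last resort; discards MDS map state)
ceph fs reset cephfs --yes-i-really-mean-it
```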
It looks like the MDSes are not being assigned a rank when they come up; ceph fs get cephfs shows:
Filesystem 'cephfs' (5)
fs_name cephfs
epoch 253032
flags 12
created 2020-02-06 19:12:11.351844
modified 2020-02-07 11:02:51.328320
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
min_compat_client -1 (unspecified)
last_failure 0
last_failure_osd_epoch 459380
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=302606143}
failed
damaged
stopped 1
data_pools [68]
metadata_pool 67
inline_data disabled
balancer
standby_count_wanted 1
302606143: [v2:10.0.151.0:6832/2622467401,v1:10.0.151.0:6833/2622467401] 'ceph1' mds.0.253029 up:rejoin seq 5 laggy since 2020-02-07 11:02:51.328271
Rolling back to 14.2.6 did not fix the issue.
I managed to recover by adding wipe_sessions to ceph.conf; sorry for the false alarm. This can be closed.
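For anyone hitting the same rejoin loop, the workaround above is a ceph.conf fragment along these lines (the option name below is the mds_wipe_sessions debug setting; it discards client session state on MDS startup, so treat it as a one-time recovery step and remove it once the MDS is back up, not as a permanent setting):

```
[mds]
    # Wipe the client session table on startup (mds_wipe_sessions);
    # lets a looping up:rejoin MDS come up by discarding stale sessions
    mds wipe sessions = true
```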
- Project changed from Ceph to CephFS