Bug #44023
MDS continuously crashing on v14.2.7
Description
I have max_mds set to 2, though I have tried changing the value since hitting the crash. Ceph status shows the MDS rejoining and reconnecting, then crashing, and the cycle repeats. I've attached a log that I think captures the relevant crash from one of my MDS daemons. The crashes happen with or without the keyring entries in ceph.conf; I added those while debugging to get past an earlier issue that suggested an auth problem.
Updated by Michael Sudnick about 4 years ago
I have tried resetting the MDS map, to no avail. I have also tried failing the filesystem and then setting it joinable, without success.
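For anyone retracing these steps, here is a minimal sketch of the fail-then-rejoin sequence as ceph CLI calls driven from Python. It assumes the filesystem is named cephfs and that the steps map to `ceph fs fail` and `ceph fs set <fs> joinable true`. The helper names are hypothetical, and the MDS map reset is left out because the comment does not say which command was used. It only prints the commands unless dry_run is turned off.

```python
import shlex
import subprocess


def recovery_commands(fs_name):
    """Return the ceph CLI invocations as argv lists, in the order tried."""
    return [
        # Mark the filesystem failed so all MDS ranks are taken down.
        ["ceph", "fs", "fail", fs_name],
        # Allow MDS daemons to join the filesystem again.
        ["ceph", "fs", "set", fs_name, "joinable", "true"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for argv in commands:
        line = shlex.join(argv)
        lines.append(line)
        if dry_run:
            print(line)
        else:
            # Needs a working ceph CLI and an admin keyring on this host.
            subprocess.run(argv, check=True)
    return lines


printed = run(recovery_commands("cephfs"))
```

Keeping dry_run on by default makes it possible to review the exact commands before anything touches the cluster.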
Updated by Michael Sudnick about 4 years ago
It looks like the MDSes are not being assigned a rank when they come up. ceph fs get cephfs shows:
Filesystem 'cephfs' (5)
fs_name cephfs
epoch 253032
flags 12
created 2020-02-06 19:12:11.351844
modified 2020-02-07 11:02:51.328320
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
min_compat_client -1 (unspecified)
last_failure 0
last_failure_osd_epoch 459380
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=302606143}
failed
damaged
stopped 1
data_pools [68]
metadata_pool 67
inline_data disabled
balancer
standby_count_wanted 1
302606143: [v2:10.0.151.0:6832/2622467401,v1:10.0.151.0:6833/2622467401] 'ceph1' mds.0.253029 up:rejoin seq 5 laggy since 2020-02-07 11:02:51.328271
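The last line of that dump carries the daemon name, rank, state and laggy flag, which are the fields worth watching while the MDS loops. A rough sketch of pulling them out in Python follows. The regular expression is an assumption inferred from this single line, not a documented format, and reading the number after the rank as an incarnation is also an assumption.

```python
import re

# Sample line copied from the `ceph fs get cephfs` output above.
MDS_LINE = (
    "302606143: [v2:10.0.151.0:6832/2622467401,v1:10.0.151.0:6833/2622467401] "
    "'ceph1' mds.0.253029 up:rejoin seq 5 laggy since 2020-02-07 11:02:51.328271"
)

# Assumed layout: gid, address list, quoted name, mds.<rank>.<incarnation>,
# state, sequence number, and an optional "laggy since <timestamp>" suffix.
MDS_RE = re.compile(
    r"^(?P<gid>\d+):\s+\[(?P<addrs>[^\]]*)\]\s+'(?P<name>[^']+)'\s+"
    r"mds\.(?P<rank>-?\d+)\.(?P<incarnation>\d+)\s+(?P<state>\S+)\s+"
    r"seq\s+(?P<seq>\d+)(?:\s+laggy since (?P<laggy_since>.+))?$"
)


def parse_mds_line(line):
    """Return a dict of fields from one MDS info line, or None if it does not match."""
    match = MDS_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["rank"] = int(info["rank"])
    info["laggy"] = info["laggy_since"] is not None
    return info


info = parse_mds_line(MDS_LINE)
print(info["name"], info["rank"], info["state"], info["laggy"])
```

For the line above this reports daemon ceph1 in state up:rejoin and flagged laggy.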
Updated by Michael Sudnick about 4 years ago
Rolling back to 14.2.6 did not fix the issue.
Updated by Michael Sudnick about 4 years ago
- File ceph-mon.ceph4.log.post added
Updated by Michael Sudnick about 4 years ago
I managed to recover, after some experimenting, by adding wipe_sessions to ceph.conf. Sorry for the false alarm; this can be closed.
Updated by Patrick Donnelly about 4 years ago
- Project changed from Ceph to CephFS