Bug #44023 (open): MDS continuously crashing on v14.2.7

Added by Michael Sudnick about 4 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have max_mds set to 2, though I have tried fiddling with the values since hitting the crash. ceph status shows the MDS daemons cycling through reconnect and rejoin and then crashing, in a loop. I've attached a log which I think captures the relevant crash from one of my MDS daemons. The crashes happen with or without the keyring entries in ceph.conf; I added those while debugging to get past an earlier issue that suggested an auth problem.
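
For reference, a two-active-MDS layout like this is normally configured with the ceph fs set command rather than in ceph.conf, and per-daemon keyring entries take roughly the form below. The daemon names and paths here are illustrative assumptions, not copied from the attached ceph.conf:

# set the number of active MDS ranks for the filesystem
ceph fs set cephfs max_mds 2

# illustrative ceph.conf keyring entries of the kind mentioned above
[mds.ceph0]
keyring = /var/lib/ceph/mds/ceph-ceph0/keyring
[mds.ceph1]
keyring = /var/lib/ceph/mds/ceph-ceph1/keyring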


Files

ceph.conf (813 Bytes) - Michael Sudnick, 02/06/2020 10:19 PM
ceph-mds.ceph0.log (54.9 KB) - Michael Sudnick, 02/06/2020 10:24 PM
ceph-mon.ceph4.log.post (59.1 KB) - Michael Sudnick, 02/08/2020 05:15 PM
Actions #1

Updated by Michael Sudnick about 4 years ago

I have tried resetting the MDS map, to no avail. I have also tried failing the filesystem and then setting it joinable again, without success.
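
For reference, the steps described above correspond roughly to the following Nautilus-era commands; the exact invocations are an assumption, not quoted from the reporter, and the filesystem name is taken from the output in the next comment:

# take the filesystem down, then allow MDS daemons to join it again
ceph fs fail cephfs
ceph fs set cephfs joinable true

# reset the MDS map (destructive; requires the confirmation flag)
ceph fs reset cephfs --yes-i-really-mean-it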

Actions #2

Updated by Michael Sudnick about 4 years ago

It looks like the MDSes are not being assigned a rank when they come up; ceph fs get cephfs shows:
Filesystem 'cephfs' (5)
fs_name cephfs
epoch 253032
flags 12
created 2020-02-06 19:12:11.351844
modified 2020-02-07 11:02:51.328320
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
min_compat_client -1 (unspecified)
last_failure 0
last_failure_osd_epoch 459380
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=302606143}
failed
damaged
stopped 1
data_pools [68]
metadata_pool 67
inline_data disabled
balancer
standby_count_wanted 1
302606143: [v2:10.0.151.0:6832/2622467401,v1:10.0.151.0:6833/2622467401] 'ceph1' mds.0.253029 up:rejoin seq 5 laggy since 2020-02-07 11:02:51.328271
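
For anyone following along, whether a rank actually gets assigned while the daemons cycle can be checked with the standard status commands (filesystem name as in the output above):

ceph mds stat
ceph fs status cephfs

# dump the full FSMap, as pasted above, including the epoch as it changes across restarts
ceph fs dump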

Actions #3

Updated by Michael Sudnick about 4 years ago

Rolling back to 14.2.6 did not fix the issue.

Actions #5

Updated by Michael Sudnick about 4 years ago

I managed to recover by adding wipe_sessions to ceph.conf; sorry for the false alarm. This can be closed.
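
For completeness, the recovery setting presumably looked something like the ceph.conf entry below; the full option name is mds_wipe_sessions, and the exact spelling and section shown here are an assumption rather than a quote from the reporter's file. It is a debugging/recovery option, so it should be removed again once the MDS is healthy:

[mds]
# assumed form of the wipe_sessions entry mentioned above
mds wipe sessions = true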

Actions #6

Updated by Patrick Donnelly about 4 years ago

  • Project changed from Ceph to CephFS