Bug #43407
mds crash after update to v14.2.5
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed: 12/23/2019
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
All MDS daemons crashed and are unable to restart after updating from v14.2.4 to v14.2.5.
systemctl status:
ceph-mds@ceph1.service - Ceph metadata server daemon
   Loaded: loaded (/lib/systemd/system/ceph-mds@.service; indirect; vendor preset: enabled)
   Active: failed (Result: signal) since Sun 2019-12-22 23:15:58 UTC; 3s ago
  Process: 7922 ExecStart=/usr/bin/ceph-mds -f --cluster ${CLUSTER} --id ceph1 --setuser ceph --setgroup ceph (code=killed, signal=SEGV)
 Main PID: 7922 (code=killed, signal=SEGV)

Dec 22 23:15:58 ceph1.savoca.de systemd[1]: ceph-mds@ceph1.service: Service hold-off time over, scheduling restart.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: ceph-mds@ceph1.service: Scheduled restart job, restart counter is at 3.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: Stopped Ceph metadata server daemon.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: ceph-mds@ceph1.service: Start request repeated too quickly.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: ceph-mds@ceph1.service: Failed with result 'signal'.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: Failed to start Ceph metadata server daemon.
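Since systemd has hit its start limit, the unit has to be reset before another attempt; raising the MDS debug level first makes the next crash log more useful. A minimal sketch, using the cluster name ceph and daemon id ceph1 from the unit above (the debug level is a diagnostic assumption, not part of the original report):

  # clear systemd's start-limit counter, then raise MDS log verbosity
  systemctl reset-failed ceph-mds@ceph1.service
  ceph config set mds debug_mds 20
  systemctl start ceph-mds@ceph1.service
  # alternatively, run the daemon in the foreground to catch the crash directly
  /usr/bin/ceph-mds -f --cluster ceph --id ceph1 --setuser ceph --setgroup ceph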
ceph -s:
  cluster:
    id:     3ac4d7e5-98ca-4b75-bb88-02386caa5793
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            85 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 22h)
    mgr: ceph2(active, since 22h), standbys: ceph3, ceph1
    mds: media:1/1 {0=ceph1=up:replay(laggy or crashed)}
    osd: 6 osds: 6 up (since 22h), 6 in (since 8w)

  data:
    pools:   12 pools, 181 pgs
    objects: 2.22M objects, 8.3 TiB
    usage:   12 TiB used, 9.4 TiB / 22 TiB avail
    pgs:     181 active+clean
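For context, the degraded filesystem and the recorded crashes can be examined with the standard Nautilus tooling; a sketch, using the filesystem name media from the mds line above (<crash-id> is a placeholder for an id returned by ceph crash ls):

  ceph fs status media        # per-rank state of the "media" filesystem
  ceph crash ls               # list the 85 recently recorded crashes
  ceph crash info <crash-id>  # full metadata and backtrace for one entry
  ceph versions               # verify every daemon is actually on 14.2.5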
Excerpt of ceph-mds.log:
2019-12-22 12:20:17.347 7fce967872c0 0 set uid:gid to 64045:64045 (ceph:ceph)
2019-12-22 12:20:17.347 7fce967872c0 0 ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable), process ceph-mds, pid 7040
2019-12-22 12:20:17.347 7fce967872c0 0 pidfile_write: ignore empty --pid-file
2019-12-22 12:20:17.351 7fce848d7700 1 mds.ceph1 Updating MDS map to version 331 from mon.0
2019-12-22 12:20:21.811 7fce848d7700 1 mds.ceph1 Updating MDS map to version 332 from mon.0
2019-12-22 12:20:21.811 7fce848d7700 1 mds.ceph1 Map has assigned me to become a standby
2019-12-22 12:20:21.815 7fce848d7700 1 mds.ceph1 Updating MDS map to version 333 from mon.0
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 handle_mds_map i am now mds.0.333
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 handle_mds_map state change up:boot --> up:replay
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 replay_start
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 recovery set is
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 waiting for osdmap 6044 (which blacklists prior instance)
2019-12-22 12:20:21.819 7fce7d8c9700 0 mds.0.cache creating system inode with ino:0x100
2019-12-22 12:20:21.819 7fce7d8c9700 0 mds.0.cache creating system inode with ino:0x1
2019-12-22 12:20:22.055 7fce7c8c7700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 809504954 subtree root 0x1 is not mine in cache (it's -2,-2)
2019-12-22 12:20:22.055 7fce7c8c7700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
2019-12-22 12:20:22.055 7fce7c8c7700 0 mds.0.journal journal ambig_subtrees:
2019-12-22 12:20:22.059 7fce7c8c7700 -1 *** Caught signal (Segmentation fault) **
in thread 7fce7c8c7700 thread_name:md_log_replay
ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)
1: (()+0x12890) [0x7fce8d26d890]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
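The backtrace captures only the signal-handler frame, so debug symbols are needed to see where replay actually faults, and the ESubtreeMap replay error points at the journal itself. A hedged sketch of the usual next steps, assuming the Debian/Ubuntu packaging implied by the unit paths above and the rank media:0 from ceph -s; the journal should be exported before any recovery is attempted:

  # install MDS debug symbols so the next dump (or objdump -rdS) is readable
  apt install ceph-mds-dbg
  # take a raw backup of the rank 0 journal before touching anything
  cephfs-journal-tool --rank=media:0 journal export backup.bin
  # check the journal for damage around the replayed ESubtreeMap
  cephfs-journal-tool --rank=media:0 journal inspect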