Bug #43407
mds crash after update to v14.2.5
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed: 12/23/2019
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
All MDS daemons crashed and are unable to restart after updating from v14.2.4 to v14.2.5.
systemctl status:
ceph-mds@ceph1.service - Ceph metadata server daemon
   Loaded: loaded (/lib/systemd/system/ceph-mds@.service; indirect; vendor preset: enabled)
   Active: failed (Result: signal) since Sun 2019-12-22 23:15:58 UTC; 3s ago
  Process: 7922 ExecStart=/usr/bin/ceph-mds -f --cluster ${CLUSTER} --id ceph1 --setuser ceph --setgroup ceph (code=killed, signal=SEGV)
 Main PID: 7922 (code=killed, signal=SEGV)

Dec 22 23:15:58 ceph1.savoca.de systemd[1]: ceph-mds@ceph1.service: Service hold-off time over, scheduling restart.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: ceph-mds@ceph1.service: Scheduled restart job, restart counter is at 3.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: Stopped Ceph metadata server daemon.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: ceph-mds@ceph1.service: Start request repeated too quickly.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: ceph-mds@ceph1.service: Failed with result 'signal'.
Dec 22 23:15:58 ceph1.savoca.de systemd[1]: Failed to start Ceph metadata server daemon.
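Since systemd has hit its start limit, the unit has to be reset before another attempt; raising the MDS debug level first makes the next crash log more useful. A minimal sketch, using the cluster name ceph and daemon id ceph1 from the unit above (the debug level is a diagnostic assumption, not part of the original report):

  # clear systemd's start-limit counter, then raise MDS log verbosity
  systemctl reset-failed ceph-mds@ceph1.service
  ceph config set mds debug_mds 20
  systemctl start ceph-mds@ceph1.service
  # alternatively, run the daemon in the foreground to catch the crash directly
  /usr/bin/ceph-mds -f --cluster ceph --id ceph1 --setuser ceph --setgroup ceph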
ceph -s:
  cluster:
    id:     3ac4d7e5-98ca-4b75-bb88-02386caa5793
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            85 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 22h)
    mgr: ceph2(active, since 22h), standbys: ceph3, ceph1
    mds: media:1/1 {0=ceph1=up:replay(laggy or crashed)}
    osd: 6 osds: 6 up (since 22h), 6 in (since 8w)

  data:
    pools:   12 pools, 181 pgs
    objects: 2.22M objects, 8.3 TiB
    usage:   12 TiB used, 9.4 TiB / 22 TiB avail
    pgs:     181 active+clean
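For context, the degraded filesystem and the recorded crashes can be examined with the standard Nautilus tooling; a sketch, using the filesystem name media from the mds line above (<crash-id> is a placeholder for an id returned by ceph crash ls):

  ceph fs status media        # per-rank state of the "media" filesystem
  ceph crash ls               # list the 85 recently recorded crashes
  ceph crash info <crash-id>  # full metadata and backtrace for one entry
  ceph versions               # verify every daemon is actually on 14.2.5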
Excerpt of ceph-mds.log:
2019-12-22 12:20:17.347 7fce967872c0 0 set uid:gid to 64045:64045 (ceph:ceph)
2019-12-22 12:20:17.347 7fce967872c0 0 ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable), process ceph-mds, pid 7040
2019-12-22 12:20:17.347 7fce967872c0 0 pidfile_write: ignore empty --pid-file
2019-12-22 12:20:17.351 7fce848d7700 1 mds.ceph1 Updating MDS map to version 331 from mon.0
2019-12-22 12:20:21.811 7fce848d7700 1 mds.ceph1 Updating MDS map to version 332 from mon.0
2019-12-22 12:20:21.811 7fce848d7700 1 mds.ceph1 Map has assigned me to become a standby
2019-12-22 12:20:21.815 7fce848d7700 1 mds.ceph1 Updating MDS map to version 333 from mon.0
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 handle_mds_map i am now mds.0.333
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 handle_mds_map state change up:boot --> up:replay
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 replay_start
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 recovery set is
2019-12-22 12:20:21.815 7fce848d7700 1 mds.0.333 waiting for osdmap 6044 (which blacklists prior instance)
2019-12-22 12:20:21.819 7fce7d8c9700 0 mds.0.cache creating system inode with ino:0x100
2019-12-22 12:20:21.819 7fce7d8c9700 0 mds.0.cache creating system inode with ino:0x1
2019-12-22 12:20:22.055 7fce7c8c7700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 809504954 subtree root 0x1 is not mine in cache (it's -2,-2)
2019-12-22 12:20:22.055 7fce7c8c7700 0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
2019-12-22 12:20:22.055 7fce7c8c7700 0 mds.0.journal journal ambig_subtrees:
2019-12-22 12:20:22.059 7fce7c8c7700 -1 *** Caught signal (Segmentation fault) **
in thread 7fce7c8c7700 thread_name:md_log_replay
ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)
1: (()+0x12890) [0x7fce8d26d890]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
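The backtrace captures only the signal-handler frame, so debug symbols are needed to see where replay actually faults, and the ESubtreeMap replay error points at the journal itself. A hedged sketch of the usual next steps, assuming the Debian/Ubuntu packaging implied by the unit paths above and the rank media:0 from ceph -s; the journal should be exported before any recovery is attempted:

  # install MDS debug symbols so the next dump (or objdump -rdS) is readable
  apt install ceph-mds-dbg
  # take a raw backup of the rank 0 journal before touching anything
  cephfs-journal-tool --rank=media:0 journal export backup.bin
  # check the journal for damage around the replayed ESubtreeMap
  cephfs-journal-tool --rank=media:0 journal inspect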