Bug #36085: ceph mds repeat laggy or crashed - CephFS - Ceph

Bug #36085

open

ceph mds repeat laggy or crashed

Added by Yunzhi Cheng over 5 years ago. Updated over 4 years ago.

Status:

Need More Info

Priority:

Normal

Assignee:

Category:

Target version:

% Done:

Source:

Community (user)

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

Ceph - v13.2.1

ceph-qa-suite:

Component(FS):

MDS

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

cluster:
    id:     10f9c55a-a813-44d7-bce7-e6159a98dc61
    health: HEALTH_WARN
            774413/230157054 objects misplaced (0.336%)
            Degraded data redundancy: 27570/230157054 objects degraded (0.012%), 1 pg degraded, 1 pg undersized

  services:
    mon: 3 daemons, quorum rndcl94,rndcl106,rndcl154
    mgr: rndcl94(active), standbys: rndcl106, rndcl154
    mds: cephfs-1/1/1 up  {0=rndcl94=up:active(laggy or crashed)}
    osd: 24 osds: 24 up, 24 in; 11 remapped pgs

  data:
    pools:   2 pools, 512 pgs
    objects: 76.72 M objects, 14 TiB
    usage:   53 TiB used, 36 TiB / 89 TiB avail
    pgs:     27570/230157054 objects degraded (0.012%)
             774413/230157054 objects misplaced (0.336%)
             501 active+clean
             10  active+remapped+backfilling
             1   active+undersized+degraded+remapped+backfilling

  io:
    client:   9.7 MiB/s rd, 6.5 MiB/s wr, 94 op/s rd, 630 op/s wr
    recovery: 17 MiB/s, 90 objects/s
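To spot this state from a script, you can parse the "mds:" line of the `ceph -s` output above. Below is a minimal Python sketch. It assumes the Mimic-era line format shown in this report (`{rank=daemon=state,...}` inside braces). `parse_mds_line` and `laggy_ranks` are hypothetical helpers, not part of any Ceph tool.

```python
import re

# Hypothetical helper (not a Ceph API): matches "rank=daemon=state" entries
# as they appear inside the braces of the Mimic-era `ceph -s` "mds:" line.
MDS_ENTRY = re.compile(r"(\d+)=([^=\s]+)=([^,}]+)")


def parse_mds_line(line: str) -> dict[int, tuple[str, str]]:
    """Return {rank: (daemon, state)} parsed from an "mds:" status line."""
    start, end = line.find("{"), line.rfind("}")
    if start == -1 or end == -1:
        # No per-rank section present (assumed format not matched).
        return {}
    body = line[start + 1:end]
    return {int(rank): (daemon, state.strip())
            for rank, daemon, state in MDS_ENTRY.findall(body)}


def laggy_ranks(line: str) -> list[int]:
    """Ranks whose reported state carries the "laggy or crashed" marker."""
    return [rank for rank, (_, state) in parse_mds_line(line).items()
            if "laggy or crashed" in state]


if __name__ == "__main__":
    line = "    mds: cephfs-1/1/1 up  {0=rndcl94=up:active(laggy or crashed)}"
    print(parse_mds_line(line))  # {0: ('rndcl94', 'up:active(laggy or crashed)')}
    print(laggy_ranks(line))     # [0]
```

Other Ceph releases may format this line differently. In that case, a JSON status output would be a more robust source than this text regex.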

The MDS goes into the "laggy or crashed" state every few seconds and then returns to active.

This state seems to make CephFS very slow, but the MDS process never actually dies.

I tried restarting the MDS, but it seemed to have no effect.

I set debug_ms=1 and captured some logs.
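The MDS flaps rather than staying down, so counting laggy episodes over time says more than a single status check. Here is a minimal sketch of that count. It assumes per-rank state strings were sampled periodically, for example from repeated `ceph -s` runs. `count_flaps` and the sample values are hypothetical illustrations, not data from this report.

```python
def count_flaps(samples: list[str]) -> int:
    """Count entries into the "laggy or crashed" state in a sequence of samples.

    A run of consecutive laggy samples counts once. If the first sample is
    already laggy, that also counts as one episode.
    """
    flaps = 0
    previous_laggy = False
    for state in samples:
        laggy = "laggy or crashed" in state
        if laggy and not previous_laggy:
            flaps += 1
        previous_laggy = laggy
    return flaps


if __name__ == "__main__":
    # Hypothetical samples, not taken from the attached log.
    samples = [
        "up:active",
        "up:active(laggy or crashed)",
        "up:active",
        "up:active(laggy or crashed)",
        "up:active(laggy or crashed)",
        "up:active",
    ]
    print(count_flaps(samples))  # 2
```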

Files

mds_log_summary.log.gz (706 KB)

Yunzhi Cheng, 09/19/2018 11:31 AM