Bug #47582 (closed)

MDS failover takes 10-15 hours: Ceph MDS stays in "up:replay" state for hours

Added by Heilig IOS over 3 years ago. Updated over 3 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
Performance/Resource Usage
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a 9-node Ceph cluster running Ceph 15.2.5. The cluster has 175 HDD OSDs plus 3 NVMe OSDs used as a cache tier for the "cephfs_data" pool. CephFS pool info:

POOL                    ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
cephfs_data              1  350 TiB  179.53M  350 TiB  66.93     87 TiB
cephfs_metadata          3  3.1 TiB   17.69M  3.1 TiB   1.77     87 TiB

We use multiple active MDS instances: 3 active and 3 standby. Each MDS server has 128 GB of RAM, with "mds_cache_memory_limit" set to 64 GB.
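
For reference, a layout like this can be expressed roughly as follows from the command line; the filesystem name "cephfs" is just a placeholder for our actual filesystem, and the cache limit is given in bytes:

ceph fs set cephfs max_mds 3
ceph config set mds mds_cache_memory_limit 68719476736   # 64 GiB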

Failover to a standby MDS instance takes 10-15 hours, and CephFS is unreachable for clients the whole time. The MDS instance just stays in the "up:replay" state for the entire failover.
It looks like the MDS daemon is walking through all of the directories during journal replay:

2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EOpen.replay 
2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EMetaBlob.replay 3 dirlumps by unknown.0
2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EMetaBlob.replay dir 0x300000041c5
2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x300000041c5 /repository/files/14/ [2,head] auth v=2070324 cv=0/0 state=1610612737|complete f(v0 m2020-09-10T13:05:29.297254-0700 515=0+515) n(v46584 rc2020-09-21T20:38:49.071043-0700 b3937793650802 1056114=601470+454644) hs=515+0,ss=0+0 dirty=75 | child=1 subtree=0 dirty=1 0x55d4c9359b80]
2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/repository/files/14/14119 [2,head] auth (dversion lock) v=2049516 ino=0x30000812e2f state=1073741824 | inodepin=1 0x55db2463a1c0]
2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x30000812e2f [...2,head] /repository/files/14/14119/ auth fragtree_t(*^3) v2049516 f(v0 m2020-09-18T10:17:53.379121-0700 13498=0+13498) n(v6535 rc2020-09-19T05:52:25.035403-0700 b272027384385 112669=81992+30677) (iversion lock) | dirfrag=8 0x55db24643000]
2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EMetaBlob.replay dir 0x30000812e2f.000*
2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x30000812e2f.000* /repository/files/14/14119/ [2,head] auth v=77082 cv=0/0 state=1073741824 f(v0 m2020-09-18T10:17:53.371122-0700 1636=0+1636) n(v6535 rc2020-09-19T05:51:18.063949-0700 b33321023818 13707=9986+3721) hs=885+0,ss=0+0 | child=1 0x55db845bf080]
2020-09-22T02:43:44.406-0700 7f22ae99e700 10 mds.0.journal EMetaBlob.replay added (full) [dentry #0x1/repository/files/14/14119/39823 [2,head] auth NULL (dversion lock) v=0 ino=(nil) state=1073741888|bottomlru 0x55d82061a900]
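
As an aside, a rough way to watch replay progress (only a sketch, assuming the admin socket is reachable on the MDS host) is to check which rank is in up:replay and compare the journal read position against the write position; the exact perf counter names may differ between releases:

ceph fs status
ceph daemon mds.<name> perf dump mds_log   # compare rdpos vs. wrpos (names may vary)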

We have millions of directories containing millions of small files. Once this directory/subdirectory scan finishes, CephFS becomes active again.
We tried standby-replay (see below); it helps, but it does not eliminate the root cause.
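
For completeness, standby-replay can be enabled per filesystem; "cephfs" below is again just a placeholder name:

ceph fs set cephfs allow_standby_replay true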
