
Bug #24400

CephFS - All MDS went offline and required repair of filesystem

Added by James Wilkins 7 months ago. Updated 6 months ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
06/04/2018
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:

Description

Hi,

Raising this in case we can get some more insight and/or it helps others.

We have a 12.2.5 cluster providing CephFS services to a container-based platform - 3 MDS: 1 active, 1 standby-replay and 1 standby.

Clients use the kernel CephFS driver, version 4.16.13-1 - approximately 50 nodes mount the filesystem.

On 3rd June at 09:50 all MDS daemons went offline and refused to start, with the following error being recorded:

2018-06-03 09:51:23.345270 7f824ec12700 1 mds.0.518759 handle_mds_map i am now mds.0.518759
2018-06-03 09:51:23.345272 7f824ec12700 1 mds.0.518759 handle_mds_map state change up:rejoin --> up:active
2018-06-03 09:51:23.345290 7f824ec12700 1 mds.0.518759 recovery_done -- successful recovery!
2018-06-03 09:51:23.345486 7f824ec12700 1 mds.0.518759 active_start
2018-06-03 09:51:24.549909 7f824ec12700 1 mds.0.518759 cluster recovered.
2018-06-03 09:51:24.640410 7f824ec12700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f824ec12700 time 2018-06-03 09:51:24.635960
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/mds/CDir.cc: 724: FAILED assert(dn->get_linkage()->is_null())

ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x562c335eb8e0]
2: (CDir::try_remove_dentries_for_stray()+0x2c0) [0x562c334ac670]
3: (MDCache::clear_dirty_bits_for_stray(CInode*)+0x173) [0x562c333749b3]
4: (StrayManager::_eval_stray(CDentry*, bool)+0x8b3) [0x562c33426bf3]
5: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x562c3342715e]
6: (MDCache::scan_stray_dir(dirfrag_t)+0x135) [0x562c33374635]
7: (MDSInternalContextBase::complete(int)+0x1eb) [0x562c335364db]
8: (MDSRank::_advance_queues()+0x4a5) [0x562c332cf845]
9: (MDSRank::_dispatch(Message*, bool)+0x4b6) [0x562c332d02d6]
10: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x562c332d0e45]
11: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x562c332b93b3]
12: (DispatchQueue::entry()+0x792) [0x562c338d3ed2]
13: (DispatchQueue::DispatchThread::entry()+0xd) [0x562c3367195d]
14: (()+0x7e25) [0x7f8253c51e25]
15: (clone()+0x6d) [0x7f8252d3434d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

At first we assumed it may have been related to the 12.2.5 update (applied Friday), so we downgraded to 12.2.4 but still had issues. I've attached the "dump of recent events" from each MDS log prior to the crash. If more info is needed please ask; the logs were quite large so I wanted to shrink them down.

In order to restore service we had to do the following:

  1. cephfs-journal-tool event recover_dentries summary

     Events by type:
       OPEN: 238657
       SUBTREEMAP: 318
       UPDATE: 11807
     Errors: 0

  2. cephfs-journal-tool journal reset

     old journal was 305837915794261~945902859
     new journal start will be 305838862368768 (671648 bytes past old end)
     writing journal head
     writing EResetJournal entry
     done

  3. cephfs-table-tool all reset session

     {
         "0": {
             "data": {},
             "result": 0
         }
     }

This allowed the MDS to start and us to restore service. We have an online data scan running as per the disaster recovery docs.
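For anyone following the same path, the three steps above can be collected into a single sketch. This is an assumption-laden outline, not the exact commands we ran: it adds a journal backup first (as the DR docs advise), and it applies only to a single-rank (rank 0) filesystem with all MDS daemons stopped. These are destructive operations.

```shell
# Sketch of the recovery sequence described above.
# Run only with every MDS daemon stopped; these commands modify
# the on-RADOS journal and session table directly.

# Back up the journal before touching it (recommended by the DR docs).
cephfs-journal-tool journal export backup.bin

# 1. Replay dentry updates from the journal into the backing store,
#    so resetting the journal does not lose them.
cephfs-journal-tool event recover_dentries summary

# 2. Reset (truncate) the journal itself.
cephfs-journal-tool journal reset

# 3. Wipe the session table so stale client sessions are dropped.
cephfs-table-tool all reset session
```

After this the MDS can be restarted; clients will generally need to remount.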

Any input to help understand what happened is appreciated.

CephMDS01 Log -> https://files.ashung.com/s/t4qjM9AnF6SKoMD
CephMDS02 Log -> https://files.ashung.com/s/bYEifSzeCRKPB3Q
CephMDS03 Log -> https://files.ashung.com/s/SzjatjFXEeHaiHM

(Used an external file host due to the attachment size limit.)

History

#1 Updated by James Wilkins 7 months ago

Forgot to say - one of the logs was taken with debug enabled (hence the size). Can provide the whole log if needed.

#2 Updated by Patrick Donnelly 7 months ago

  • Assignee set to Zheng Yan
  • Target version set to v14.0.0
  • Source set to Community (user)
  • Backport set to mimic,luminous
  • Component(FS) MDS added

#3 Updated by Zheng Yan 7 months ago

Do you have the MDS log from just before the crash?

#4 Updated by James Wilkins 7 months ago

Zheng Yan wrote:

Do you have the MDS log from just before the crash?

Excellent timing - we've just finished trawling through the debug log.

This looks like the pertinent part - shout if you need more. I snipped the last 10000 events and only included the last 10, assuming the lowest was closest to the assert().

2018-06-03 13:21:21.650653 7f62df4a6700 10 mds.0.cache.strays  inode is [inode 0x1001798ed04 [...2,head] ~mds0/stray9/1001798ed04/ auth v1437393752 dirtyparent f(v0 m2018-06-03 09:50:44.369279) n(v0 rc2018-06-03 09:50:44.369279 1=0+1) (iversion lock) | dirfrag=1 dirtyparent=1 dirty=0 0x562a92814000]
2018-06-03 13:21:21.650658 7f62df4a6700 10 mds.0.cache clear_dirty_bits_for_stray [inode 0x1001798ed04 [...2,head] ~mds0/stray9/1001798ed04/ auth v1437393752 dirtyparent f(v0 m2018-06-03 09:50:44.369279) n(v0 rc2018-06-03 09:50:44.369279 1=0+1) (iversion lock) | dirfrag=1 dirtyparent=1 dirty=0 0x562a92814000]
2018-06-03 13:21:21.650667 7f62df4a6700 10 mds.0.cache.dir(0x1001798ed04) try_remove_dentries_for_stray
2018-06-03 13:21:21.650669 7f62df4a6700 10 mds.0.cache.den(0x1001798ed04 mage---403_DB_PDO_MYSQL_DDL_nikko_log_visitor_info_1)  mark_clean [dentry #0x100/stray9/1001798ed04/mage---403_DB_PDO_MYSQL_DDL_nikko_log_visitor_info_1 [2,head] auth NULL (dversion lock) v=10 inode=0 state=1610612800|bottomlru | dirty=1 0x562a38e01880]
2018-06-03 13:21:21.650674 7f62df4a6700 10 mds.0.cache.den(0x1001798ed04 mage---internal-metadatas---403_DB_PDO_MYSQL_DDL_nikko_log_visitor_info_1)  mark_clean [dentry #0x100/stray9/1001798ed04/mage---internal-metadatas---403_DB_PDO_MYSQL_DDL_nikko_log_visitor_info_1 [2,head] auth NULL (dversion lock) v=12 inode=0 state=1610612800|bottomlru | dirty=1 0x562a38e01500]
2018-06-03 13:21:21.650679 7f62df4a6700 10 mds.0.cache.strays eval_stray [dentry #0x100/stray9/10017719499 [2,head] auth (dversion lock) pv=0 v=1437393796 inode=0x562a6a9fbc00 state=1342177296 0x562a959f88c0]
2018-06-03 13:21:21.650681 7f62df4a6700 10 mds.0.cache.strays  inode is [inode 0x10017719499 [...2,head] ~mds0/stray9/10017719499/ auth v1422665957 f(v0 m2018-05-30 22:29:04.321569) n(v0 rc2018-05-30 22:29:04.321569 1=0+1) (iversion lock) 0x562a6a9fbc00]
2018-06-03 13:21:21.650687 7f62df4a6700 10 mds.0.cache clear_dirty_bits_for_stray [inode 0x10017719499 [...2,head] ~mds0/stray9/10017719499/ auth v1422665957 f(v0 m2018-05-30 22:29:04.321569) n(v0 rc2018-05-30 22:29:04.321569 1=0+1) (iversion lock) 0x562a6a9fbc00]
2018-06-03 13:21:21.650692 7f62df4a6700 10 mds.0.cache.strays eval_stray [dentry #0x100/stray9/100171bf5cd [2,head] auth (dversion lock) pv=0 v=1437393796 inode=0x562a6a9fee00 state=1342177296 0x562a959f9500]
2018-06-03 13:21:21.650694 7f62df4a6700 10 mds.0.cache.strays  inode is [inode 0x100171bf5cd [...2,head] ~mds0/stray9/100171bf5cd/ auth v1421544669 f(v0 m2018-05-28 08:22:39.948913) n(v0 rc2018-05-28 08:22:39.948913 1=0+1) (iversion lock) 0x562a6a9fee00]
2018-06-03 13:21:21.650699 7f62df4a6700 10 mds.0.cache clear_dirty_bits_for_stray [inode 0x100171bf5cd [...2,head] ~mds0/stray9/100171bf5cd/ auth v1421544669 f(v0 m2018-05-28 08:22:39.948913) n(v0 rc2018-05-28 08:22:39.948913 1=0+1) (iversion lock) 0x562a6a9fee00]
2018-06-03 13:21:21.650703 7f62df4a6700 10 mds.0.cache.strays eval_stray [dentry #0x100/stray9/10000bb4ff5 [2,head] auth (dversion lock) v=1437393771 inode=0x5629ea457c00 state=1342177296 | inodepin=1 dirty=0 0x562a53de3880]
2018-06-03 13:21:21.650706 7f62df4a6700 10 mds.0.cache.strays  inode is [inode 0x10000bb4ff5 [...2,head] ~mds0/stray9/10000bb4ff5/ auth v1437393771 dirtyparent f(v0 m2018-06-03 09:50:42.620765) n(v3 rc2018-06-03 09:50:42.620765 b-1142088 1=0+1) (iversion lock) | dirfrag=1 dirtyparent=1 dirty=0 0x5629ea457c00]
2018-06-03 13:21:21.650715 7f62df4a6700 10 mds.0.cache clear_dirty_bits_for_stray [inode 0x10000bb4ff5 [...2,head] ~mds0/stray9/10000bb4ff5/ auth v1437393771 dirtyparent f(v0 m2018-06-03 09:50:42.620765) n(v3 rc2018-06-03 09:50:42.620765 b-1142088 1=0+1) (iversion lock) | dirfrag=1 dirtyparent=1 dirty=0 0x5629ea457c00]
2018-06-03 13:21:21.650721 7f62df4a6700 10 mds.0.cache.dir(0x10000bb4ff5) try_remove_dentries_for_stray
2018-06-03 13:21:21.653165 7f62df4a6700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f62df4a6700 time 2018-06-03 13:21:21.650724
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/mds/CDir.cc: 728: FAILED assert(dn->get_linkage()->is_null())

 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x5629dbb479f0]
 2: (CDir::try_remove_dentries_for_stray()+0x2c8) [0x5629dba11328]
 3: (MDCache::clear_dirty_bits_for_stray(CInode*)+0x176) [0x5629db8db1e6]
 4: (StrayManager::_eval_stray(CDentry*, bool)+0x8b3) [0x5629db98a123]
 5: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x5629db98a69e]
 6: (MDCache::scan_stray_dir(dirfrag_t)+0x135) [0x5629db8dae65]
 7: (MDSInternalContextBase::complete(int)+0x1eb) [0x5629dba91eab]
 8: (MDSRank::_advance_queues()+0x4a5) [0x5629db838405]
 9: (MDSRank::ProgressThread::entry()+0x4a) [0x5629db83886a]
 10: (()+0x7e25) [0x7f62e8cf6e25]
 11: (clone()+0x6d) [0x7f62e7dd6bad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
**snip**

   -10> 2018-06-03 13:21:21.650703 7f62df4a6700 10 mds.0.cache.strays eval_stray [dentry #0x100/stray9/10000bb4ff5 [2,head] auth (dversion lock) v=1437393771 inode=0x5629ea457c00 state=1342177296 | inodepin=1 dirty=0 0x562a53de3880]
    -9> 2018-06-03 13:21:21.650706 7f62df4a6700 10 mds.0.cache.strays  inode is [inode 0x10000bb4ff5 [...2,head] ~mds0/stray9/10000bb4ff5/ auth v1437393771 dirtyparent f(v0 m2018-06-03 09:50:42.620765) n(v3 rc2018-06-03 09:50:42.620765 b-1142088 1=0+1) (iversion lock) | dirfrag=1 dirtyparent=1 dirty=0 0x5629ea457c00]
    -8> 2018-06-03 13:21:21.650715 7f62df4a6700 10 mds.0.cache clear_dirty_bits_for_stray [inode 0x10000bb4ff5 [...2,head] ~mds0/stray9/10000bb4ff5/ auth v1437393771 dirtyparent f(v0 m2018-06-03 09:50:42.620765) n(v3 rc2018-06-03 09:50:42.620765 b-1142088 1=0+1) (iversion lock) | dirfrag=1 dirtyparent=1 dirty=0 0x5629ea457c00]
    -7> 2018-06-03 13:21:21.650721 7f62df4a6700 10 mds.0.cache.dir(0x10000bb4ff5) try_remove_dentries_for_stray
    -6> 2018-06-03 13:21:21.650898 7f62e5c2e700  5 -- 192.168.5.201:6800/244869813 >> 192.168.5.15:6824/3508 conn(0x5629f1645800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=37 cs=1 l=1). rx osd.101 seq 8 0x562ae3ef38c0 osd_op_reply(1129 1001714d7ec.00000030 [stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8
    -5> 2018-06-03 13:21:21.650924 7f62e5c2e700  1 -- 192.168.5.201:6800/244869813 <== osd.101 192.168.5.15:6824/3508 8 ==== osd_op_reply(1129 1001714d7ec.00000030 [stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 164+0+0 (249080180 0 0) 0x562ae3ef38c0 con 0x5629f1645800
    -4> 2018-06-03 13:21:21.650983 7f62ddca3700  1 -- 192.168.5.201:6800/244869813 --> 192.168.5.5:6811/1126695 -- osd_op(unknown.0.518997:1131 1.1e8 1:17e31599:::1001714d7ec.0000002f:head [stat] snapc 0=[] ondisk+read+rwordered+known_if_redirected+full_force e390970) v8 -- 0x562a26457080 con 0
    -3> 2018-06-03 13:21:21.651519 7f62e6c30700  5 -- 192.168.5.201:6800/244869813 >> 192.168.5.5:6811/1126695 conn(0x5629f167a000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=61 cs=1 l=1). rx osd.76 seq 20 0x562a26457080 osd_op_reply(1131 1001714d7ec.0000002f [stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8
    -2> 2018-06-03 13:21:21.651539 7f62e6c30700  1 -- 192.168.5.201:6800/244869813 <== osd.76 192.168.5.5:6811/1126695 20 ==== osd_op_reply(1131 1001714d7ec.0000002f [stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 164+0+0 (235418965 0 0) 0x562a26457080 con 0x5629f167a000
    -1> 2018-06-03 13:21:21.651582 7f62ddca3700  1 -- 192.168.5.201:6800/244869813 --> 192.168.5.6:6824/1119477 -- osd_op(unknown.0.518997:1132 1.34 1:2c6e2dbb:::1001714d7ec.0000002e:head [stat] snapc 0=[] ondisk+read+rwordered+known_if_redirected+full_force e390970) v8 -- 0x562a46ecab00 con 0
     0> 2018-06-03 13:21:21.653165 7f62df4a6700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f62df4a6700 time 2018-06-03 13:21:21.650724
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/mds/CDir.cc: 728: FAILED assert(dn->get_linkage()->is_null())

#5 Updated by Zheng Yan 7 months ago

Do you have the full log (from the time the MDS started replay to the MDS crash)? Thanks.

#6 Updated by James Wilkins 7 months ago

Zheng Yan wrote:

Do you have the full log (from the time the MDS started replay to the MDS crash)? Thanks.

Full MDS log, starting at MDS launch where replay begins and ending with the crash.

https://files.ashung.com/s/7BR4P3dT3pGkgee/download

#7 Updated by Zheng Yan 7 months ago

There are lots of inodes that have incorrect dirstat/rstat. Have you ever run 'journal reset' before the crash?

#8 Updated by James Wilkins 7 months ago

Zheng Yan wrote:

There are lots of inodes that have incorrect dirstat/rstat. Have you ever run 'journal reset' before the crash?

No - this was only run as part of recovery, as we weren't aware of the command until we did some reading.

The cluster started as a 10.x cluster and was upgraded to 12.x back in May.

#9 Updated by Zheng Yan 7 months ago

http://tracker.ceph.com/issues/17177 can explain this issue. A full filesystem scrub should repair the incorrect dirstat/rstat.
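For reference, on luminous such a scrub can be started through the admin socket of the active MDS. A minimal sketch, assuming the daemon is named mds.a (substitute your own daemon name) and that you want to start from the root and repair as you go:

```shell
# Ask the active MDS to recursively scrub from the filesystem root
# and repair any incorrect dirstat/rstat it finds.
# "mds.a" is a placeholder for the active MDS daemon name.
ceph daemon mds.a scrub_path / recursive repair
```

Progress can be followed in the MDS log; the scrub runs online alongside normal client traffic.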

#10 Updated by Zheng Yan 6 months ago

  • Status changed from New to Can't reproduce

Reopen this ticket if you encounter this issue again.
