Bug #47698
mds crashed in try_remove_dentries_for_stray after touching file in strange directory
Description
We had a directory "b1" which appeared empty but could not be rmdir'd.
The directory also reported an absurdly large size, another sign of metadata corruption.
We scrubbed the path, and eventually the entire filesystem, but the scrub did not find any problems.
Here is the parent:
# rados listomapkeys -p cephfs_metadata 2002261a436.00000000
b1_head
#
# rados listomapvals -p cephfs_metadata 2002261a436.00000000
b1_head
value (462 bytes) :
00000000 02 00 00 00 00 00 00 00 49 0f 06 a3 01 00 00 06 |........I.......|
00000010 a7 61 22 00 02 00 00 00 00 00 00 f3 28 4e 5f 04 |.a".........(N_.|
00000020 ea d1 32 ed 41 00 00 e1 03 00 00 de 03 00 00 01 |..2.A...........|
00000030 00 00 00 00 02 00 00 00 00 00 00 00 02 02 18 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff |................|
00000050 ff ff ff ff ff ff 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 01 00 00 00 ff ff ff ff ff ff ff ff 00 00 |................|
00000070 00 00 00 00 00 00 00 00 00 00 f3 28 4e 5f 04 ea |...........(N_..|
00000080 d1 32 2b e6 8d 5e 8e 15 7f 3a 00 00 00 00 00 00 |.2+..^...:......|
00000090 00 00 03 02 28 00 00 00 01 00 00 00 00 00 00 00 |....(...........|
000000a0 f3 28 4e 5f 04 ea d1 32 ff ff ff ff ff ff ff ff |.(N_...2........|
000000b0 00 00 00 00 00 00 00 00 24 00 00 00 00 00 00 00 |........$.......|
000000c0 03 02 38 00 00 00 02 00 00 00 00 00 00 00 54 f8 |..8...........T.|
000000d0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 01 00 |................|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 f3 28 4e 5f 04 ea d1 32 03 02 |.......(N_...2..|
00000100 38 00 00 00 02 00 00 00 00 00 00 00 54 f8 ff ff |8...........T...|
00000110 ff ff ff ff ff ff ff ff ff ff ff ff 01 00 00 00 |................|
00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000130 00 00 00 00 f3 28 4e 5f 04 ea d1 32 a9 45 00 00 |.....(N_...2.E..|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
00000150 00 00 00 00 9c 05 00 00 00 00 00 00 00 00 00 00 |................|
00000160 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff |................|
00000170 00 00 00 00 01 01 10 00 00 00 00 00 00 00 00 00 |................|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a9 2f |.............../|
00000190 00 00 00 00 00 00 48 cb 96 5e 22 f6 ea 26 2b e6 |......H..^"..&+.|
000001a0 8d 5e 8e 15 7f 3a 24 00 00 00 00 00 00 00 ff ff |.^...:$.........|
000001b0 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001c0 00 00 fe ff ff ff ff ff ff ff 00 00 00 00 |..............|
000001ce
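For reference, dirfrag objects in the metadata pool are named `<inode number in hex>.<fragment id>`, so the object above is fragment 0 of the parent directory's inode. A rough sketch of getting from a directory to its dirfrag object (the mount path below is a placeholder; the decimal number is simply 0x2002261a436 written in base 10):

# inode number of the parent directory, taken on a client mount (placeholder path)
stat -c %i /mnt/cephfs/path/to/parent    # 2199600079926 in this case
# dirfrag omap objects are named <ino-in-hex>.<frag>; an unfragmented dir uses frag 00000000
printf '%x.00000000\n' 2199600079926     # -> 2002261a436.00000000
# list the dentry keys stored in that dirfrag
rados -p cephfs_metadata listomapkeys 2002261a436.00000000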
And here was the directory itself:
# rados listomapvals -p cephfs_metadata 2002261a706.00000000
#
Before trying to remove that omap key directly, I thought maybe I should touch a file in the `b1` dir.
This crashed the MDS:
    -1> 2020-09-30 14:26:44.144 7f3216895700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f3216895700 time 2020-09-30 14:26:44.142112
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/mds/CDir.cc: 758: FAILED ceph_assert(dn->get_linkage()->is_null())
 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f3226335025]
 2: (()+0x25c1ed) [0x7f32263351ed]
 3: (CDir::try_remove_dentries_for_stray()+0x259) [0x557075dfad19]
 4: (MDCache::clear_dirty_bits_for_stray(CInode*)+0x1c8) [0x557075d24ef8]
 5: (StrayManager::_eval_stray(CDentry*)+0x3a2) [0x557075d752f2]
 6: (StrayManager::eval_stray(CDentry*)+0x1f) [0x557075d75acf]
 7: (Server::_unlink_local_finish(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*, unsigned long)+0x3f9) [0x557075c7e2c9]
 8: (MDSContext::complete(int)+0x74) [0x557075e8a0e4]
 9: (MDSIOContextBase::complete(int)+0x16f) [0x557075e8a33f]
 10: (MDSLogContextBase::complete(int)+0x40) [0x557075e8a550]
 11: (Finisher::finisher_thread_entry()+0x16f) [0x7f32263be38f]
 12: (()+0x7ea5) [0x7f32241ebea5]
 13: (clone()+0x6d) [0x7f3222e998dd]
The MDS then does not start; it crashes the same way during clientreplay:
    -3> 2020-09-30 14:27:47.203 7f24f0f55700  1 mds.0.80598 clientreplay_start
    -2> 2020-09-30 14:27:47.232 7f24f0f55700  1 mds.0.80598 still have 0 requests need to be replayed, 1 sessions need to be reclaimed
    -1> 2020-09-30 14:27:47.299 7f24ebf4b700 -1 /builddir/build/BUILD/ceph-14.2.11/src/mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f24ebf4b700 time 2020-09-30 14:27:47.298616
/builddir/build/BUILD/ceph-14.2.11/src/mds/CDir.cc: 758: FAILED ceph_assert(dn->get_linkage()->is_null())
 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f24fa472025]
 2: (()+0x25c1ed) [0x7f24fa4721ed]
 3: (CDir::try_remove_dentries_for_stray()+0x259) [0x559ea7c3dc89]
 4: (MDCache::clear_dirty_bits_for_stray(CInode*)+0x1c8) [0x559ea7b67e68]
 5: (StrayManager::_eval_stray(CDentry*)+0x3a2) [0x559ea7bb8262]
 6: (StrayManager::eval_stray(CDentry*)+0x1f) [0x559ea7bb8a3f]
 7: (MDCache::scan_stray_dir(dirfrag_t)+0x134) [0x559ea7b3c694]
 8: (MDSContext::complete(int)+0x74) [0x559ea7ccd054]
 9: (MDSRank::_advance_queues()+0xa4) [0x559ea7a5b054]
 10: (MDSRank::ProgressThread::entry()+0x3d) [0x559ea7a5b6cd]
 11: (()+0x7ea5) [0x7f24f8328ea5]
 12: (clone()+0x6d) [0x7f24f6fd68dd]
I have started this recovery procedure: https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
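For completeness, the first step on that page before any of the destructive operations is to take a backup copy of the journal; roughly, with the same rank syntax as the commands below:

# export the (possibly damaged) journal to a file before resetting anything
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin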
History
#1 Updated by Dan van der Ster almost 3 years ago
After finishing the following, the MDS started:
# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:
  OPEN: 6242
  SESSION: 24
  SESSIONS: 7
  SUBTREEMAP: 240
  UPDATE: 185247
Errors: 0
# cephfs-journal-tool --rank=cephfs:0 journal reset
# cephfs-table-tool all reset session
We are running `ceph tell mds.x scrub start / recursive repair` now.
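Any damage the scrub cannot repair should show up in the MDS damage table; a quick way to check (same `mds.x` placeholder as above):

ceph tell mds.x damage ls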
Interestingly, we looked for the broken dir after the fs was back online, and the dir (and its parent) were gone.
#2 Updated by Dan van der Ster almost 3 years ago
Here is the `b1` dir at the start of this issue:
# ls -l
total 0
drwxr-xr-x 1 lemon lemon                    1 Sep 30 10:20 60
drwxr-xr-x 1 lemon lemon 18446744073709551615 Sep  1 12:56 b1
drwxr-xr-x 1 lemon lemon                    1 Sep 30 10:20 ce
# rmdir b1
rmdir: failed to remove ‘b1’: Directory not empty
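As a side note, that bogus size is 2^64 - 1, i.e. -1 wrapped to unsigned, which looks more like an underflowed rstat than a real size:

printf '%u\n' -1    # prints 18446744073709551615 on a 64-bit system, i.e. (uint64_t)-1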
#3 Updated by Zheng Yan almost 3 years ago
Try deleting 'b1' using 'rados rmomapkey'. If you have debug_mds=10, it should be easy to get b1's parent dirfrag (the corresponding omap object).
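For reference, using the parent dirfrag object and key from the description, that would look something like the following (take a copy of the value first, and double-check the object and key before removing anything):

# save the raw dentry value before touching it
rados -p cephfs_metadata getomapval 2002261a436.00000000 b1_head b1_head.bin
# then drop the corrupt dentry from the parent dirfrag
rados -p cephfs_metadata rmomapkey 2002261a436.00000000 b1_head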
#4 Updated by Dan van der Ster almost 3 years ago
b1 was no longer there after we followed the recover_dentries procedure, so it is gone.
#5 Updated by Dan van der Ster almost 3 years ago
Now I'm trying to repair the metadata on this fs so it is fully consistent.
When I run 'scrub start / force recursive repair' and 'scrub start ~mdsdir force recursive repair', it finds and corrects some rstats and complains about dup inodes (in stray and in the fs). But after running those scrubs a few times, they eventually come back clean.
Then I fail the MDS over to a standby, and the standby complains about dup inodes again, and scrub again reports errors.
The mds log is at ceph-post-file: 1e6b1579-d030-4895-847b-f901f9414399
How do I cleanup this fs?
I've heard in the past about "compare the dup inodes, and rmomapkey for the lower versions" and I've heard about running scan_links.
Should I do either of those now?
#6 Updated by Zheng Yan almost 3 years ago
Dan van der Ster wrote:
Now I'm trying to repair the metadata on this fs so it is fully consistent.
When I run 'scrub start / force recursive repair' and 'scrub start ~mdsdir force recursive repair', it finds and corrects some rstats and complains about dup inodes (in stray and in the fs). But after running those scrubs a few times, they eventually come back clean. Then I fail the MDS over to a standby, and the standby complains about dup inodes again, and scrub again reports errors.
The mds log is at ceph-post-file: 1e6b1579-d030-4895-847b-f901f9414399
How do I cleanup this fs?
'cephfs-data-scan scan_links' can fix this error
I've heard in the past about "compare the dup inodes, and rmomapkey for the lower versions" and I've heard about running scan_links.
Should I do either of those now?
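For the record, a rough outline of running scan_links, which has to run while the filesystem is offline (fs name `cephfs` as in the journal-tool commands above):

# take the filesystem offline (no active MDS) before running the tool
ceph fs fail cephfs
# repair duplicate primary dentries and bad nlink counts in the metadata pool
cephfs-data-scan scan_links
# allow an MDS to take the rank again once the scan completes
ceph fs set cephfs joinable true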
#7 Updated by Dan van der Ster almost 3 years ago
Thanks. scan_links fixed these dups (and some "Bad nlink" errors). For future reference, it took 2 hours to complete on a cluster with 4.3 million objects in cephfs_metadata.
Do we want to keep this open to work on the original corrupted dir which caused the crash? Or, in future, should we just `rados rmomapkey` the strange-looking directory?