
Bug #65094

open

mds: STATE_STARTING does not add the root inode for the root rank, and failure during STATE_STARTING is not handled correctly

Added by ethan wu about 1 month ago. Updated about 1 month ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: quincy,reef,squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The root rank does not add the root inode to its auth subtrees when it enters STATE_STARTING.
It also does not handle STATE_STARTING correctly when the MDS fails or is stopped while in STARTING.

This causes the rank to be marked damaged, or a ceph_assert failure, when an MDS failover/switchover happens later.

The following are the related logs:

a.
-1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!

b.
-15> 2024-03-24T18:06:19.461+0800 7f1542cbf700 0 mds.0.journal EMetaBlob.replay missing dir ino 0x10000000000
-14> 2024-03-24T18:06:19.461+0800 7f1542cbf700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)

c.
-6> 2024-03-24T19:39:59.593+0800 7f5903f02700 -1 log_channel(cluster) log [ERR] : replayed ESubtreeMap at 4209845 subtree root 0x1 not in cache


Files

mds.b.log (354 KB), ethan wu, 03/25/2024 12:56 PM
#2

Updated by Venky Shankar about 1 month ago

  • Status changed from New to Fix Under Review
  • Backport set to quincy,reef,squid
  • Pull request ID set to 56429
#3

Updated by ethan wu about 1 month ago

Pull request: https://github.com/ceph/ceph/pull/56429

In my CephFS environment, I hit an MDS replay failure:

2024-03-25T20:47:49.630+0800 7fa83bbe8700 10 mds.0.journal EMetaBlob.replay request client.4317:5 trim_to 5
2024-03-25T20:47:49.630+0800 7fa83bbe8700 10 mds.0.log _replay 4217639~3155 / 4225716 2024-03-25T20:47:35.334606+0800: EUpdate unlink_local [metablob 0x10000000000, 4 dirs]
2024-03-25T20:47:49.630+0800 7fa83bbe8700 10 mds.0.journal EUpdate::replay
2024-03-25T20:47:49.630+0800 7fa83bbe8700 10 mds.0.journal EMetaBlob.replay 4 dirlumps by unknown.0
2024-03-25T20:47:49.630+0800 7fa83bbe8700 10 mds.0.journal EMetaBlob.replay don't have renamed ino 0x10000000003
2024-03-25T20:47:49.630+0800 7fa83bbe8700 10 mds.0.journal EMetaBlob.replay found null dentry in dir 0x10000000001
2024-03-25T20:47:49.630+0800 7fa83bbe8700 10 mds.0.journal EMetaBlob.replay dir 0x10000000000
2024-03-25T20:47:49.630+0800 7fa83bbe8700 0 mds.0.journal EMetaBlob.replay missing dir ino 0x10000000000
2024-03-25T20:47:49.630+0800 7fa83bbe8700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
2024-03-25T20:47:49.630+0800 7fa83bbe8700 5 mds.beacon.b set_want_state: up:replay -> down:damaged

After investigating, I found that it is related to the MDS STARTING state.
STATE_STARTING does not add ino 0x1 to the root rank's subtrees, so all inodes under 0x1 get trimmed by
try_trim_nonauth_subtree.

Steps to reproduce:
1. Use vstart.sh to create a CephFS (with mds_debug_subtrees turned off).
2. Mount CephFS.
3. mkdir -p ${cephfs_root}/dir1/dir11/foo; mkdir -p ${cephfs_root}/dir1/dir11/bar
4. Unmount CephFS.
5. ./bin/ceph fs set a down true # wait for all MDSs to stop
6. ./bin/ceph fs set a down false
7. Mount CephFS.
8. rmdir ${cephfs_root}/dir1/dir11/foo; rmdir ${cephfs_root}/dir1/dir11/bar
9. Unmount CephFS.
10. Kill the rank 0 MDS to trigger a failover.
11. ./bin/ceph fs dump # rank 0 is marked damaged

While fixing the issue, I also found that the error handling of STATE_STARTING is incorrect:
1. The take-over MDS does not enter STATE_STARTING again when the previous MDS fails before STATE_STARTING finishes.
2. Even when the MDS finishes STATE_STARTING and requests STATE_ACTIVE, the MDS log entries created during STATE_STARTING are not flushed.

The take-over MDS then fails during replay at the assertion that the subtree map must not be empty:

-1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!
This happens because the subtree map log entry was never flushed.

#4

Updated by Patrick Donnelly about 1 month ago

  • Category set to Correctness/Safety
  • Assignee set to ethan wu
  • Target version set to v20.0.0
  • Source set to Community (dev)