Bug #46976
After restarting an mds, its standby-replay mds remained in the "resolve" state
Description
In a Ceph cluster with multiple active MDS daemons and standby-replay enabled, after reducing the filesystem's MDS count and restarting an active MDS, its standby-replay MDS did not become active and remained stuck in the "resolve" state. The issue can be reproduced with the following steps:
1. ceph fs set cephfs max_mds 6
2. ceph fs set cephfs allow_standby_replay true
3. ceph fs set cephfs max_mds 5    # reduce the MDS count
4. wait for the number of active MDS ranks to shrink to 5
5. restart any active MDS
6. ceph fs status
[root@host-192-168-10-241 ~]# ceph fs status
+------+----------------+----------------------+------------+-------+-------+
| Rank | State          | MDS                  | Activity   | dns   | inos  |
+------+----------------+----------------------+------------+-------+-------+
|  0   | resolve        | host-192-168-5-105-4 |            | 40.6k | 40.6k |
|  1   | rejoin         | host-192-168-5-101-9 |            | 19.5k | 19.5k |
|  2   | active         | host-192-168-5-105-2 | Reqs: 0 /s | 55.6k | 55.6k |
|  3   | active         | host-192-168-5-101-3 | Reqs: 0 /s | 32.2k | 32.2k |
|  4   | active         | host-192-168-5-105-1 | Reqs: 0 /s | 16.4k | 16.4k |
| 4-s  | standby-replay | host-192-168-5-104-2 | Evts: 0 /s | 7527  | 7530  |
+------+----------------+----------------------+------------+-------+-------+
Log from the MDS taking over rank 0:
2020-08-13 08:50:48.901 7f223e02c700 10 mds.host-192-168-5-105-4 handle_mds_map: handling map as rank 0
2020-08-13 08:50:48.901 7f223e02c700 7 mds.0.tableserver(snaptable) handle_mds_recovery mds.1
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.4109 resolve set is 0,1
2020-08-13 08:50:48.901 7f223e02c700 7 mds.0.cache set_recovery_set 1,2,3,4
2020-08-13 08:50:48.901 7f223e02c700 1 mds.0.4109 recovery set is 1,2,3,4
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.cache send_slave_resolves
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.cache send_subtree_resolves
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.cache claim 0x1 [0x100010e40e4.000001*,0x100010e40e4.000010*,0x100010e40e4.000011*,0x100010e40e4.000000*,0x100010e40e4.000100*,0x100010e40e4.101101*,0x100010e40e4.101111*,0x100010e40e4.111111*,0x100010e40e4.000111*,0x100010e40e4.001000*,0x100010e40e4.001010*,0x100010e40e4.001011*,0x100010e40e4.001101*,0x100010e40e4.010000*,0x100010e40e4.010010*,0x100010e40e4.010101*,0x100010e40e4.010110*,0x100010e40e4.011001*,0x100010e40e4.011010*,0x100010e40e4.011011*,0x100010e40e4.011100*,0x100010e40e4.011110*,0x100010e40e4.100000*,0x100010e40e4.100001*,0x100010e40e4.100011*,0x100010e40e4.100110*,0x100010e40e4.100111*,0x100010e40e4.101001*,0x100010e40e4.101011*,0x100010e40e4.101100*,0x100010e40e4.101110*,0x100010e40e4.110000*,0x100010e40e4.110011*,0x100010e40e4.110110*,0x100010e40e4.110111*,0x100010e40e4.111000*,0x100010e40e4.111001*,0x100010e40e4.111110*]
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.cache claim 0x100 []
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.cache sending subtee resolve to mds.1
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.cache sending subtee resolve to mds.2
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.cache sending subtee resolve to mds.3
2020-08-13 08:50:48.901 7f223e02c700 10 mds.0.cache sending subtee resolve to mds.4
2020-08-13 08:50:48.902 7f223e02c700 7 mds.0.cache handle_resolve from mds.1
2020-08-13 08:50:48.902 7f223e02c700 10 mds.0.cache maybe_resolve_finish still waiting for resolves (2,3,4,5)
2020-08-13 08:50:48.902 7f223e02c700 7 mds.0.cache handle_resolve from mds.4
......
2020-08-13 08:50:48.904 7f223e02c700 10 mds.0.cache maybe_resolve_finish still waiting for resolves (5)
Related issues
History
#1 Updated by Zheng Yan over 3 years ago
- Assignee set to Zheng Yan
#2 Updated by Zheng Yan over 3 years ago
MDSRank::calc_recovery_set() should be called by MDSRank::resolve_start
#3 Updated by Zheng Yan over 3 years ago
- Status changed from New to Fix Under Review
- Assignee deleted (Zheng Yan)
- Pull request ID set to 36632
#4 Updated by Patrick Donnelly over 3 years ago
- Assignee set to wei qiaomiao
- Target version set to v16.0.0
- Source set to Community (dev)
- Backport set to octopus,nautilus
- Component(FS) MDS added
#5 Updated by Patrick Donnelly over 3 years ago
- Status changed from Fix Under Review to Pending Backport
#6 Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47089: octopus: After restarting an mds, its standby-replay mds remained in the "resolve" state added
#7 Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47090: nautilus: After restarting an mds, its standby-replay mds remained in the "resolve" state added
#8 Updated by Nathan Cutler over 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".