Bug #47843

Updated by Patrick Donnelly over 3 years ago

In a multi-MDS Ceph cluster, reduce max_mds and, before the shrink completes, immediately restart one or more MDS daemons. The restarted MDS can remain stuck in the "resolve" or "rejoin" state.

The reproduction steps are as follows:

1) The cluster has 6 active MDS ranks (0,1,2,3,4,5).
2) Set max_mds=3.
3) Restart mds.0, mds.1 and mds.2.
4) mds.2 remains in the "resolve" state and its log prints "still waiting for resolves (5)", because mds.5 has already stopped normally by this point and will never send a resolve.
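The steps above can be sketched as a shell session. This is a hedged sketch, not an exact transcript: the file system name `cephfs` and the daemon names are taken from the status output below, and the systemd unit names are assumed and will differ in other deployments.

```shell
# Hedged reproduction sketch. Assumes a file system named "cephfs"
# (as in the status output) and systemd-managed MDS daemons; the
# unit names are taken from the "Before" table and are deployment-
# specific.

# 1) start with 6 active ranks (0-5)
ceph fs set cephfs max_mds 6

# 2) shrink to 3 ranks; the monitors begin stopping ranks 5, 4, 3
ceph fs set cephfs max_mds 3

# 3) before the shrink completes, restart the daemons holding
#    ranks 0, 1 and 2
systemctl restart ceph-mds@ceph1-3 ceph-mds@ceph2-2 ceph-mds@ceph2

# 4) rank 2 stays in "resolve", logging
#    "still waiting for resolves (5)"
ceph fs status cephfs
```

These commands only make sense against a live cluster; the point is the ordering — the restart must land while the shrink from 6 to 3 ranks is still in progress.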
*Before:*

<pre>

[root@ceph1 ceph]# ceph fs status
cephfs - 0 clients
======
+------+--------+---------+---------------+-------+-------+
| Rank | State  |   MDS   |    Activity   |  dns  |  inos |
+------+--------+---------+---------------+-------+-------+
|  0   | active | ceph1-3 | Reqs:    0 /s | 30.3k |   77  |
|  1   | active | ceph2-2 | Reqs:    0 /s |  253  |   13  |
|  2   | active |  ceph2  | Reqs:    0 /s |  462  |   13  |
|  3   | active | ceph1-1 | Reqs:    0 /s |   10  |   13  |
|  4   | active |  ceph1  | Reqs:    0 /s |   10  |   13  |
|  5   | active | ceph1-4 | Reqs:    0 /s |   0   |   0   |
+------+--------+---------+---------------+-------+-------+
+------+----------+-------+-------+
| Pool | type | used | avail |
+------+----------+-------+-------+
| meta | metadata | 382M | 565G |
| data | data | 2070M | 565G |
+------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| ceph2-1 |
| ceph2-4 |
| ceph2-3 |
+-------------+

</pre>


*After reducing max_mds and restarting some MDS:*

<pre>
[root@ceph1 ceph]# ceph fs status
cephfs - 0 clients
======
+------+---------+---------+---------------+-------+-------+
| Rank |  State  |   MDS   |    Activity   |  dns  |  inos |
+------+---------+---------+---------------+-------+-------+
|  0   |  rejoin | ceph2-3 |               | 30.2k |   14  |
|  1   |  rejoin | ceph2-4 |               |  244  |   4   |
|  2   | resolve | ceph2-1 |               |  454  |   5   |
|  3   |  active | ceph1-1 | Reqs:    0 /s |   10  |   13  |
|  4   |  active |  ceph1  | Reqs:    0 /s |   10  |   13  |
+------+---------+---------+---------------+-------+-------+
+------+----------+-------+-------+
| Pool | type | used | avail |
+------+----------+-------+-------+
| meta | metadata | 382M | 565G |
| data | data | 2070M | 565G |
+------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| ceph1-4 |
+-------------+
</pre>


*Logs from mds.2:*

<pre>

2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 handle_mds_map state change up:replay --> up:resolve
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 resolve_start
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 reopen_log
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache rollback_uncommitted_fragments: 0 pending
2020-10-09 16:05:09.894 7fcbf7d79700 7 mds.2.cache set_recovery_set 0,1,3,4,5
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 recovery set is 0,1,3,4,5

2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.2407 resolve set is 2
2020-10-09 16:05:09.894 7fcbf7d79700 7 mds.2.cache set_recovery_set 0,1,3,4,5
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 recovery set is 0,1,3,4,5
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache send_slave_resolves
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache send_subtree_resolves
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache claim 0x102 []
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.0
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.1
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.3
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.4
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.5

2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.2407 resolve set is 0,1,2
2020-10-09 16:05:50.855 7fcbf7d79700 7 mds.2.cache set_recovery_set 0,1,3,4
2020-10-09 16:05:50.855 7fcbf7d79700 1 mds.2.2407 recovery set is 0,1,3,4
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache send_slave_resolves
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache send_subtree_resolves
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache claim 0x102 []
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.0
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.1
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.3
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.4

2020-10-09 16:05:50.856 7fcbf7d79700 7 mds.2.cache handle_resolve from mds.1
2020-10-09 16:05:50.856 7fcbf7d79700 10 mds.2.cache maybe_resolve_finish still waiting for resolves (0,3,4,5)
2020-10-09 16:05:50.860 7fcbf7d79700 7 mds.2.cache handle_resolve from mds.0
2020-10-09 16:05:50.860 7fcbf7d79700 10 mds.2.cache maybe_resolve_finish still waiting for resolves (3,4,5)
2020-10-09 16:05:50.885 7fcbf7d79700 7 mds.2.cache handle_resolve from mds.4
2020-10-09 16:05:50.885 7fcbf7d79700 10 mds.2.cache maybe_resolve_finish still waiting for resolves (3,5)
2020-10-09 16:05:50.913 7fcbf7d79700 7 mds.2.cache handle_resolve from mds.3
2020-10-09 16:05:50.914 7fcbf7d79700 10 mds.2.cache maybe_resolve_finish still waiting for resolves (5)
</pre>
