CephFS - Bug #47843 (Fix Under Review): mds: stuck in resolve when restarting MDS and reducing max_mds
https://tracker.ceph.com/issues/47843
2020-10-13T08:01:55Z
wei qiaomiao
wei.qiaomiao@zte.com.cn
In a multi-MDS Ceph cluster, reduce max_mds and then, before the shrink completes, immediately restart one or more MDSs. The restarted MDSs remain stuck in the "resolve" or "rejoin" state.

The reproduction steps are as follows (a command sketch follows the first status output below):

1) There are 6 active MDSs in the cluster (ranks 0-5).
2) Set max_mds=3.
3) Restart mds.0, mds.1, and mds.2.
4) mds.2 remains in the "resolve" state and its log prints "still waiting for resolves (5)", because mds.5 has already stopped cleanly by this time.

Before:
[root@ceph1 ceph]# ceph fs status
cephfs - 0 clients
======
+------+--------+---------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+---------+---------------+-------+-------+
| 0 | active | ceph1-3 | Reqs: 0 /s | 30.3k | 77 |
| 1 | active | ceph2-2 | Reqs: 0 /s | 253 | 13 |
| 2 | active | ceph2 | Reqs: 0 /s | 462 | 13 |
| 3 | active | ceph1-1 | Reqs: 0 /s | 10 | 13 |
| 4 | active | ceph1 | Reqs: 0 /s | 10 | 13 |
| 5 | active | ceph1-4 | Reqs: 0 /s | 0 | 0 |
+------+--------+---------+---------------+-------+-------+
+------+----------+-------+-------+
| Pool | type | used | avail |
+------+----------+-------+-------+
| meta | metadata | 382M | 565G |
| data | data | 2070M | 565G |
+------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| ceph2-1 |
| ceph2-4 |
| ceph2-3 |
+-------------+
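
For reference, a hedged sketch of the commands behind steps 2 and 3. The filesystem name and the rank-to-daemon mapping are taken from the status output above; the systemd unit names are assumptions about how the daemons are managed on this cluster:

# Step 2: shrink the cluster from 6 to 3 active ranks.
ceph fs set cephfs max_mds 3

# Step 3: immediately restart ranks 0-2, before the shrink completes.
# Daemon names per the "Before" status; unit naming is an assumption.
systemctl restart ceph-mds@ceph1-3   # rank 0
systemctl restart ceph-mds@ceph2-2   # rank 1
systemctl restart ceph-mds@ceph2     # rank 2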
After reducing max_mds and restarting some MDSs:
[root@ceph1 ceph]# ceph fs status
cephfs - 0 clients
======
+------+---------+---------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+---------+---------+---------------+-------+-------+
| 0 | rejoin | ceph2-3 | | 30.2k | 14 |
| 1 | rejoin | ceph2-4 | | 244 | 4 |
| 2 | resolve | ceph2-1 | | 454 | 5 |
| 3 | active | ceph1-1 | Reqs: 0 /s | 10 | 13 |
| 4 | active | ceph1 | Reqs: 0 /s | 10 | 13 |
+------+---------+---------+---------------+-------+-------+
+------+----------+-------+-------+
| Pool | type | used | avail |
+------+----------+-------+-------+
| meta | metadata | 382M | 565G |
| data | data | 2070M | 565G |
+------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| ceph1-4 |
+-------------+
Logs:
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 handle_mds_map state change up:replay --> up:resolve
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 resolve_start
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 reopen_log
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache rollback_uncommitted_fragments: 0 pending
2020-10-09 16:05:09.894 7fcbf7d79700 7 mds.2.cache set_recovery_set 0,1,3,4,5
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 recovery set is 0,1,3,4,5
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.2407 resolve set is 2
2020-10-09 16:05:09.894 7fcbf7d79700 7 mds.2.cache set_recovery_set 0,1,3,4,5
2020-10-09 16:05:09.894 7fcbf7d79700 1 mds.2.2407 recovery set is 0,1,3,4,5
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache send_slave_resolves
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache send_subtree_resolves
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache claim 0x102 []
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.0
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.1
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.3
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.4
2020-10-09 16:05:09.894 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.5
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.2407 resolve set is 0,1,2
2020-10-09 16:05:50.855 7fcbf7d79700 7 mds.2.cache set_recovery_set 0,1,3,4
2020-10-09 16:05:50.855 7fcbf7d79700 1 mds.2.2407 recovery set is 0,1,3,4
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache send_slave_resolves
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache send_subtree_resolves
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache claim 0x102 []
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.0
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.1
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.3
2020-10-09 16:05:50.855 7fcbf7d79700 10 mds.2.cache sending subtee resolve to mds.4
2020-10-09 16:05:50.856 7fcbf7d79700 7 mds.2.cache handle_resolve from mds.1
2020-10-09 16:05:50.856 7fcbf7d79700 10 mds.2.cache maybe_resolve_finish still waiting for resolves (0,3,4,5)
2020-10-09 16:05:50.860 7fcbf7d79700 7 mds.2.cache handle_resolve from mds.0
2020-10-09 16:05:50.860 7fcbf7d79700 10 mds.2.cache maybe_resolve_finish still waiting for resolves (3,4,5)
2020-10-09 16:05:50.885 7fcbf7d79700 7 mds.2.cache handle_resolve from mds.4
2020-10-09 16:05:50.885 7fcbf7d79700 10 mds.2.cache maybe_resolve_finish still waiting for resolves (3,5)
2020-10-09 16:05:50.913 7fcbf7d79700 7 mds.2.cache handle_resolve from mds.3
2020-10-09 16:05:50.914 7fcbf7d79700 10 mds.2.cache maybe_resolve_finish still waiting for resolves (5)
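
The log lines suggest a simple gather: the recovering rank tracks the set of peers it still needs a resolve from and finishes only when that set drains. Because mds.5 stopped cleanly during the shrink and never sends a resolve, the set never drains. Below is a minimal standalone sketch of that pattern; the ResolveGather name and structure are invented for illustration and are not the actual Ceph MDCache code:

#include <iostream>
#include <set>

// Standalone sketch of the gather pattern visible in the log above
// ("maybe_resolve_finish still waiting for resolves (...)"). This is
// not the real MDCache code; names are invented for illustration.
struct ResolveGather {
  std::set<int> need_resolve;  // peer ranks we still expect a resolve from

  void handle_resolve(int from) {
    std::cout << "handle_resolve from mds." << from << "\n";
    need_resolve.erase(from);
    maybe_resolve_finish();
  }

  void maybe_resolve_finish() {
    if (!need_resolve.empty()) {
      std::cout << "still waiting for resolves (";
      bool first = true;
      for (int r : need_resolve) {
        if (!first) std::cout << ",";
        std::cout << r;
        first = false;
      }
      std::cout << ")\n";
      return;
    }
    std::cout << "resolve finished\n";
  }
};

int main() {
  // Recovery set from the log: 0,1,3,4,5. During the max_mds shrink,
  // mds.5 stopped cleanly, so its resolve never arrives.
  ResolveGather g{{0, 1, 3, 4, 5}};
  for (int peer : {1, 0, 4, 3})  // arrival order from the log excerpt
    g.handle_resolve(peer);
  // Output ends at "still waiting for resolves (5)": the same stuck
  // state mds.2 shows, because nothing ever removes rank 5 from the set.
}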