Bug #45107
multi-mds: journal can't be flushed
Description
I'm trying to learn the MDS code and fix https://tracker.ceph.com/issues/45024 . Along the way, I found what may be a bug in the current mechanism for processing slave updates. When MDSes restart after a crash, each MDS only sends its slave ops to the corresponding master; the master does not iterate its active_requests and send what it knows back to the slaves. So there is a scenario where the slave crashes after sending OP_COMMITTED to the master, and, for whatever reason, the master never receives that OP_COMMITTED message. What's more, the slave may have trimmed its journal just before crashing, which is possible because there is no mechanism preventing the slave from trimming its ESlaveUpdate::OP_COMMIT log event. In that case neither side can ever finish the commit: after restart the slave has nothing left in its journal to resend, and the master keeps waiting for an ack that will never come.
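To make the failure mode concrete, here is a minimal standalone toy model of the gap as I understand it. Everything below is hypothetical illustration, not Ceph code; only the names ESlaveUpdate and OP_COMMITTED come from the real tree:

```cpp
// Toy model of the recovery gap described above; hypothetical, not Ceph code.
#include <cstdio>
#include <set>

struct Master {
  // master updates still waiting for a slave's OP_COMMITTED ack
  std::set<int> uncommitted_masters{134};        // e.g. client.54755:134
  void handle_op_committed(int reqid) { uncommitted_masters.erase(reqid); }
  bool can_expire_segment() const { return uncommitted_masters.empty(); }
};

struct Slave {
  // reqids whose ESlaveUpdate::OP_COMMIT event is still in the journal
  std::set<int> journaled_commits;
  void resolve_after_restart(Master& m) {
    // The slave resends OP_COMMITTED only for commits it can still find
    // in its own journal; the master never walks its active requests to
    // ask about the rest.
    for (int reqid : journaled_commits)
      m.handle_op_committed(reqid);
  }
};

int main() {
  Master master;
  Slave slave;
  // The original OP_COMMITTED was lost, and the slave trimmed the segment
  // holding ESlaveUpdate::OP_COMMIT before crashing:
  slave.journaled_commits.clear();
  slave.resolve_after_restart(master);
  // Nothing ever clears uncommitted_masters, so the master's log segment
  // can never expire and a journal flush blocks.
  std::printf("segment can expire: %s\n",
              master.can_expire_segment() ? "yes" : "no");   // -> "no"
  return 0;
}
```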
The problem can be reproduced as follows:
1. Modify the source code (master branch) so that `Server::handle_slave_request_reply` returns before calling `mdcache->committed_master_slave(r, from)`. This simulates the OP_COMMITTED message going missing on the master (see the sketch after these steps).
2. Pin dir /mnt/mds0 to rank 0 and dir /mnt/mds1 to rank 1.
3. Run `touch /mnt/mds0/test`.
4. Run `ln /mnt/mds0/test /mnt/mds1/test`.
5. Run `ceph daemon mds.<rank-1-id> flush journal`.
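For step 1, the modification looks roughly like this. The surrounding code is paraphrased from memory, not an exact diff; the point is only to skip the notification that normally clears the uncommitted master op:

```cpp
// In Server::handle_slave_request_reply (paraphrased context):
if (m->get_op() == MMDSSlaveRequest::OP_COMMITTED) {
  metareqid_t r = m->get_reqid();
  return;                                    // injected early return:
  mdcache->committed_master_slave(r, from);  // the ack is now dropped
}
```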
Step 5 blocks. The rank-1 log shows a log event stuck waiting:
2020-04-16T13:24:31.239+0800 7f84311cc700 6 mds.1.journal LogSegment(165/4362884).try_to_expire
2020-04-16T13:24:31.239+0800 7f84311cc700 15 mds.1.journal try_to_expire committing [dir 0x10000000001 /mds1/ [2,head] auth{0=2} v=91 cv=0/0 dir_auth=1 state=1611137025|complete|auxsubtree f(v24 m2020-04-16T13:24:17.336304+0800 11=11+0) n(v22 rc2020-04-16T13:24:17.336304+0800) hs=11+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 replicated=1 dirty=1 waiter=0 authpin=0 0x5600b8dd9200]
2020-04-16T13:24:31.240+0800 7f84311cc700 10 mds.1.journal try_to_expire waiting for purge of []
2020-04-16T13:24:31.240+0800 7f84311cc700 6 mds.1.journal LogSegment(165/4362884).try_to_expire waiting
2020-04-16T13:24:31.240+0800 7f84311cc700 6 mds.1.journal LogSegment(170/4366482).try_to_expire
2020-04-16T13:24:31.240+0800 7f84311cc700 15 mds.1.journal try_to_expire committing [dir 0x10000000001 /mds1/ [2,head] auth{0=2} v=91 cv=91/0 dir_auth=1 ap=1+0 state=1611137057|complete|auxsubtree|committing f(v24 m2020-04-16T13:24:17.336304+0800 11=11+0) n(v22 rc2020-04-16T13:24:17.336304+0800) hs=11+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 replicated=1 dirty=1 waiter=0 authpin=1 0x5600b8dd9200]
2020-04-16T13:24:31.240+0800 7f84311cc700 10 mds.1.journal try_to_expire waiting for slaves to ack commit on client.54755:134
2020-04-16T13:24:31.240+0800 7f84311cc700 10 mds.1.journal try_to_expire saving sessionmap, need 18, committed is 17 (17)
2020-04-16T13:24:31.240+0800 7f84311cc700 10 mds.1.journal try_to_expire waiting for purge of []
2020-04-16T13:24:31.240+0800 7f84311cc700 6 mds.1.journal LogSegment(170/4366482).try_to_expire waiting
2020-04-16T13:24:31.241+0800 7f84329cf700 6 mds.1.journal LogSegment(165/4362884).try_to_expire
2020-04-16T13:24:31.241+0800 7f84329cf700 10 mds.1.journal try_to_expire waiting for purge of []
2020-04-16T13:24:31.241+0800 7f84329cf700 6 mds.1.journal LogSegment(165/4362884).try_to_expire success
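The "waiting for slaves to ack commit" line corresponds to the expiry gate on uncommitted master updates in `LogSegment::try_to_expire`. As I read it, the gate looks roughly like the following; treat this as an approximation of the upstream code, not an exact quote:

```cpp
// Approximate reading of the gate in LogSegment::try_to_expire (journal.cc):
for (const auto& reqid : uncommitted_masters) {
  dout(10) << "try_to_expire waiting for slaves to ack commit on "
           << reqid << dendl;
  mds->mdcache->wait_for_uncommitted_master(reqid, gather_bld.new_sub());
}
// If the slave's OP_COMMITTED never arrives, this waiter never fires,
// the segment never expires, and `flush journal` blocks.
```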
History
#1 Updated by Xinying Song almost 4 years ago
The simulation was wrong. Restarting the MDS will make it work.
Please close this issue.
#2 Updated by Zheng Yan almost 4 years ago
- Status changed from New to Closed