mds: journal events lost when a hot-standby MDS switchover occurs
ceph version: jewel 10.2.2
mds mode: hot-standby
The MDS can lose journal events because it may wake up waiters whose journal events have not yet been flushed to disk. In my environment this causes at least three kinds of errors:
1. The MDS reports “dir not empty” when deleting a directory.
2. The MDS reports “no such file or directory” when creating a file.
3. ls lists no files or directories, but re-creating an old directory reports “file exists”.
See the following example:
The above case may be caused by the following sequence of events:
t1: the journaler starts flushing event 99
t2: the journaler starts flushing event 100
t3: events 101 and 102 are appended to write_buf, and next_safe_pos moves to write_pos_101
t4: the flush of event 100 completes; its callback sets safe_pos = next_safe_pos_102 because pending_safe is empty
t5: the journaler wakes up the waiters of events 99, 100 and 101 and responds to the clients (note: events 99 and 101 have not been flushed to disk yet)
t6: a hot-standby MDS switchover occurs, and events 99 and 101 are lost
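The bookkeeping in the timeline can be sketched with a small Python model. This is a hypothetical simplification, not Ceph's Journaler code: the JournalModel class, its methods, the fixed 10-byte events and the in_flight/acked maps are assumptions for illustration; only the names write_pos, flush_pos, next_safe_pos and safe_pos are borrowed from the report. finish_flush_buggy reduces the t4 behavior to its effect (safe_pos jumps to next_safe_pos), while finish_flush_safe is one possible conservative rule that advances safe_pos only over a contiguous run of acknowledged writes. Event 102 is left out to keep the model small.

```python
from dataclasses import dataclass, field


@dataclass
class JournalModel:
    """Hypothetical, simplified model of journal position bookkeeping.

    Not Ceph's Journaler. A waiter (one client request) should be woken
    only once every byte up to its event's end position is durable.
    """
    write_pos: int = 0       # end of everything appended to write_buf
    flush_pos: int = 0       # end of everything submitted for writing
    next_safe_pos: int = 0   # end of the last appended event
    safe_pos: int = 0        # position considered durable
    in_flight: dict = field(default_factory=dict)  # start -> end, submitted, not acked
    acked: dict = field(default_factory=dict)      # start -> end, acked out of order
    waiters: list = field(default_factory=list)    # (end_pos, event seq)

    def append(self, seq, length=10):
        self.write_pos += length
        self.next_safe_pos = self.write_pos
        self.waiters.append((self.write_pos, seq))

    def start_flush(self):
        """Submit [flush_pos, write_pos) as one write; return its start."""
        start = self.flush_pos
        self.in_flight[start] = self.write_pos
        self.flush_pos = self.write_pos
        return start

    def finish_flush_buggy(self, start):
        # Effect of the t4 rule: jump straight to next_safe_pos, even though
        # an earlier write is still in flight and later data is unwritten.
        self.in_flight.pop(start)
        self.safe_pos = self.next_safe_pos
        return self._wake()

    def finish_flush_safe(self, start):
        # Conservative rule: advance only across a contiguous run of
        # acknowledged writes that begins at the current safe_pos.
        self.acked[start] = self.in_flight.pop(start)
        while self.safe_pos in self.acked:
            self.safe_pos = self.acked.pop(self.safe_pos)
        return self._wake()

    def _wake(self):
        """Return the seqs of waiters covered by safe_pos and drop them."""
        woken = [seq for end, seq in self.waiters if end <= self.safe_pos]
        self.waiters = [(end, seq) for end, seq in self.waiters if end > self.safe_pos]
        return woken


def replay_timeline(finish):
    """Drive t1..t5; `finish` is the completion rule under test."""
    j = JournalModel()
    j.append(99)
    s99 = j.start_flush()    # t1: flush of event 99 in flight
    j.append(100)
    s100 = j.start_flush()   # t2: flush of event 100 in flight
    j.append(101)            # t3: event 101 only in write_buf
    woken = finish(j, s100)  # t4/t5: flush of event 100 completes
    return j, woken, s99


if __name__ == "__main__":
    _, woken, _ = replay_timeline(JournalModel.finish_flush_buggy)
    print("buggy rule wakes:", woken)
    j, woken, s99 = replay_timeline(JournalModel.finish_flush_safe)
    print("safe rule wakes:", woken)
    print("after event 99 is acked:", j.finish_flush_safe(s99))
```

Under the buggy rule, completing the flush of event 100 wakes the waiters of events 99, 100 and 101, matching t5. Under the conservative rule it wakes nobody, because event 99 is still in flight; events 99 and 100 are acknowledged only after the flush of event 99 also completes, and event 101 stays pending until it is actually written.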
If the lost event is “unlink_local”, the client mistakenly receives a successful response for the unlink. So when the client finishes unlinking the last file and issues rmdir on the parent directory, the MDS reports “dir not empty” because the file still exists on the MDS side.
If the lost event is “mkdir” and the client then creates a file under that directory, the MDS reports “no such file or directory” because the directory was never successfully created in the MDS. Besides the problems above, lost events can also corrupt the directory fnode statistics, which leads to further problems.
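The two client-visible symptoms can be reproduced with a toy namespace model in Python. This is a hypothetical sketch, not MDS code: apply_op, replay, the {directory: set of children} representation and the (op, path) event tuples are assumptions. It illustrates the point made in the report: after the switchover the standby MDS rebuilds state only from events that reached disk, so an operation the client was already told had succeeded can silently disappear.

```python
import errno
import posixpath


def apply_op(ns, op, path):
    """Apply one metadata op to a toy namespace {dir: set(children)}.

    Returns 0 on success or an errno value, loosely mimicking the MDS reply.
    """
    parent, name = posixpath.dirname(path), posixpath.basename(path)
    if parent not in ns:
        return errno.ENOENT
    if op in ("mkdir", "create"):
        if name in ns[parent]:
            return errno.EEXIST
        ns[parent].add(name)
        if op == "mkdir":
            ns[path] = set()
    elif op == "unlink":
        if name not in ns[parent]:
            return errno.ENOENT
        ns[parent].remove(name)
    elif op == "rmdir":
        if path not in ns:
            return errno.ENOENT
        if ns[path]:
            return errno.ENOTEMPTY
        del ns[path]
        ns[parent].remove(name)
    else:
        raise ValueError(f"unknown op {op!r}")
    return 0


def replay(durable_events):
    """State seen by the standby MDS: only events that reached disk survive."""
    ns = {"/": set()}
    for op, path in durable_events:
        apply_op(ns, op, path)
    return ns


if __name__ == "__main__":
    # Case 1: the client was told "unlink /d/f" succeeded, but the event was lost.
    ns = replay([("mkdir", "/d"), ("create", "/d/f")])
    print(errno.errorcode[apply_op(ns, "rmdir", "/d")])     # ENOTEMPTY
    # Case 2: the client was told "mkdir /e" succeeded, but the event was lost.
    ns = replay([])
    print(errno.errorcode[apply_op(ns, "create", "/e/x")])  # ENOENT
```

Case 1 corresponds to the “dir not empty” symptom and case 2 to “no such file or directory”; the fnode statistics errors are not modeled.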