Bug #13167
closed
mds: replay gets stuck (on out-of-order journal replies?)
Description
ubuntu-2015-09-17_16:55:52-fs-greg-fs-testing---basic-multi/1061690/ceph-mds.a.log
This MDS went in and out of replay a few times and got stuck on the last attempt. It appears to already have the data it needs to proceed, yet it is stuck at the wait condition in MDLog::_replay_thread. The last entry it processed is the last one before a log object boundary, and the next event seems to cross that boundary. The read of the second object also completed before the read of the first.
Updated by Zheng Yan over 8 years ago
- Status changed from 12 to Duplicate
The journal's write_pos seems to point somewhere in object 200.00000002, but the size of object 200.00000001 is 3139442. This is likely another symptom of #13166.
Updated by Greg Farnum over 8 years ago
- Status changed from Duplicate to 12
- Priority changed from Urgent to Normal
We should be detecting holes in the journal and shutting down with a nice message or clear assert or something instead of just hanging forever.
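To illustrate the kind of check being asked for, here is a minimal sketch and not the actual fix. It assumes a simple journal layout with one object per fixed-size extent, a 4 MiB object size (not stated in this report), and object names built from the journal inode in hex plus an 8-digit hex index. The names `find_hole` and `object_name`, and the exact write_pos offset used in the example, are hypothetical.

```python
from typing import Dict, Optional

# Assumption: 4 MiB journal objects; the report does not state the object size.
OBJECT_SIZE = 4 * 1024 * 1024


def object_name(ino: int, index: int) -> str:
    """Build an object name such as '200.00000001' (assumed naming scheme)."""
    return f"{ino:x}.{index:08x}"


def find_hole(ino: int, read_pos: int, write_pos: int,
              sizes: Dict[str, int],
              object_size: int = OBJECT_SIZE) -> Optional[str]:
    """Return a message describing the first gap in [read_pos, write_pos),
    or None if every object holds the bytes the range requires."""
    if write_pos <= read_pos:
        return None
    first = read_pos // object_size
    last = (write_pos - 1) // object_size
    for index in range(first, last + 1):
        name = object_name(ino, index)
        # Bytes this object must contain for the journal to be contiguous
        # up to write_pos: a full object, except possibly for the last one.
        needed = min(write_pos - index * object_size, object_size)
        have = sizes.get(name, 0)
        if have < needed:
            return (f"journal hole: {name} has {have} bytes, "
                    f"expected at least {needed}")
    return None


# Sizes loosely modeled on this report; the size of object 2 and the
# offset of write_pos inside it are made up for the example.
sizes = {"200.00000001": 3139442, "200.00000002": 1000}
message = find_hole(0x200, OBJECT_SIZE, 2 * OBJECT_SIZE + 1000, sizes)
if message is not None:
    print(message)
```

With a check like this, replay could stop with an explicit error as soon as an object is shorter than write_pos implies, instead of waiting indefinitely for data that will never arrive.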
Updated by Zheng Yan over 8 years ago
- Status changed from 12 to Fix Under Review
Updated by Greg Farnum over 8 years ago
- Status changed from Fix Under Review to Resolved