Bug #9427
osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)
0%
Description
2014-09-10 20:44:59.843880 7fc438331700 -1 osdc/Journaler.cc: In function 'void Journaler::_write_head(Context*)' thread 7fc438331700 time 2014-09-10 20:44:59.792758 osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos) ceph version 0.85-669-gc050afd (c050afdbe01e4204de988ef2dba28dd4ba1c34e2) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x90e9bf] 2: (Journaler::_write_head(Context*)+0x82f) [0x7b8d2f] 3: (Journaler::_flush(C_OnFinisher*)+0x118) [0x7bb508] 4: (Journaler::flush(Context*)+0x3d) [0x7bb9ad] 5: (MDLog::_submit_thread()+0x723) [0x79cfa3] 6: (MDLog::SubmitThread::entry()+0xd) [0x5a09cd] 7: (()+0x7e9a) [0x7fc442f27e9a] 8: (clone()+0x6d) [0x7fc441adc3fd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
on burnupi21
History
#1 Updated by Sage Weil about 9 years ago
2014-09-10 21:22:51.698712 7f412cd8a700 10 mds.0.journaler(rw) append_entry len 3025 to 135243619~3045 2014-09-10 21:22:51.698728 7f412cd8a700 20 mds.0.journaler(rw) flush not delaying flush 2014-09-10 21:22:51.698730 7f412cd8a700 10 mds.0.journaler(rw) _do_flush flushing 135243619~3045 2014-09-10 21:22:51.698823 7f412cd8a700 10 mds.0.journaler(rw) _do_flush (prezeroing/prezero)/write/flush/safe pointers now at (159383552/159383552)/135246664/135246664/135243619 2014-09-10 21:22:51.698826 7f412cd8a700 20 mds.0.journaler(rw) _issue_prezero target 159383552 <= prezeroing_pos 159383552 2014-09-10 21:22:51.698834 7f412cd8a700 10 mds.0.journaler(rw) write_head loghead(trim 4194304, expire 1875432704405, write 135243619, stream_format 1)
expire_pos didn't get reset during jouranl upgrade?
#2 Updated by Sage Weil about 9 years ago
ESubtreeMap has an expire_pos field, and we set it in ESubtreeMap::replay() if it is > the current expire pos. I think the journal rewrite needs to adjust this value, or we need to guard it with a journal version or something. Or drop it entirely...
#3 Updated by Sage Weil about 9 years ago
- Project changed from Ceph to CephFS
wip-mds has hacky workaround
#4 Updated by John Spray about 9 years ago
The history:
commit 6337842c464ecb9055683270a1faaac313b93257 Author: Greg Farnum <gregf@hq.newdream.net> Date: Tue Nov 30 14:00:32 2010 -0800 mds: add expire_pos to the ESubtreeMap. This will allow more efficient trimming during standby_replay. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> commit d496b65144739d4daf716d81d4ac517845e701df Author: Greg Farnum <gregf@hq.newdream.net> Date: Tue Feb 15 09:00:06 2011 -0800 Journaler: add some checks for expire_pos. I don't think these are necessary checks, but the expire_pos >= trim_pos invariant got broken somehow by johnl, and these checks won't hurt! Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
I think that now that the journal encoding format is somewhat changeable, having absolute references to the serialized positions inside the journal entries themselves
is bad news -- we could zero them out during the reformat, or we could drop this field to make our lives simpler in future.
Looking into this, I'm a bit concerned that EMetaBlob::truncate_finish might have a similar problem.
#5 Updated by John Spray about 9 years ago
So rewriting the truncate_finish part isn't too hard if we want to do that:
https://github.com/ceph/ceph/commit/4ae6bca4d497e4e5bd302a3a013d62624ad87a52
(and can do a similar but simpler thing to zero out the expire_pos bit).
Greg: I don't understand what the original benefit is to feeding the expire_pos value here to stand by MDSs, could we get the same benefit by using a log segment sequence number here instead of a raw log position?
#6 Updated by Greg Farnum about 9 years ago
It doesn't need to be an absolute offset that gets fed to the standby-replay MDS, as long as it can use the information to expire old log segments and trim its cache. I think we added the expire_pos value just because they might not catch all the way up and get a chance to re-read the journal header.
(Alternative solution: make the journaled a little more complicated so that the standby-replay mds actually reads the header more often than just after running out of journal to read.)
#7 Updated by Sage Weil about 9 years ago
- Assignee set to John Spray
#8 Updated by John Spray about 9 years ago
- Status changed from New to Fix Under Review
#9 Updated by John Spray about 9 years ago
For posterity, the manual test procedure for the changes to rewrite that fix this issue:
The manual test procedure for this, with a vstart cluster is: * Create a fresh filesystem with firefly * Mount the FS and generate a decent number of operations (I copy my /etc onto it) so that we get at least a few log segments in play * Open a file, write some data to it, then open it again for truncate (I use open('foo', 'w') from python console) * Check MDS logs, it should have some stuff about truncates. * Unmount and stop the MDS. * Upgrade to master (in my case I'm just checking it out and recompiling my tree) * Restart the MDS. Check that (a) it comes up happily, and (b) there are messages in the MDS log about "rewrite_truncate_finish applying segment seq mapping" and "_reformat journal zeroing expire pos" Currently the next thing that happens after that is that the MDS crashes with #9444
#10 Updated by Greg Farnum about 9 years ago
- Status changed from Fix Under Review to Resolved
Merged to master in commit:e06f4251ac36503d33f203567ada1b096119ab80.
Immediately cherry-picked to giant in commit:c3c6468cad8fa2c976d6c325d93a34e9096da86b