Bug #9427: osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos) - CephFS - Ceph

Actions

Copy link

Bug #9427

closed

osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:

Resolved

Priority:

Immediate

Assignee:

John Spray

Category:

Target version:

% Done:

Source:

Development

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

2014-09-10 20:44:59.843880 7fc438331700 -1 osdc/Journaler.cc: In function 'void Journaler::_write_head(Context*)' thread 7fc438331700 time 2014-09-10 20:44:59.792758
osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)

 ceph version 0.85-669-gc050afd (c050afdbe01e4204de988ef2dba28dd4ba1c34e2)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x90e9bf]
 2: (Journaler::_write_head(Context*)+0x82f) [0x7b8d2f]
 3: (Journaler::_flush(C_OnFinisher*)+0x118) [0x7bb508]
 4: (Journaler::flush(Context*)+0x3d) [0x7bb9ad]
 5: (MDLog::_submit_thread()+0x723) [0x79cfa3]
 6: (MDLog::SubmitThread::entry()+0xd) [0x5a09cd]
 7: (()+0x7e9a) [0x7fc442f27e9a]
 8: (clone()+0x6d) [0x7fc441adc3fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

on burnupi21

Actions

Copy link

Updated by Sage Weil over 9 years ago

2014-09-10 21:22:51.698712 7f412cd8a700 10 mds.0.journaler(rw) append_entry len 3025 to 135243619~3045
2014-09-10 21:22:51.698728 7f412cd8a700 20 mds.0.journaler(rw) flush not delaying flush
2014-09-10 21:22:51.698730 7f412cd8a700 10 mds.0.journaler(rw) _do_flush flushing 135243619~3045
2014-09-10 21:22:51.698823 7f412cd8a700 10 mds.0.journaler(rw) _do_flush (prezeroing/prezero)/write/flush/safe pointers now at (159383552/159383552)/135246664/135246664/135243619
2014-09-10 21:22:51.698826 7f412cd8a700 20 mds.0.journaler(rw) _issue_prezero target 159383552 <= prezeroing_pos 159383552
2014-09-10 21:22:51.698834 7f412cd8a700 10 mds.0.journaler(rw) write_head loghead(trim 4194304, expire 1875432704405, write 135243619, stream_format 1)

expire_pos didn't get reset during jouranl upgrade?

Actions

Copy link

Updated by Sage Weil over 9 years ago

ESubtreeMap has an expire_pos field, and we set it in ESubtreeMap::replay() if it is > the current expire pos. I think the journal rewrite needs to adjust this value, or we need to guard it with a journal version or something. Or drop it entirely...

Actions

Copy link

Updated by Sage Weil over 9 years ago

Project changed from Ceph to CephFS

wip-mds has hacky workaround

Actions

Copy link

Updated by John Spray over 9 years ago

The history:

commit 6337842c464ecb9055683270a1faaac313b93257
Author: Greg Farnum <gregf@hq.newdream.net>
Date:   Tue Nov 30 14:00:32 2010 -0800

    mds: add expire_pos to the ESubtreeMap.

    This will allow more efficient trimming during standby_replay.

    Signed-off-by: Greg Farnum <gregf@hq.newdream.net>

commit d496b65144739d4daf716d81d4ac517845e701df
Author: Greg Farnum <gregf@hq.newdream.net>
Date:   Tue Feb 15 09:00:06 2011 -0800

    Journaler: add some checks for expire_pos.

    I don't think these are necessary checks, but the expire_pos >= trim_pos
    invariant got broken somehow by johnl, and these checks won't hurt!

    Signed-off-by: Greg Farnum <gregf@hq.newdream.net>

I think that now that the journal encoding format is somewhat changeable, having absolute references to the serialized positions inside the journal entries themselves
is bad news -- we could zero them out during the reformat, or we could drop this field to make our lives simpler in future.

Looking into this, I'm a bit concerned that EMetaBlob::truncate_finish might have a similar problem.

Actions

Copy link

Updated by John Spray over 9 years ago

So rewriting the truncate_finish part isn't too hard if we want to do that:
https://github.com/ceph/ceph/commit/4ae6bca4d497e4e5bd302a3a013d62624ad87a52

(and can do a similar but simpler thing to zero out the expire_pos bit).

Greg: I don't understand what the original benefit is to feeding the expire_pos value here to stand by MDSs, could we get the same benefit by using a log segment sequence number here instead of a raw log position?

Actions

Copy link

Updated by Greg Farnum over 9 years ago

It doesn't need to be an absolute offset that gets fed to the standby-replay MDS, as long as it can use the information to expire old log segments and trim its cache. I think we added the expire_pos value just because they might not catch all the way up and get a chance to re-read the journal header.
(Alternative solution: make the journaled a little more complicated so that the standby-replay mds actually reads the header more often than just after running out of journal to read.)

Actions

Copy link

Updated by Sage Weil over 9 years ago

Assignee set to John Spray

Actions

Copy link

Updated by John Spray over 9 years ago

Status changed from New to Fix Under Review

https://github.com/ceph/ceph/pull/2469

Actions

Copy link

Updated by John Spray over 9 years ago

For posterity, the manual test procedure for the changes to rewrite that fix this issue:

The manual test procedure for this, with a vstart cluster is:
 * Create a fresh filesystem with firefly
 * Mount the FS and generate a decent number of operations (I copy my /etc onto it) so that we get at least a few log segments in play
 * Open a file, write some data to it, then open it again for truncate (I use open('foo', 'w') from python console)
 * Check MDS logs, it should have some stuff about truncates.
 * Unmount and stop the MDS.
 * Upgrade to master (in my case I'm just checking it out and recompiling my tree)
 * Restart the MDS.

Check that (a) it comes up happily, and (b) there are messages in the MDS log about "rewrite_truncate_finish applying segment seq mapping" and "_reformat journal zeroing expire pos" 

Currently the next thing that happens after that is that the MDS crashes with #9444

Actions

Copy link

#10

Updated by Greg Farnum over 9 years ago

Status changed from Fix Under Review to Resolved

Merged to master in commit:e06f4251ac36503d33f203567ada1b096119ab80.
Immediately cherry-picked to giant in commit:c3c6468cad8fa2c976d6c325d93a34e9096da86b

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » CephFS

Custom queries

Bug #9427

osdc/Journaler.cc: 405: FAILED assert(last_written.write_pos >= last_written.expire_pos)

Updated by Sage Weil over 9 years ago

Updated by Sage Weil over 9 years ago

Updated by Sage Weil over 9 years ago

Updated by John Spray over 9 years ago

Updated by John Spray over 9 years ago

Updated by Greg Farnum over 9 years ago

Updated by Sage Weil over 9 years ago

Updated by John Spray over 9 years ago

Updated by John Spray over 9 years ago

Updated by Greg Farnum over 9 years ago