Bug #10368

closed

Assertion in _trim_expired_segments

Added by John Spray over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
% Done: 0%
Source: other
Severity: 3 - minor

Description

   -20> 2014-12-18 05:15:49.295240 7f642f89e700 10 mds.0.objecter _op_submit oid 200.00000002 @3 @3 [write 1813550~1736] tid 0 osd.0
   -19> 2014-12-18 05:15:49.295243 7f64321a4700  5 mds.0.objecter 0 unacked, 4 uncommitted
   -18> 2014-12-18 05:15:49.295245 7f64321a4700  6 mds.0.journal LogSegment(5026/10197802).try_to_expire waiting
   -17> 2014-12-18 05:15:49.295246 7f64321a4700  5 mds.0.log try_expire expiring segment 5026/10197802
   -16> 2014-12-18 05:15:49.295246 7f642f89e700 20 mds.0.objecter get_session s=0x2e9f000 osd=0 4
   -15> 2014-12-18 05:15:49.295248 7f642f89e700 15 mds.0.objecter _session_op_assign 0 1577
   -14> 2014-12-18 05:15:49.295249 7f64321a4700  6 mds.0.journal LogSegment(5028/10198820).try_to_expire
   -13> 2014-12-18 05:15:49.295249 7f642f89e700 15 mds.0.objecter _send_op 1577 to osd.0
   -12> 2014-12-18 05:15:49.295250 7f64321a4700 10 mds.0.journal try_to_expire saving sessionmap, need 1009, committed is 1007 (1009)
   -11> 2014-12-18 05:15:49.295252 7f64321a4700 10 mds.0.sessionmap save needv 1009, v 1009
   -10> 2014-12-18 05:15:49.295252 7f642f89e700 20 mds.0.objecter put_session s=0x2e9f000 osd=0 5
    -9> 2014-12-18 05:15:49.295253 7f64321a4700  6 mds.0.journal LogSegment(5028/10198820).try_to_expire waiting
    -8> 2014-12-18 05:15:49.295254 7f642f89e700  5 mds.0.objecter 0 unacked, 4 uncommitted
    -7> 2014-12-18 05:15:49.295255 7f64321a4700  5 mds.0.log try_expire expiring segment 5028/10198820
    -6> 2014-12-18 05:15:49.295257 7f64321a4700  6 mds.0.journal LogSegment(5030/10199909).try_to_expire
    -5> 2014-12-18 05:15:49.295258 7f64321a4700  6 mds.0.journal LogSegment(5030/10199909).try_to_expire success
    -4> 2014-12-18 05:15:49.295259 7f64321a4700 10 mds.0.log try_expire expired segment 5030/10199909
    -3> 2014-12-18 05:15:49.295261 7f64321a4700  5 mds.0.log _expired segment 5030/10199909, 2 events
    -2> 2014-12-18 05:15:49.295263 7f64321a4700 10 mds.0.log _trim_expired_segments trimming expired 4022/0x897ea8
    -1> 2014-12-18 05:15:49.295266 7f64321a4700 10 mds.0.log _trim_expired_segments trimming expired 4024/0x898570
     0> 2014-12-18 05:15:49.296896 7f64321a4700 -1 ./include/elist.h: In function 'elist<T>::~elist() [with T = CDentry*]' thread 7f64321a4700 time 2014-12-18 05:15:49.295269
./include/elist.h: 92: FAILED assert(_head.empty())

 ceph version 0.89-755-g2d76bc2 (2d76bc2d989118f14f7b3a98ec11a3c87dc318da)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x9494df]
 2: (MDLog::_trim_expired_segments()+0x63e) [0x7d50ee]
 3: (MDLog::trim(int)+0x170) [0x7d7280]
 4: (MDS::tick()+0x3dc) [0x5a2f6c]
 5: (MDSInternalContextBase::complete(int)+0x1db) [0x7ca7ab]
 6: (SafeTimer::timer_thread()+0x3e5) [0x93b025]
 7: (SafeTimerThread::entry()+0xd) [0x93bbbd]
 8: (()+0x7e9a) [0x7f643a375e9a]
 9: (clone()+0x6d) [0x7f6438d3c31d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This happened when mounting a client after the mds-auto-repair test executed; investigating...
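For context, the failed assert comes from the destructor of Ceph's intrusive elist<T>: a LogSegment is being deleted while a CDentry is still linked on one of its lists. A minimal sketch of that invariant (illustrative code only, not Ceph's actual elist implementation):

```cpp
#include <cassert>

// Minimal intrusive-list sketch that, like Ceph's elist<T>, asserts it is
// empty on destruction. Names here are illustrative, not Ceph's real code.
struct item {
    item* next = nullptr;
    item* prev = nullptr;
    void remove() {  // unlink self from whatever list currently holds us
        if (prev) prev->next = next;
        if (next) next->prev = prev;
        prev = next = nullptr;
    }
};

struct sketch_elist {
    item head;  // sentinel node; the list is empty when it points to itself
    sketch_elist() { head.next = head.prev = &head; }
    bool empty() const { return head.next == &head; }
    void push_back(item* i) {
        i->prev = head.prev;
        i->next = &head;
        head.prev->next = i;
        head.prev = i;
    }
    ~sketch_elist() {
        // The invariant that trips in this report: a LogSegment's
        // dirty_dentries list is destroyed while a dentry is still linked.
        assert(empty());
    }
};
```

Every item must be unlinked before the owning list is destroyed; the crash above means a dentry was still linked into an expired segment's list when `_trim_expired_segments` deleted it.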



Files

10368.sh (288 Bytes) 10368.sh John Spray, 01/15/2015 07:52 AM
mds.a.log.short.gz (8.72 MB) mds.a.log.short.gz John Spray, 01/15/2015 07:52 AM
Actions #1

Updated by John Spray over 9 years ago

Reproduced (it took two tries to see the crash again, so it's intermittent):

tasks:
    - install:
    - ceph:
    - ceph-fuse:
    - mds-auto-repair:
    - mds-auto-repair:

Actions #2

Updated by John Spray over 9 years ago

Had no joy reproducing with mds_auto_repair more recently (on different test nodes), but I did manage on a vstart cluster against master, with the attached .sh running in a loop, a client with debug output turned off, and the following settings:

        mds cache size = 100                                                    
        mds cache log max segments = 2                                          
        mds cache log events per segment = 1                                    
        mds cache verify backtrace = 1      

The attached log is the last part of a much longer log.

Notes:

The segment that has the unwanted item_dirty is seq 170750:

2015-01-14 14:50:36.996372 7f71e5197700  1 mds.0.1 asok_command: flush journal (starting...)

2015-01-14 14:50:36.998726 7f71e5197700  7 mds.0.log _prepare_new_segment seq 170750

*** this is the msgr thread, starting a new segment just after the one that the asok thread created *** 

Cap release on 1000000548c, a.k.a. /testdir2/file57 (testdir2 is 10000005452).

2015-01-14 14:50:37.360868 7f71e3994700 10 mds.0.log submit_entry also starting new segment: last = 170750/222043046, event seq = 170961

2015-01-14 14:50:38.702326 7f71e5197700  5 mds.0.1 _command_flush_journal: beginning segment expiry

2015-01-14 14:50:38.702352 7f71e5197700  6 mds.0.journal LogSegment(170750/222043046).try_to_expire
2015-01-14 14:50:38.702360 7f71e5197700  6 mds.0.journal LogSegment(170750/222043046).try_to_expire success
2015-01-14 14:50:38.702363 7f71e5197700 10 mds.0.log try_expire expired segment 170750/222043046
2015-01-14 14:50:38.702367 7f71e5197700  5 mds.0.log _expired segment 170750/222043046, 212 events 

testdir2/file57 gets marked clean here:

2015-01-14 14:50:41.259444 7f71e118f700 15 mds.0.cache.dir(10000005452)  dir 1277 >= inode 1208 now clean [inode 1000000548c [2,head] /testdir2/file57 auth v1208 s=0 n(v0 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=0 caps=0 dirtyparent=0 dirty=1 authpin=0 0x6d29c20]
2015-01-14 14:50:41.259452 7f71e118f700 10 mds.0.cache.ino(1000000548c)  mark_clean [inode 1000000548c [2,head] /testdir2/file57 auth v1208 s=0 n(v0 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=0 caps=0 dirtyparent=0 dirty=1 authpin=0 0x6d29c20]
2015-01-14 14:50:41.259464 7f71e118f700 10 mds.0.cache.den(10000005452 file57)  mark_clean [dentry #1/testdir2/file57 [2,head] auth (dversion lock) v=1208 inode=0x6d29c20 | request=0 lock=0 inodepin=0 dirty=1 authpin=0 0x78ee240]

2015-01-14 14:50:41.273594 7f71e5197700  5 mds.0.1 _command_flush_journal: expiry complete, expire_pos/trim_pos is now d01beba/d000000

2015-01-14 14:50:41.273622 7f71e5197700 10 mds.0.log _trim_expired_segments trimming expired 170750/0xd3c1ba6

So the weird thing is that, as far as I can tell, mark_clean is being called for the only file that was operated on during the problematic segment's lifetime, and yet the item is still in the dirty_dentries list.

My wip-10368-repro branch has some extra debugging, but so far I haven't seen the bug reproduce while the extra debugging was live :-/

Actions #3

Updated by Zheng Yan over 9 years ago

One possible cause:

There is more than one context in the MDLog's wait_for_safe list, and one of them is executed after MDS::_command_flush_journal()'s mdlog_flushed. That context marks a dentry dirty, but the corresponding log segment has already been expired by the flush journal command.
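This hypothesis can be modeled with a toy example (illustrative names only, not Ceph's real MDLog or Context classes): two contexts sit on the wait_for_safe list; the first is the flush command's completion, after which the flush path expires the segment, but the second context, running strictly later in queue order, still marks a dentry dirty into the now-expired segment.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model of the ordering hazard (not Ceph's actual classes).
struct Segment {
    bool expired = false;
    int dirty_items = 0;  // stands in for the segment's dirty_dentries elist
};

struct ToyLog {
    std::vector<std::function<void()>> wait_for_safe;
    void journal_safe() {
        // Completion contexts run strictly in queue order.
        for (auto& ctx : wait_for_safe) ctx();
        wait_for_safe.clear();
    }
};
```

Running the flush completion (which expires the segment) before a later context that marks a dentry dirty leaves an expired segment holding a dirty item, which is exactly the state that trips `assert(_head.empty())` at trim time.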

Actions #4

Updated by Greg Farnum about 9 years ago

We saw this on our internal lab install as well.

[ubuntu@magna125 ~]$ zless /var/log/ceph/ceph-mds.magna125.log-20150204.gz

Actions #5

Updated by John Spray about 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #6

Updated by Greg Farnum about 9 years ago

  • Status changed from Fix Under Review to Pending Backport

This certainly needs to go into hammer as well, but I'm not quite brave enough to do an insta-cherry-pick. :)

Actions #7

Updated by John Spray about 9 years ago

Agree about the backport, but we can be leisurely about it -- nobody is likely to hit this bug unless they're running the journal flush asok command frequently for some reason.

Actions #8

Updated by John Spray about 9 years ago

  • Status changed from Pending Backport to In Progress

By sheer coincidence, I've hit this issue again, while no journal flushes were running at all, and much more reproducibly. I think we may have a more fundamental issue with MDLog completions (that call mark_dirty on things) getting called after MDLog has decided to expire a segment. The decision to expire a segment is made by directly checking the safe_pos of journaller, whereas the completions have to travel through a finisher. We might need to send safe_pos updates via the same finisher to ensure ordering.
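One way to read the suggested fix, as a toy sketch (illustrative code, not Ceph's actual Finisher or Journaler API): queue the safe_pos advance on the same FIFO as the completions, behind them, so the trim path can never observe a safe_pos that is ahead of an unexecuted mark-dirty completion.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>

// Toy single-threaded "finisher": a FIFO of callbacks drained in order.
// Not Ceph's Finisher class; just enough to show the ordering argument.
struct ToyFinisher {
    std::queue<std::function<void()>> q;
    void queue_fn(std::function<void()> f) { q.push(std::move(f)); }
    void drain() {
        while (!q.empty()) { q.front()(); q.pop(); }
    }
};

struct ToyMDLog {
    uint64_t safe_pos = 0;  // only ever advanced via the finisher
    int pending_marks = 0;  // completions that will still mark things dirty

    void on_event_safe(ToyFinisher& fin, uint64_t pos) {
        // The mark-dirty completion goes through the finisher...
        fin.queue_fn([this] { pending_marks--; });
        // ...and the safe_pos advance is queued *behind* it, so by the time
        // trim can see this pos, the completion has already run.
        fin.queue_fn([this, pos] { safe_pos = pos; });
    }

    // Expiry only trusts safe_pos, as in the report; with the ordering
    // above it cannot race ahead of an unexecuted completion.
    bool can_expire_up_to(uint64_t pos) const { return safe_pos >= pos; }
};
```

With this ordering, whenever `can_expire_up_to(pos)` returns true, every completion queued before that position has already executed, which is the property the direct safe_pos check in MDLog was missing.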

Actions #9

Updated by Greg Farnum about 9 years ago

  • Status changed from In Progress to Resolved

Finally merged to master in commit b25e9e130bffed82ae5dda08449ea440fe6c485c.
