Bug #6791: mds assert after startup - CDir::commit error (want > committed version) - CephFS - Ceph


Bug #6791 (closed)

Added by Maros Vegh over 10 years ago. Updated about 10 years ago.

Status: Won't Fix
Priority: Normal
Source: Community (user)
Severity: 3 - minor

Description

While upgrading from 0.67 to 0.72, I hit bug #6755.
I repaired the system with ceph_filestore_tool as described in bug #6761.
After that I started the system with wip-6761-emperor. All PGs are active+clean.

However, the MDS asserts shortly after startup with this error:
mds/CDir.cc: In function 'void CDir::commit(version_t, Context*, bool)' thread 7fb82cc64700 time 2013-11-16 22:35:56.495266
mds/CDir.cc: 1718: FAILED assert(want > committed_version)

The system runs Ubuntu 13.04.

Files


ceph-mds.b.log (2.14 MB), added by Maros Vegh, 11/16/2013 01:43 PM
ceph-mds.b.log.bug6791.log10.tar.gz (36.2 MB), added by Maros Vegh, 11/17/2013 04:35 AM


Updated by Maros Vegh over 10 years ago

File ceph-mds.b.log.bug6791.log10.tar.gz added

At a higher log level, I can see that this happens during try_to_expire on a journal LogSegment:

    -4> 2013-11-17 11:02:05.053688 7feee2234700 10 mds.0.cache.ino(10002f3a3fd) clear_dirty_parent
    -3> 2013-11-17 11:02:05.053693 7feee2234700 10 mds.0.log _maybe_expired segment 3213692504680 2387 events
    -2> 2013-11-17 11:02:05.053697 7feee2234700  6 mds.0.journal LogSegment(3213692504680).try_to_expire
    -1> 2013-11-17 11:02:05.053707 7feee2234700 10 mds.0.cache.dir(10002f62375) commit want 0 on [dir 10002f62375 /meteo_data/opt/wrf/umbriel/input_arch/2013111506/ [2,head] auth v=453 cv=453/453 state=1073741824 f(v0 m2013-11-15 10:45:41.467229 34=34+0) n(v0 rc2013-11-15 10:45:41.467229 b288063844 34=34+0) hs=34+0,ss=0+0 dirty=18 | child=1 authpin=0 0x4ec4000]
     0> 2013-11-17 11:02:05.057561 7feee2234700 -1 mds/CDir.cc: In function 'void CDir::commit(version_t, Context*, bool)' thread 7feee2234700 time 2013-11-17 11:02:05.053725
mds/CDir.cc: 1718: FAILED assert(want > committed_version)

ceph version 0.72-3-g5e1e02c (5e1e02c99b620fa4ffd2b455eb8e005b172fa05c)
 1: (CDir::commit(unsigned long, Context*, bool)+0x325) [0x80beb5]
 2: (LogSegment::try_to_expire(MDS*, C_GatherBuilder&)+0x214a) [0x6680aa]
 3: (MDLog::try_expire(LogSegment*)+0x66) [0x85a476]
 4: (MDLog::_maybe_expired(LogSegment*)+0xb2) [0x85b482]
 5: (Context::complete(int)+0x9) [0x62bec9]
 6: (C_Gather::delete_me()+0x16) [0x62c436]
 7: (C_Gather::sub_finish(Context*, int)+0x24d) [0x62fe4d]
 8: (C_Gather::C_GatherSub::finish(int)+0x12) [0x62ff52]
 9: (Context::complete(int)+0x9) [0x62bec9]
 10: (CInode::_stored_backtrace(unsigned long, Context*)+0x8e) [0x81882e]
 11: (Context::complete(int)+0x9) [0x62bec9]
 12: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x10e3) [0x87e573]
 13: (MDS::handle_core_message(Message*)+0xc77) [0x64d087]
 14: (MDS::_dispatch(Message*)+0x33) [0x64d1a3]
 15: (MDS::ms_dispatch(Message*)+0xbb) [0x64f03b]
 16: (DispatchQueue::entry()+0x4fb) [0xa2002b]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x946fad]
 18: (()+0x7f8e) [0x7feee5d66f8e]
 19: (clone()+0x6d) [0x7feee454da0d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.