Bug #472

closed

mds: fragstat crash

Added by Sage Weil over 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%


Description

see pudgy:/home/gregf/logs/fragstat_assert


mds/CInode.cc: In function 'virtual void CInode::decode_lock_state(int, ceph::bufferlist&)':
mds/CInode.cc:1286: FAILED assert(pf->fragstat == fragstat)
 ceph version 0.22~rc (73a88cb6df372de4d72b036485066781cefe2659)
 1: (CInode::decode_lock_state(int, ceph::buffer::list&)+0x18ff) [0x91d47f]
 2: (SimpleLock::decode_locked_state(ceph::buffer::list&)+0x42) [0x8be6ee]
 3: (Locker::handle_file_lock(ScatterLock*, MLock*)+0x273) [0x8bd867]
 4: (Locker::handle_lock(MLock*)+0x1c4) [0x8b8d0a]
 5: (Locker::dispatch(Message*)+0x45) [0x8a949f]
 6: (MDS::_dispatch(Message*)+0x1aa4) [0x759fee]
 7: (MDS::ms_dispatch(Message*)+0x38) [0x7583d0]
 8: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7433a1]
 9: (SimpleMessenger::dispatch_entry()+0x5d4) [0x7346aa]
 10: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x729404]
 11: (Thread::_entry_func(void*)+0x23) [0x7422c5]
 12: /lib/libpthread.so.0 [0x7f5a6224f73a]
 13: (clone()+0x6d) [0x7f5a6120e69d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #1

Updated by Greg Farnum over 13 years ago

Similarly:
#0 0x0000000000000000 in ?? ()
#1 0x0000000000a1e2e7 in sigabrt_handler (signum=6) at config.cc:238
#2 <signal handler called>
#3 0x00007fb846eaaf45 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4 0x00007fb846eadd80 in *__GI_abort () at abort.c:88
#5 0x00007fb847731d45 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#6 0x00007fb847730176 in ?? () from /usr/lib/libstdc++.so.6
#7 0x00007fb8477301a3 in std::terminate() () from /usr/lib/libstdc++.so.6
#8 0x00007fb84773029e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x0000000000a0da04 in ceph::__ceph_assert_fail (assertion=0xa68cb8 "pf->accounted_fragstat == fragstat", file=0xa686d3 "mds/CInode.cc", line=1285, func=0xa6a6a0 "virtual void CInode::decode_lock_state(int, ceph::bufferlist&)") at common/assert.cc:30
#10 0x000000000091d0ee in CInode::decode_lock_state (this=0x1f32810, type=64, bl=...) at mds/CInode.cc:1285
#11 0x00000000008be602 in SimpleLock::decode_locked_state (this=0x1f32f88, bl=...) at mds/SimpleLock.h:289
#12 0x00000000008bd903 in Locker::handle_file_lock (this=0x1e5d780, lock=0x1f32f88, m=0x1e836c0) at mds/Locker.cc:3915
#13 0x00000000008b8c1e in Locker::handle_lock (this=0x1e5d780, m=0x1e836c0) at mds/Locker.cc:2752
#14 0x00000000008a93b3 in Locker::dispatch (this=0x1e5d780, m=0x1e836c0) at mds/Locker.cc:73
#15 0x0000000000759fee in MDS::_dispatch (this=0x1e64000, m=0x1e836c0) at mds/MDS.cc:1495
#16 0x00000000007583d0 in MDS::ms_dispatch (this=0x1e64000, m=0x1e836c0) at mds/MDS.cc:1354
#17 0x00000000007433a1 in Messenger::ms_deliver_dispatch (this=0x1e61000, m=0x1e836c0) at msg/Messenger.h:97
#18 0x00000000007346aa in SimpleMessenger::dispatch_entry (this=0x1e61000) at msg/SimpleMessenger.cc:342
#19 0x0000000000729404 in SimpleMessenger::DispatchThread::entry (this=0x1e61488) at msg/SimpleMessenger.h:558
#20 0x00000000007422c5 in Thread::_entry_func (arg=0x1e61488) at ./common/Thread.h:39
#21 0x00007fb847f8573a in start_thread (arg=<value optimized out>) at pthread_create.c:300
#22 0x00007fb846f4469d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#23 0x0000000000000000 in ?? ()

Actions #2

Updated by Greg Farnum over 13 years ago

Applied patch you gave me. Got new crash:
#0 0x0000000000000000 in ?? ()
#1 0x0000000000a1e317 in sigabrt_handler (signum=6) at config.cc:238
#2 <signal handler called>
#3 0x00007f49af484f45 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4 0x00007f49af487d80 in *__GI_abort () at abort.c:88
#5 0x00007f49afd0bd45 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#6 0x00007f49afd0a176 in ?? () from /usr/lib/libstdc++.so.6
#7 0x00007f49afd0a1a3 in std::terminate() () from /usr/lib/libstdc++.so.6
#8 0x00007f49afd0a29e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x0000000000a0da34 in ceph::__ceph_assert_fail (assertion=0xa68fc6 "\"unmatched rstat rbytes\" == 0", file=0xa68713 "mds/CInode.cc", line=1534, func=0xa6a5e0 "void CInode::finish_scatter_gather_update(int)") at common/assert.cc:30
#10 0x000000000091f6fe in CInode::finish_scatter_gather_update (this=0x24b7ad0, type=1024) at mds/CInode.cc:1534
#11 0x00000000008ba9fb in Locker::scatter_writebehind (this=0x247c680, lock=0x24b82c8) at mds/Locker.cc:3246
#12 0x00000000008ac9f8 in Locker::eval_gather (this=0x247c680, lock=0x24b82c8, first=false, pneed_issue=0x0, pfinishers=0x0) at mds/Locker.cc:554
#13 0x00000000008bdbe2 in Locker::handle_file_lock (this=0x247c680, lock=0x24b82c8, m=0x24fc240) at mds/Locker.cc:3949
#14 0x00000000008b8c4e in Locker::handle_lock (this=0x247c680, m=0x24fc240) at mds/Locker.cc:2752
#15 0x00000000008a93e3 in Locker::dispatch (this=0x247c680, m=0x24fc240) at mds/Locker.cc:73
#16 0x0000000000759fee in MDS::_dispatch (this=0x2484000, m=0x24fc240) at mds/MDS.cc:1495
#17 0x00000000007583d0 in MDS::ms_dispatch (this=0x2484000, m=0x24fc240) at mds/MDS.cc:1354
#18 0x00000000007433a1 in Messenger::ms_deliver_dispatch (this=0x2481000, m=0x24fc240) at msg/Messenger.h:97
#19 0x00000000007346aa in SimpleMessenger::dispatch_entry (this=0x2481000) at msg/SimpleMessenger.cc:342
#20 0x0000000000729404 in SimpleMessenger::DispatchThread::entry (this=0x2481488) at msg/SimpleMessenger.h:558
#21 0x00000000007422c5 in Thread::_entry_func (arg=0x2481488) at ./common/Thread.h:39
#22 0x00007f49b055f73a in start_thread (arg=<value optimized out>) at pthread_create.c:300
#23 0x00007f49af51e69d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#24 0x0000000000000000 in ?? ()

Actions #3

Updated by Sage Weil over 13 years ago

Let's try this:


diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 4603966..691bb4e 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -1875,6 +1875,10 @@ void MDCache::predirty_journal_parents(Mutation *mut, EMetaBlob *blob,
        project_rstat_frag_to_inode(p->second.rstat, p->second.accounted_rstat, p->second.first, p->first, pin, true);//false);
       parent->dirty_old_rstat.clear();
       project_rstat_frag_to_inode(pf->rstat, pf->accounted_rstat, parent->first, CEPH_NOSNAP, pin, true);//false);
+      
+      // bump version
+      pi->rstat.version++;
+      pf->rstat.version = pf->accounted_rstat.version = pi->rstat.version;
     }

     // next parent!
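
As a rough illustration of what the two added lines are meant to guarantee, here is a minimal standalone sketch. It is not the actual MDS code: `RStat`, `FragNode`, `InodeProj`, `project_and_bump`, and `versions_in_sync` are hypothetical stand-ins, only `rbytes` is modeled, and the projection step (adding the frag's unaccounted delta to the inode and then marking it accounted) is an assumption about what `project_rstat_frag_to_inode` does. The point is only that after the bump, the frag's `rstat` and `accounted_rstat` carry the same version as the inode's `rstat`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for the recursive stats struct.
struct RStat {
    uint64_t version = 0;
    int64_t rbytes = 0;
};

// Stand-in for the projected dirfrag fnode ("pf" in the patch).
struct FragNode {
    RStat rstat;            // current recursive stats of the frag
    RStat accounted_rstat;  // portion already reflected in the parent inode
};

// Stand-in for the projected inode ("pi" in the patch).
struct InodeProj {
    RStat rstat;
};

// Assumed projection: push the frag's not-yet-accounted delta into the inode,
// mark it as accounted, then bump and synchronize versions as the patch does.
void project_and_bump(FragNode& pf, InodeProj& pi) {
    const int64_t delta = pf.rstat.rbytes - pf.accounted_rstat.rbytes;
    pi.rstat.rbytes += delta;
    pf.accounted_rstat.rbytes = pf.rstat.rbytes;

    // bump version (mirrors the two lines added in the diff)
    pi.rstat.version++;
    pf.rstat.version = pf.accounted_rstat.version = pi.rstat.version;
}

// True when frag and inode agree on the rstat version.
bool versions_in_sync(const FragNode& pf, const InodeProj& pi) {
    return pf.rstat.version == pi.rstat.version &&
           pf.accounted_rstat.version == pi.rstat.version;
}
```

In this toy model, calling `project_and_bump` on a frag with unaccounted bytes leaves `versions_in_sync` true and the inode's `rbytes` equal to the frag's. Whether that is sufficient for the real scatter/gather path is exactly what the patch is testing.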

Actions #4

Updated by Greg Farnum over 13 years ago

Well, this seems to have gotten rid of the first assert issue and made pjd last a bit longer, and it's a bit more stable, but I can still reliably reproduce the second assert failure (on accounted_fragstat). Maybe it's a different issue after all?

Actions #5

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.22 to v0.23

Actions #6

Updated by Sage Weil over 13 years ago

  • Status changed from New to Resolved

Actions #7

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.23)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
