Bug #472
Closed
mds: fragstat crash
Description
see pudgy:/home/gregf/logs/fragstat_assert
mds/CInode.cc: In function 'virtual void CInode::decode_lock_state(int, ceph::bufferlist&)':
mds/CInode.cc:1286: FAILED assert(pf->fragstat == fragstat)
ceph version 0.22~rc (73a88cb6df372de4d72b036485066781cefe2659)
1: (CInode::decode_lock_state(int, ceph::buffer::list&)+0x18ff) [0x91d47f]
2: (SimpleLock::decode_locked_state(ceph::buffer::list&)+0x42) [0x8be6ee]
3: (Locker::handle_file_lock(ScatterLock*, MLock*)+0x273) [0x8bd867]
4: (Locker::handle_lock(MLock*)+0x1c4) [0x8b8d0a]
5: (Locker::dispatch(Message*)+0x45) [0x8a949f]
6: (MDS::_dispatch(Message*)+0x1aa4) [0x759fee]
7: (MDS::ms_dispatch(Message*)+0x38) [0x7583d0]
8: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7433a1]
9: (SimpleMessenger::dispatch_entry()+0x5d4) [0x7346aa]
10: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x729404]
11: (Thread::_entry_func(void*)+0x23) [0x7422c5]
12: /lib/libpthread.so.0 [0x7f5a6224f73a]
13: (clone()+0x6d) [0x7f5a6120e69d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Greg Farnum over 13 years ago
Similarly:
#0 0x0000000000000000 in ?? ()
#1 0x0000000000a1e2e7 in sigabrt_handler (signum=6) at config.cc:238
#2 <signal handler called>
#3 0x00007fb846eaaf45 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4 0x00007fb846eadd80 in *__GI_abort () at abort.c:88
#5 0x00007fb847731d45 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#6 0x00007fb847730176 in ?? () from /usr/lib/libstdc++.so.6
#7 0x00007fb8477301a3 in std::terminate() () from /usr/lib/libstdc++.so.6
#8 0x00007fb84773029e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x0000000000a0da04 in ceph::__ceph_assert_fail (assertion=0xa68cb8 "pf->accounted_fragstat == fragstat", file=0xa686d3 "mds/CInode.cc", line=1285,
func=0xa6a6a0 "virtual void CInode::decode_lock_state(int, ceph::bufferlist&)") at common/assert.cc:30
#10 0x000000000091d0ee in CInode::decode_lock_state (this=0x1f32810, type=64, bl=...) at mds/CInode.cc:1285
#11 0x00000000008be602 in SimpleLock::decode_locked_state (this=0x1f32f88, bl=...) at mds/SimpleLock.h:289
#12 0x00000000008bd903 in Locker::handle_file_lock (this=0x1e5d780, lock=0x1f32f88, m=0x1e836c0) at mds/Locker.cc:3915
#13 0x00000000008b8c1e in Locker::handle_lock (this=0x1e5d780, m=0x1e836c0) at mds/Locker.cc:2752
#14 0x00000000008a93b3 in Locker::dispatch (this=0x1e5d780, m=0x1e836c0) at mds/Locker.cc:73
#15 0x0000000000759fee in MDS::_dispatch (this=0x1e64000, m=0x1e836c0) at mds/MDS.cc:1495
#16 0x00000000007583d0 in MDS::ms_dispatch (this=0x1e64000, m=0x1e836c0) at mds/MDS.cc:1354
#17 0x00000000007433a1 in Messenger::ms_deliver_dispatch (this=0x1e61000, m=0x1e836c0) at msg/Messenger.h:97
#18 0x00000000007346aa in SimpleMessenger::dispatch_entry (this=0x1e61000) at msg/SimpleMessenger.cc:342
#19 0x0000000000729404 in SimpleMessenger::DispatchThread::entry (this=0x1e61488) at msg/SimpleMessenger.h:558
#20 0x00000000007422c5 in Thread::_entry_func (arg=0x1e61488) at ./common/Thread.h:39
#21 0x00007fb847f8573a in start_thread (arg=<value optimized out>) at pthread_create.c:300
#22 0x00007fb846f4469d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#23 0x0000000000000000 in ?? ()
Updated by Greg Farnum over 13 years ago
Applied patch you gave me. Got new crash:
#0 0x0000000000000000 in ?? ()
#1 0x0000000000a1e317 in sigabrt_handler (signum=6) at config.cc:238
#2 <signal handler called>
#3 0x00007f49af484f45 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4 0x00007f49af487d80 in *__GI_abort () at abort.c:88
#5 0x00007f49afd0bd45 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#6 0x00007f49afd0a176 in ?? () from /usr/lib/libstdc++.so.6
#7 0x00007f49afd0a1a3 in std::terminate() () from /usr/lib/libstdc++.so.6
#8 0x00007f49afd0a29e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x0000000000a0da34 in ceph::__ceph_assert_fail (assertion=0xa68fc6 "\"unmatched rstat rbytes\" == 0", file=0xa68713 "mds/CInode.cc", line=1534, func=0xa6a5e0 "void CInode::finish_scatter_gather_update(int)")
at common/assert.cc:30
#10 0x000000000091f6fe in CInode::finish_scatter_gather_update (this=0x24b7ad0, type=1024) at mds/CInode.cc:1534
#11 0x00000000008ba9fb in Locker::scatter_writebehind (this=0x247c680, lock=0x24b82c8) at mds/Locker.cc:3246
#12 0x00000000008ac9f8 in Locker::eval_gather (this=0x247c680, lock=0x24b82c8, first=false, pneed_issue=0x0, pfinishers=0x0) at mds/Locker.cc:554
#13 0x00000000008bdbe2 in Locker::handle_file_lock (this=0x247c680, lock=0x24b82c8, m=0x24fc240) at mds/Locker.cc:3949
#14 0x00000000008b8c4e in Locker::handle_lock (this=0x247c680, m=0x24fc240) at mds/Locker.cc:2752
#15 0x00000000008a93e3 in Locker::dispatch (this=0x247c680, m=0x24fc240) at mds/Locker.cc:73
#16 0x0000000000759fee in MDS::_dispatch (this=0x2484000, m=0x24fc240) at mds/MDS.cc:1495
#17 0x00000000007583d0 in MDS::ms_dispatch (this=0x2484000, m=0x24fc240) at mds/MDS.cc:1354
#18 0x00000000007433a1 in Messenger::ms_deliver_dispatch (this=0x2481000, m=0x24fc240) at msg/Messenger.h:97
#19 0x00000000007346aa in SimpleMessenger::dispatch_entry (this=0x2481000) at msg/SimpleMessenger.cc:342
#20 0x0000000000729404 in SimpleMessenger::DispatchThread::entry (this=0x2481488) at msg/SimpleMessenger.h:558
#21 0x00000000007422c5 in Thread::_entry_func (arg=0x2481488) at ./common/Thread.h:39
#22 0x00007f49b055f73a in start_thread (arg=<value optimized out>) at pthread_create.c:300
#23 0x00007f49af51e69d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#24 0x0000000000000000 in ?? ()
Updated by Sage Weil over 13 years ago
let's try
diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 4603966..691bb4e 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -1875,6 +1875,10 @@ void MDCache::predirty_journal_parents(Mutation *mut, EMetaBlob *blob,
 project_rstat_frag_to_inode(p->second.rstat, p->second.accounted_rstat, p->second.first, p->first, pin, true);//false);
 parent->dirty_old_rstat.clear();
 project_rstat_frag_to_inode(pf->rstat, pf->accounted_rstat, parent->first, CEPH_NOSNAP, pin, true);//false);
+
+ // bump version
+ pi->rstat.version++;
+ pf->rstat.version = pf->accounted_rstat.version = pi->rstat.version;
 }
 
 // next parent!
Updated by Greg Farnum over 13 years ago
Well, this seems to have gotten rid of the first assert issue -- and made pjd last a bit longer -- and it's a bit more stable, but I can still reliably reproduce the second assert failure (on accounted_fragstat). Maybe it's a different issue after all?
Updated by Sage Weil over 13 years ago
- Target version changed from v0.22 to v0.23
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.23)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.