Bug #966
mds: scatterstat error
Description
- vstart d -n -x cfuse mnt
- cp /usr mnt/foo for a while (~15 mb of mds journal)
- stop the copy
- restart mds's
mds/CDir.cc: In function 'bool CDir::check_rstats()', in thread '0x7fe52f6e9700'
mds/CDir.cc: 225: FAILED assert(!g_conf.mds_debug_scatterstat || (get_num_head_items() == (fnode.fragstat.nfiles + fnode.fragstat.nsubdirs)))
 ceph version 0.25-534-g3f1e9b0 (commit:3f1e9b0f7b87f0d2113e72e33c328f647c3eb2ef)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0xa5076a]
 2: (CDir::check_rstats()+0x758) [0x942d06]
 3: (CInode::finish_scatter_gather_update(int)+0x1718) [0x969b22]
 4: (Locker::scatter_writebehind(ScatterLock*)+0x273) [0x8fc077]
 5: (Locker::simple_sync(SimpleLock*, bool*)+0x3ee) [0x8fb340]
 6: (Locker::scatter_nudge(ScatterLock*, Context*, bool)+0x464) [0x8fcda4]
 7: (Locker::scatter_tick()+0x231) [0x8fd30b]
 8: (MDS::tick()+0x376) [0x76bae8]
 9: (MDS::C_MDS_Tick::finish(int)+0x32) [0x77f34e]
 10: (SafeTimer::timer_thread()+0x217) [0xa4dadd]
 11: (SafeTimerThread::entry()+0x1c) [0xa4e7b8]
 12: (Thread::_entry_func(void*)+0x23) [0x74ac29]
 13: (()+0x68ba) [0x7fe532e418ba]
 14: (clone()+0x6d) [0x7fe531ad602d]
 ceph version 0.25-534-g3f1e9b0 (commit:3f1e9b0f7b87f0d2113e72e33c328f647c3eb2ef)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0xa5076a]
 2: (CDir::check_rstats()+0x758) [0x942d06]
 3: (CInode::finish_scatter_gather_update(int)+0x1718) [0x969b22]
 4: (Locker::scatter_writebehind(ScatterLock*)+0x273) [0x8fc077]
 5: (Locker::simple_sync(SimpleLock*, bool*)+0x3ee) [0x8fb340]
 6: (Locker::scatter_nudge(ScatterLock*, Context*, bool)+0x464) [0x8fcda4]
 7: (Locker::scatter_tick()+0x231) [0x8fd30b]
 8: (MDS::tick()+0x376) [0x76bae8]
 9: (MDS::C_MDS_Tick::finish(int)+0x32) [0x77f34e]
 10: (SafeTimer::timer_thread()+0x217) [0xa4dadd]
 11: (SafeTimerThread::entry()+0x1c) [0xa4e7b8]
 12: (Thread::_entry_func(void*)+0x23) [0x74ac29]
 13: (()+0x68ba) [0x7fe532e418ba]
 14: (clone()+0x6d) [0x7fe531ad602d]
*** Caught signal (Aborted) **
Happens consistently. I haven't tried copying less data.
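For readers unfamiliar with the failing assert: read literally, it requires that, when mds_debug_scatterstat is enabled, the number of head items in the directory fragment equals nfiles + nsubdirs from its fragstat. The sketch below restates that condition in isolation. FragStat, DirFrag and check_fragstat are hypothetical stand-ins invented for illustration; this is not the actual CDir code, only a rough model of the condition the assert expresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-fragment stats; the field names mirror
// those in the assert (fnode.fragstat.nfiles, fnode.fragstat.nsubdirs).
struct FragStat {
    int64_t nfiles = 0;
    int64_t nsubdirs = 0;
};

// Hypothetical, heavily simplified directory fragment.
struct DirFrag {
    FragStat fragstat;
    int64_t num_head_items = 0;  // stands in for get_num_head_items()
};

// Returns true when the condition from the assert holds: either the debug
// option is off, or the item count matches the accounted files + subdirs.
bool check_fragstat(const DirFrag& dir, bool debug_scatterstat) {
    if (!debug_scatterstat) {
        return true;  // check is skipped entirely when the option is disabled
    }
    return dir.num_head_items ==
           dir.fragstat.nfiles + dir.fragstat.nsubdirs;
}

// Mirrors the MDS behaviour of aborting on a mismatch.
void assert_fragstat(const DirFrag& dir, bool debug_scatterstat) {
    assert(check_fragstat(dir, debug_scatterstat));
}
```

Under this model, a fragment holding 3 items with nfiles = 2 and nsubdirs = 1 passes, while one holding 4 items against the same fragstat fails the check, which is the kind of mismatch the backtrace above reports after journal replay.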
Updated by Brian Chrisman about 13 years ago
Please update this item when a fix has been submitted; I can retest my related issue fairly quickly.
Updated by Greg Farnum about 13 years ago
- Status changed from New to In Progress
- Assignee set to Greg Farnum
I'll look into this today.
Updated by Greg Farnum about 13 years ago
Ugh. On my first attempt to reproduce this, the OSD crashed before the MDS did:
osd/ReplicatedPG.cc: 1211: FAILED assert(0 == "oi disagrees with stat")
Perhaps this OSD bug is the actual cause of the problem? (Which means that #989 is probably a different bug, actually...)
Updated by Greg Farnum about 13 years ago
Oh, that assert actually occurred because of an ENOENT return code, not a stat mismatch. Pushed a fix, will try again.
Updated by Greg Farnum about 13 years ago
- Assignee changed from Greg Farnum to Sage Weil
I haven't been able to reproduce this even once. Assigning back to Sage per his request.
Updated by Sage Weil about 13 years ago
- Status changed from In Progress to Can't reproduce
Can't reproduce this one. Maybe it was resolved by commit:466306de3aeca22311993bf5a1955281499d751d, or by something earlier, given that Greg couldn't hit it either!
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.27)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.