Bug #11641


test_journal_repair failing on mds_verify_scatter

Added by John Spray almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Possibly related to the recent changes to the mydir population stuff?

http://pulpito.front.sepia.ceph.com/gregf-2015-05-14_21:57:00-fs-greg-fs-testing-testing-basic-multi/893540

#1

Updated by John Spray almost 9 years ago

2015-05-15T01:47:07.655 WARNING:tasks.cephfs.filesystem:Unhealthy mds state gid_4156:up:rejoin
2015-05-15T01:47:07.950 INFO:tasks.ceph.mds.a.plana28.stderr:2015-05-15 01:47:07.951842 7f5e6a312700 -1 log_channel(cluster) log [ERR] : unmatched fragstat size on single dirfrag 100, inode has f(v0 m2015-05-15 01:47:07.951664 11=0+11), dirfrag has f(v0 m2015-05-15 01:47:07.951664 1=0+1)
2015-05-15T01:47:07.951 INFO:tasks.ceph.mds.a.plana28.stderr:mds/MDCache.cc: In function 'void MDCache::predirty_journal_parents(MutationRef, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)' thread 7f5e6a312700 time 2015-05-15 01:47:07.951890
2015-05-15T01:47:07.951 INFO:tasks.ceph.mds.a.plana28.stderr:mds/MDCache.cc: 2257: FAILED assert(!"unmatched fragstat size" == g_conf->mds_verify_scatter)
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: ceph version 9.0.0-879-g1e72489 (1e724898e7131a0d21e8e69c386310e22ca65f52)
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x96e04b]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 2: (MDCache::predirty_journal_parents(std::tr1::shared_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x1376) [0x6be006]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 3: (MDCache::_create_system_file(CDir*, char const*, CInode*, MDSInternalContextBase*)+0xbc4) [0x6bfdf4]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 4: (MDCache::populate_mydir()+0x6c7) [0x6c0d67]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 5: (MDS::recovery_done(int)+0x1c2) [0x5aa7d2]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 6: (MDS::handle_mds_map(MMDSMap*)+0x27f6) [0x5ba4a6]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 7: (MDS::handle_core_message(Message*)+0x7ab) [0x5bfceb]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 8: (MDS::_dispatch(Message*, bool)+0x35) [0x5bfe65]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 9: (MDS::ms_dispatch(Message*)+0x98) [0x5c1408]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 10: (DispatchQueue::entry()+0x649) [0xa5b2a9]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x95661d]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 12: (()+0x8182) [0x7f5e70ebc182]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 13: (clone()+0x6d) [0x7f5e6f62b47d]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#2

Updated by Greg Farnum almost 9 years ago

  • Project changed from Ceph to CephFS

Note that this run has the dirfrag-enable PR included. I'm not sure whether this test is hitting that code, or whether it has failed in any runs without it. (I'll leave dirfrags out of the next test run.)

#3

Updated by John Spray almost 9 years ago

OK, this turns out to have nothing to do with the journal tool; it's a straight-up bug in the change I made to populate_mydir. The code prematurely hits the "mydir->get_version() == 0" check and acts as if the frag object is missing, when in fact it hasn't actually tried to load it yet. It then continues with its freshly created blank dirfrag, whose stats of course don't match those of the inode we were loading.

The same symptom can be reproduced by creating something in a stray dir, then doing a "flush journal" on a live MDS, followed by a journal reset and an MDS restart.

#4

Updated by John Spray almost 9 years ago

  • Status changed from New to In Progress

#5

Updated by John Spray almost 9 years ago

  • Status changed from In Progress to Fix Under Review

#6

Updated by Zheng Yan almost 9 years ago

  • Status changed from Fix Under Review to Resolved