Bug #11641
test_journal_repair failing on mds_verify_scatter
Status: Closed
Description
Possibly related to the recent changes to the mydir population code?
Updated by John Spray almost 9 years ago
2015-05-15T01:47:07.655 WARNING:tasks.cephfs.filesystem:Unhealthy mds state gid_4156:up:rejoin
2015-05-15T01:47:07.950 INFO:tasks.ceph.mds.a.plana28.stderr:2015-05-15 01:47:07.951842 7f5e6a312700 -1 log_channel(cluster) log [ERR] : unmatched fragstat size on single dirfrag 100, inode has f(v0 m2015-05-15 01:47:07.951664 11=0+11), dirfrag has f(v0 m2015-05-15 01:47:07.951664 1=0+1)
2015-05-15T01:47:07.951 INFO:tasks.ceph.mds.a.plana28.stderr:mds/MDCache.cc: In function 'void MDCache::predirty_journal_parents(MutationRef, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)' thread 7f5e6a312700 time 2015-05-15 01:47:07.951890
2015-05-15T01:47:07.951 INFO:tasks.ceph.mds.a.plana28.stderr:mds/MDCache.cc: 2257: FAILED assert(!"unmatched fragstat size" == g_conf->mds_verify_scatter)
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: ceph version 9.0.0-879-g1e72489 (1e724898e7131a0d21e8e69c386310e22ca65f52)
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x96e04b]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 2: (MDCache::predirty_journal_parents(std::tr1::shared_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x1376) [0x6be006]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 3: (MDCache::_create_system_file(CDir*, char const*, CInode*, MDSInternalContextBase*)+0xbc4) [0x6bfdf4]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 4: (MDCache::populate_mydir()+0x6c7) [0x6c0d67]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 5: (MDS::recovery_done(int)+0x1c2) [0x5aa7d2]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 6: (MDS::handle_mds_map(MMDSMap*)+0x27f6) [0x5ba4a6]
2015-05-15T01:47:07.952 INFO:tasks.ceph.mds.a.plana28.stderr: 7: (MDS::handle_core_message(Message*)+0x7ab) [0x5bfceb]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 8: (MDS::_dispatch(Message*, bool)+0x35) [0x5bfe65]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 9: (MDS::ms_dispatch(Message*)+0x98) [0x5c1408]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 10: (DispatchQueue::entry()+0x649) [0xa5b2a9]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x95661d]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 12: (()+0x8182) [0x7f5e70ebc182]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: 13: (clone()+0x6d) [0x7f5e6f62b47d]
2015-05-15T01:47:07.953 INFO:tasks.ceph.mds.a.plana28.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
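For readers unfamiliar with the assertion above: with mds_verify_scatter enabled, the MDS checks that a directory inode's fragstat (in the f(... N=files+subdirs) notation from the log) matches the combined stats of its dirfrags. A minimal, hypothetical sketch of that invariant (the names FragInfo and check_fragstat are illustrative, not Ceph's):

```python
# Hypothetical sketch of the "unmatched fragstat size" check.
# Class and function names are illustrative; they are not Ceph APIs.
from dataclasses import dataclass

@dataclass
class FragInfo:
    nfiles: int
    nsubdirs: int

    def size(self) -> int:
        # f(... 11=0+11) in the log means size = nfiles + nsubdirs
        return self.nfiles + self.nsubdirs

def check_fragstat(inode_stat: FragInfo, dirfrag_stats: list) -> bool:
    # The inode's stat must equal the sum over its dirfrags.
    total = sum(f.size() for f in dirfrag_stats)
    return total == inode_stat.size()

# Values from the log: inode has 11=0+11, the single dirfrag has 1=0+1,
# so the check fails and the assertion fires.
inode = FragInfo(nfiles=0, nsubdirs=11)
frags = [FragInfo(nfiles=0, nsubdirs=1)]
print(check_fragstat(inode, frags))  # → False
```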
Updated by Greg Farnum almost 9 years ago
- Project changed from Ceph to CephFS
Note that this run included the dirfrag enable PR. I'm not sure whether this test would be hitting that, or whether it has failed in any runs without it. (I'll leave dirfrags out of the next test run.)
Updated by John Spray almost 9 years ago
OK, it turns out to have nothing to do with journaltool; this is a straight-up bug in the change I made to populate_mydir. It hits the "mydir->get_version() == 0" check prematurely and acts as if the frag object is missing, when it hasn't actually tried to load it yet. So it continues with its freshly created blank dirfrag, whose stats of course don't match those of the inode we were loading.
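The ordering mistake described above can be sketched in miniature: treating version == 0 as "backing object missing" before a fetch has even been attempted. This is a hypothetical illustration only (Dirfrag, populate_buggy, and populate_fixed are made-up names, not the actual MDCache code):

```python
# Hypothetical sketch of the populate_mydir ordering bug.
# All names here are illustrative, not Ceph internals.

class Dirfrag:
    def __init__(self):
        self.version = 0      # unloaded frags start at version 0
        self.fetched = False

    def fetch(self):
        # Pretend the backing object exists and has real contents.
        self.fetched = True
        self.version = 5

def populate_buggy(frag):
    # Bug: concludes "object missing" from version == 0 alone,
    # then proceeds with a blank frag whose stats can't match
    # the inode's fragstat.
    if frag.version == 0:
        return "recreated-blank"
    return "loaded"

def populate_fixed(frag):
    # Fix: only treat the frag as missing after actually trying
    # to load it from the backing store.
    if not frag.fetched:
        frag.fetch()
    if frag.version == 0:
        return "recreated-blank"
    return "loaded"

print(populate_buggy(Dirfrag()))  # → recreated-blank
print(populate_fixed(Dirfrag()))  # → loaded
```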
The same symptom can be reproduced by creating something in a stray dir, then doing a "flush journal" on a live MDS, followed by a journal reset and an MDS restart.
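As a rough command sketch of that reproduction (daemon names and the restart mechanism are placeholders for whatever the test cluster uses; this needs a live cluster and is not runnable standalone):

```shell
# Sketch of the repro steps described above; "mds.a" is a placeholder.
# 1. Create an entry that ends up in a stray dir (e.g. unlink a file
#    that is still held open by a client).
# 2. Flush the journal on the live MDS via its admin socket:
ceph daemon mds.a flush journal
# 3. Reset the journal:
cephfs-journal-tool journal reset
# 4. Restart the MDS and watch for the fragstat assertion on rejoin:
systemctl restart ceph-mds@a
```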
Updated by John Spray almost 9 years ago
- Status changed from In Progress to Fix Under Review
Updated by Zheng Yan almost 9 years ago
- Status changed from Fix Under Review to Resolved