Bug #20337

test_rebuild_simple_altpool triggers MDS assertion

Added by John Spray almost 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Category:
Testing
Target version:
-
% Done:
0%
Backport:
luminous
Regression:
No
Severity:
3 - minor

Description

Two things are going wrong here, I think:
  • The test code calls self.fs.wait_for_daemons() (test_data_scan.py:425), but there is also an other_fs in this case whose daemons should be waited for as well.
  • The scrub_path command does not check that the daemon is active before executing.

We should fix the test not to trigger this case, and at the same time fix the MDS to reject the command in this situation instead of asserting out when it tries to repair metadata.
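A minimal sketch of both fixes, using hypothetical helper names (the real test lives in qa/tasks/cephfs/test_data_scan.py; `wait_for_all_daemons` and `scrub_path_checked` are illustrative, not actual Ceph APIs): wait for the daemons of every filesystem, and have the command path reject scrub_path while the MDS is not active instead of asserting.

```python
def wait_for_all_daemons(filesystems, timeout=60):
    """Wait until every filesystem (fs and other_fs alike) reports
    healthy daemons, rather than only self.fs."""
    for fs in filesystems:
        fs.wait_for_daemons(timeout=timeout)


def scrub_path_checked(mds, path):
    """Illustrative MDS-side guard: refuse scrub_path unless the
    daemon is up:active, instead of asserting later in MDLog."""
    if mds.state != "up:active":
        raise RuntimeError(
            "scrub_path rejected: MDS is %s, not up:active" % mds.state)
    return mds.run_scrub(path)
```

With a guard like this, an early scrub_path returns a clean error to the caller rather than tripping the assertion in MDLog::_submit_entry shown in the backtrace below.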

ceph version 12.0.3-1728-g9742c3e (9742c3ee3c45045a4b7853b252ed427748214bb6) luminous (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f62ad116f30]
 2: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x164) [0x7f62ad078404]
 3: (Locker::scatter_writebehind(ScatterLock*)+0x863) [0x7f62acf71f73]
 4: (Locker::simple_sync(SimpleLock*, bool*)+0x558) [0x7f62acf75da8]
 5: (Locker::file_eval(ScatterLock*, bool*)+0x3e6) [0x7f62acf7b6f6]
 6: (Locker::try_eval(SimpleLock*, bool*)+0x6ee) [0x7f62acf7c89e]
 7: (Locker::wrlock_finish(SimpleLock*, MutationImpl*, bool*)+0x26e) [0x7f62acf7f30e]
 8: (Locker::_drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x22c) [0x7f62acf82c7c]
 9: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x76) [0x7f62acf83046]
 10: (MDCache::repair_inode_stats_work(boost::intrusive_ptr<MDRequestImpl>&)+0x9ca) [0x7f62acee996a]
 11: (MDCache::repair_inode_stats(CInode*)+0x73) [0x7f62acee9ba3]
 12: (()+0x4a2016) [0x7f62acffb016]
 13: (Continuation::_continue_function(int, int)+0x1aa) [0x7f62ad02c00a]
 14: (()+0x4c2583) [0x7f62ad01b583]
 15: (()+0x4c4aa6) [0x7f62ad01daa6]
 16: (Continuation::_continue_function(int, int)+0x1aa) [0x7f62ad02c00a]
 17: (Continuation::Callback::finish(int)+0x10) [0x7f62ad02c0f0]
 18: (Context::complete(int)+0x9) [0x7f62acdfc859]
 19: (MDSIOContextBase::complete(int)+0xa4) [0x7f62ad062ba4]
 20: (Finisher::finisher_thread_entry()+0x1c5) [0x7f62ad115ef5]
 21: (()+0x7dc5) [0x7f62aac48dc5]
 22: (clone()+0x6d) [0x7f62a9d2d73d]

Related issues

Copied to CephFS - Backport #21490: luminous: test_rebuild_simple_altpool triggers MDS assertion Resolved

History

#1 Updated by John Spray almost 7 years ago

  • Assignee set to Douglas Fuller

#2 Updated by Douglas Fuller almost 7 years ago

  • Status changed from New to Need More Info
  • Assignee changed from Douglas Fuller to John Spray

wait_for_daemons should wait for every daemon regardless of filesystem. is there a failure log I can look at?

#3 Updated by John Spray almost 7 years ago

I'm not seeing how Filesystem.are_daemons_healthy is waiting for daemons outside the filesystem: it's inspecting daemons in self.get_mds_map().

Here's the failure: http://pulpito.ceph.com/jspray-2017-06-15_20:24:16-fs-wip-jcsp-testing-20170615-distro-basic-smithi/1292005
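An illustrative sketch (hypothetical map layout, not the exact Ceph structure) of why a per-filesystem health check misses the second filesystem: it only inspects its own MDS map, so daemons belonging to other_fs are never considered.

```python
def are_daemons_healthy(mds_map):
    """True when this filesystem's own mds_map shows all expected
    ranks up:active. Daemons in any other filesystem's map are
    invisible to this check, which is the scoping issue at hand."""
    infos = list(mds_map["info"].values())
    if len(infos) < mds_map["max_mds"]:
        return False  # not enough ranks have come up yet
    return all(info["state"] == "up:active" for info in infos)
```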

#4 Updated by John Spray almost 7 years ago

  • Status changed from Need More Info to New

#5 Updated by Douglas Fuller almost 7 years ago

  • Assignee changed from John Spray to Douglas Fuller

#6 Updated by Douglas Fuller over 6 years ago

  • Status changed from New to 17

#8 Updated by Patrick Donnelly over 6 years ago

  • Status changed from 17 to Resolved

#9 Updated by Patrick Donnelly over 6 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to luminous

This should resolve this failure in 12.2.1 testing:

http://tracker.ceph.com/issues/21466#note-1

#10 Updated by Nathan Cutler over 6 years ago

@Patrick: So both https://github.com/ceph/ceph/pull/16305 and https://github.com/ceph/ceph/pull/17849 need to be backported to luminous?

#11 Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21490: luminous: test_rebuild_simple_altpool triggers MDS assertion added

#12 Updated by Patrick Donnelly over 6 years ago

Nathan Cutler wrote:

@Patrick: So both https://github.com/ceph/ceph/pull/16305 and https://github.com/ceph/ceph/pull/17849 need to be backported to luminous?

Yes

#13 Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved
