Bug #20337
test_rebuild_simple_altpool triggers MDS assertion
Status: Closed
Description
- The test code does a self.fs.wait_for_daemons() (test_data_scan.py:425), but there is also an other_fs in this case that should be waited for as well.
- The scrub_path command is not checking that the daemon is active before executing.
We should fix the test not to trigger this case, and at the same time fix the MDS to reject the command in this situation instead of hitting an assertion when it tries to repair metadata.
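A minimal sketch of the first proposed fix, waiting on every filesystem under test rather than only self.fs. The Filesystem class below is a hypothetical stub standing in for the qa framework's real class (qa/tasks/cephfs/filesystem.py), which polls a live cluster:

```python
# Hypothetical stub modeling the qa-suite Filesystem object. The real
# wait_for_daemons() polls the MDS map until all ranks for *this*
# filesystem are up:active; here we only record that it was called.
class Filesystem:
    def __init__(self, name):
        self.name = name
        self.waited = False

    def wait_for_daemons(self):
        self.waited = True


def wait_for_all(filesystems):
    # The fix: wait on every filesystem involved in the test, not just
    # self.fs, so a still-starting other_fs cannot race with scrub_path.
    for fs in filesystems:
        fs.wait_for_daemons()


fs = Filesystem("cephfs")
other_fs = Filesystem("cephfs_recovery")
wait_for_all([fs, other_fs])
assert fs.waited and other_fs.waited
```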
ceph version 12.0.3-1728-g9742c3e (9742c3ee3c45045a4b7853b252ed427748214bb6) luminous (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f62ad116f30]
 2: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x164) [0x7f62ad078404]
 3: (Locker::scatter_writebehind(ScatterLock*)+0x863) [0x7f62acf71f73]
 4: (Locker::simple_sync(SimpleLock*, bool*)+0x558) [0x7f62acf75da8]
 5: (Locker::file_eval(ScatterLock*, bool*)+0x3e6) [0x7f62acf7b6f6]
 6: (Locker::try_eval(SimpleLock*, bool*)+0x6ee) [0x7f62acf7c89e]
 7: (Locker::wrlock_finish(SimpleLock*, MutationImpl*, bool*)+0x26e) [0x7f62acf7f30e]
 8: (Locker::_drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x22c) [0x7f62acf82c7c]
 9: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x76) [0x7f62acf83046]
 10: (MDCache::repair_inode_stats_work(boost::intrusive_ptr<MDRequestImpl>&)+0x9ca) [0x7f62acee996a]
 11: (MDCache::repair_inode_stats(CInode*)+0x73) [0x7f62acee9ba3]
 12: (()+0x4a2016) [0x7f62acffb016]
 13: (Continuation::_continue_function(int, int)+0x1aa) [0x7f62ad02c00a]
 14: (()+0x4c2583) [0x7f62ad01b583]
 15: (()+0x4c4aa6) [0x7f62ad01daa6]
 16: (Continuation::_continue_function(int, int)+0x1aa) [0x7f62ad02c00a]
 17: (Continuation::Callback::finish(int)+0x10) [0x7f62ad02c0f0]
 18: (Context::complete(int)+0x9) [0x7f62acdfc859]
 19: (MDSIOContextBase::complete(int)+0xa4) [0x7f62ad062ba4]
 20: (Finisher::finisher_thread_entry()+0x1c5) [0x7f62ad115ef5]
 21: (()+0x7dc5) [0x7f62aac48dc5]
 22: (clone()+0x6d) [0x7f62a9d2d73d]
Updated by Douglas Fuller almost 7 years ago
- Status changed from New to Need More Info
- Assignee changed from Douglas Fuller to John Spray
wait_for_daemons should wait for every daemon regardless of filesystem. Is there a failure log I can look at?
Updated by John Spray almost 7 years ago
I'm not seeing how Filesystem.are_daemons_healthy waits for daemons outside the filesystem: it inspects daemons in self.get_mds_map().
Here's the failure: http://pulpito.ceph.com/jspray-2017-06-15_20:24:16-fs-wip-jcsp-testing-20170615-distro-basic-smithi/1292005
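To illustrate the point above with a hypothetical model (the dict layout below only approximates a real MDS map): a health check that reads one filesystem's MDS map can report healthy while a daemon belonging to another filesystem is still not active.

```python
# Hypothetical model of a per-filesystem health check. Like
# Filesystem.are_daemons_healthy, it only sees the MDS map of the
# filesystem it was asked about.
def are_daemons_healthy(mds_map):
    # "Healthy" here means every rank in *this* map is up:active.
    return all(info["state"] == "up:active"
               for info in mds_map["info"].values())


fs_map = {"info": {"gid1": {"state": "up:active"}}}
other_fs_map = {"info": {"gid2": {"state": "up:replay"}}}

# The check on fs passes even though other_fs's daemon is still in
# replay, so waiting only on self.fs is not enough.
assert are_daemons_healthy(fs_map)
assert not are_daemons_healthy(other_fs_map)
```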
Updated by John Spray almost 7 years ago
- Status changed from Need More Info to New
Updated by Douglas Fuller almost 7 years ago
- Assignee changed from John Spray to Douglas Fuller
Updated by Douglas Fuller almost 7 years ago
Updated by Patrick Donnelly over 6 years ago
- Status changed from Resolved to Pending Backport
- Backport set to luminous
This should resolve this failure in 12.2.1 testing:
Updated by Nathan Cutler over 6 years ago
@Patrick: So both https://github.com/ceph/ceph/pull/16305 and https://github.com/ceph/ceph/pull/17849 need to be backported to luminous?
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #21490: luminous: test_rebuild_simple_altpool triggers MDS assertion added
Updated by Patrick Donnelly over 6 years ago
Nathan Cutler wrote:
@Patrick: So both https://github.com/ceph/ceph/pull/16305 and https://github.com/ceph/ceph/pull/17849 need to be backported to luminous?
Yes
Updated by Nathan Cutler over 6 years ago
- Status changed from Pending Backport to Resolved