Bug #20337

test_rebuild_simple_altpool triggers MDS assertion

Added by John Spray almost 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Category:
Testing
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Two things are going wrong here, I think:
  • The test code is doing a self.fs.wait_for_daemons() (test_data_scan.py:425), but there is also an other_fs in this case which should be waited for as well.
  • The scrub_path command is not checking that the daemon is active before executing.

We should fix the test not to trigger this case, and at the same time fix the MDS to reject the command in this situation instead of asserting out when it tries to repair metadata.
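The test-side half of the fix can be sketched as follows. This is a hypothetical illustration, not the actual patch: the class and helper names mirror the ones quoted above (`self.fs`, `other_fs`, `wait_for_daemons()`), but the real helpers live in the ceph-qa-suite `Filesystem` wrapper.

```python
# Hypothetical sketch of the test-side fix. Names are illustrative;
# they mirror the attributes mentioned in the description above.
class RebuildAltpoolTest:
    def __init__(self, fs, other_fs):
        self.fs = fs            # the filesystem under test
        self.other_fs = other_fs  # the second filesystem created by the test

    def wait_for_all_daemons(self):
        # Waiting only on self.fs (as test_data_scan.py:425 does) leaves
        # other_fs's MDS possibly still coming up; issuing scrub_path
        # against it then trips the MDLog assertion in the trace below.
        for fs in (self.fs, self.other_fs):
            fs.wait_for_daemons()
```

The point is simply that every filesystem the test created must be waited on before any scrub/repair command is sent.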

ceph version 12.0.3-1728-g9742c3e (9742c3ee3c45045a4b7853b252ed427748214bb6) luminous (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f62ad116f30]
 2: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x164) [0x7f62ad078404]
 3: (Locker::scatter_writebehind(ScatterLock*)+0x863) [0x7f62acf71f73]
 4: (Locker::simple_sync(SimpleLock*, bool*)+0x558) [0x7f62acf75da8]
 5: (Locker::file_eval(ScatterLock*, bool*)+0x3e6) [0x7f62acf7b6f6]
 6: (Locker::try_eval(SimpleLock*, bool*)+0x6ee) [0x7f62acf7c89e]
 7: (Locker::wrlock_finish(SimpleLock*, MutationImpl*, bool*)+0x26e) [0x7f62acf7f30e]
 8: (Locker::_drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x22c) [0x7f62acf82c7c]
 9: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x76) [0x7f62acf83046]
 10: (MDCache::repair_inode_stats_work(boost::intrusive_ptr<MDRequestImpl>&)+0x9ca) [0x7f62acee996a]
 11: (MDCache::repair_inode_stats(CInode*)+0x73) [0x7f62acee9ba3]
 12: (()+0x4a2016) [0x7f62acffb016]
 13: (Continuation::_continue_function(int, int)+0x1aa) [0x7f62ad02c00a]
 14: (()+0x4c2583) [0x7f62ad01b583]
 15: (()+0x4c4aa6) [0x7f62ad01daa6]
 16: (Continuation::_continue_function(int, int)+0x1aa) [0x7f62ad02c00a]
 17: (Continuation::Callback::finish(int)+0x10) [0x7f62ad02c0f0]
 18: (Context::complete(int)+0x9) [0x7f62acdfc859]
 19: (MDSIOContextBase::complete(int)+0xa4) [0x7f62ad062ba4]
 20: (Finisher::finisher_thread_entry()+0x1c5) [0x7f62ad115ef5]
 21: (()+0x7dc5) [0x7f62aac48dc5]
 22: (clone()+0x6d) [0x7f62a9d2d73d]

Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #21490: luminous: test_rebuild_simple_altpool triggers MDS assertion (Resolved, Nathan Cutler)
Actions #1

Updated by John Spray almost 7 years ago

  • Assignee set to Douglas Fuller
Actions #2

Updated by Douglas Fuller almost 7 years ago

  • Status changed from New to Need More Info
  • Assignee changed from Douglas Fuller to John Spray

wait_for_daemons should wait for every daemon regardless of filesystem. Is there a failure log I can look at?

Actions #3

Updated by John Spray almost 7 years ago

I'm not seeing how Filesystem.are_daemons_healthy waits for daemons outside the filesystem: it only inspects the daemons in self.get_mds_map().

Here's the failure: http://pulpito.ceph.com/jspray-2017-06-15_20:24:16-fs-wip-jcsp-testing-20170615-distro-basic-smithi/1292005
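The observation above can be sketched as follows. This is a hedged illustration of why a per-filesystem health check misses daemons of a second filesystem, with assumed map structure and function names, not the real `Filesystem.are_daemons_healthy` implementation.

```python
# Illustrative sketch (assumed structure): a health check that reads only
# this filesystem's MDS map ("self.get_mds_map()") never sees the daemons
# belonging to a different filesystem, such as other_fs.
def are_daemons_healthy(mds_map, required_actives):
    # mds_map["info"] holds only the MDS daemons of *this* filesystem,
    # so other_fs's daemons cannot influence the result.
    active = [i for i in mds_map["info"].values()
              if i["state"] == "up:active"]
    return len(active) >= required_actives
```

So a test with two filesystems must call the check once per filesystem; one call cannot cover both.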

Actions #4

Updated by John Spray almost 7 years ago

  • Status changed from Need More Info to New
Actions #5

Updated by Douglas Fuller almost 7 years ago

  • Assignee changed from John Spray to Douglas Fuller
Actions #6

Updated by Douglas Fuller almost 7 years ago

  • Status changed from New to 17
Actions #8

Updated by Patrick Donnelly over 6 years ago

  • Status changed from 17 to Resolved
Actions #9

Updated by Patrick Donnelly over 6 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to luminous

This should resolve this failure in 12.2.1 testing:

http://tracker.ceph.com/issues/21466#note-1

Actions #10

Updated by Nathan Cutler over 6 years ago

@Patrick: So both https://github.com/ceph/ceph/pull/16305 and https://github.com/ceph/ceph/pull/17849 need to be backported to luminous?

Actions #11

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21490: luminous: test_rebuild_simple_altpool triggers MDS assertion added
Actions #12

Updated by Patrick Donnelly over 6 years ago

Nathan Cutler wrote:

@Patrick: So both https://github.com/ceph/ceph/pull/16305 and https://github.com/ceph/ceph/pull/17849 need to be backported to luminous?

Yes

Actions #13

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved