
Bug #18717

multimds: FAILED assert(0 == "got export_cancel in weird state")

Added by Patrick Donnelly 9 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
multi-MDS
Target version:
-
Start date:
01/27/2017
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Component(FS):
MDS
Needs Doc:
No

Description

Assertion: /tmp/buildd/ceph-11.1.0-6912-gaf9152f/src/mds/Migrator.cc: 2001: FAILED assert(0 == "got export_cancel in weird state")
ceph version 11.1.0-6912-gaf9152f (af9152f34a416a46bbf96b14b1416ca004c80f84)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x7f88e28b827e]
 2: (Migrator::handle_export_cancel(MExportDirCancel*)+0x2fd) [0x55827032eb3d]
 3: (Migrator::dispatch(Message*)+0x85) [0x5582703369f5]
 4: (MDSRank::handle_deferrable_message(Message*)+0x5f3) [0x55827019c963]
 5: (MDSRank::_dispatch(Message*, bool)+0x1d8) [0x5582701a5c48]
 6: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x5582701a6d75]
 7: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x558270195213]
 8: (DispatchQueue::entry()+0x78b) [0x7f88e291131b]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f88e299b03d]
 10: (()+0x8184) [0x7f88e2427184]
 11: (clone()+0x6d) [0x7f88e152837d]
1 jobs: ['755936']
suites: ['ceph-thrash/default.yaml', 'clusters/9-mds.yaml', 'frag_enable.yaml', 'fs/xfs.yaml', 'mount/fuse.yaml', 'msgr-failures/osd-mds-delay.yaml', 'multimds:thrash/{begin.yaml', 'overrides/{fuse-default-perm-no.yaml', 'tasks/cfuse_workunit_suites_pjd.yaml}', 'thrash/{debug.yaml', 'whitelist_wrongly_marked_down.yaml}}']

From http://pulpito.ceph.com/pdonnell-2017-01-27_17:57:10-multimds:thrash-wip-multimds-tests-testing-basic-mira/755936/

History

#1 Updated by Zheng Yan 9 months ago

ceph-mds.a.log

2017-01-27 18:20:31.516989 7fe18c464700 10 mds.0.7 send_message_mds mds.1 not up, dropping export_discover(10000000003 #1/client.0/tmp/pjd-fstest-20090130-RC) v1

ceph.log

2017-01-27 18:20:31.513898 mon.0 172.21.7.120:6789/0 568 : cluster [INF] fsmap e126: 3/3/3 up {0=g=up:active,4=d=up:active,6=e=up:active}, 5 up:standby

Cluster states like this are not supposed to exist. I think we should limit how the MDS cluster gets shrunk, perhaps by only allowing one MDS to be stopped at a time.

#2 Updated by Zheng Yan 9 months ago

  • Status changed from New to Verified
