
Bug #18717

multimds: FAILED assert(0 == "got export_cancel in weird state")

Added by Patrick Donnelly 9 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
multi-MDS
Target version:
-
Start date:
01/27/2017
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Component(FS):
MDS
Needs Doc:
No

Description

Assertion: /tmp/buildd/ceph-11.1.0-6912-gaf9152f/src/mds/Migrator.cc: 2001: FAILED assert(0 == "got export_cancel in weird state")
ceph version 11.1.0-6912-gaf9152f (af9152f34a416a46bbf96b14b1416ca004c80f84)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x7f88e28b827e]
 2: (Migrator::handle_export_cancel(MExportDirCancel*)+0x2fd) [0x55827032eb3d]
 3: (Migrator::dispatch(Message*)+0x85) [0x5582703369f5]
 4: (MDSRank::handle_deferrable_message(Message*)+0x5f3) [0x55827019c963]
 5: (MDSRank::_dispatch(Message*, bool)+0x1d8) [0x5582701a5c48]
 6: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x5582701a6d75]
 7: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x558270195213]
 8: (DispatchQueue::entry()+0x78b) [0x7f88e291131b]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f88e299b03d]
 10: (()+0x8184) [0x7f88e2427184]
 11: (clone()+0x6d) [0x7f88e152837d]
1 jobs: ['755936']
suites: ['ceph-thrash/default.yaml', 'clusters/9-mds.yaml', 'frag_enable.yaml', 'fs/xfs.yaml', 'mount/fuse.yaml', 'msgr-failures/osd-mds-delay.yaml', 'multimds:thrash/{begin.yaml', 'overrides/{fuse-default-perm-no.yaml', 'tasks/cfuse_workunit_suites_pjd.yaml}', 'thrash/{debug.yaml', 'whitelist_wrongly_marked_down.yaml}}']

From http://pulpito.ceph.com/pdonnell-2017-01-27_17:57:10-multimds:thrash-wip-multimds-tests-testing-basic-mira/755936/

History

#1 Updated by Zheng Yan 9 months ago

ceph-mds.a.log

2017-01-27 18:20:31.516989 7fe18c464700 10 mds.0.7 send_message_mds mds.1 not up, dropping export_discover(10000000003 #1/client.0/tmp/pjd-fstest-20090130-RC) v1

ceph.log

2017-01-27 18:20:31.513898 mon.0 172.21.7.120:6789/0 568 : cluster [INF] fsmap e126: 3/3/3 up {0=g=up:active,4=d=up:active,6=e=up:active}, 5 up:standby

Cluster states like this are not supposed to exist. I think we should limit how the MDS cluster gets shrunk, perhaps by only allowing one MDS to be stopped at a time.

#2 Updated by Zheng Yan 9 months ago

  • Status changed from New to Verified
