Bug #11481

"mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition

Added by Greg Farnum about 4 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
04/27/2015
Due date:
% Done:

0%

Source:
other
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:

Description

http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-04-22_11:17:57-fs-hammer-testing-basic-typica/2417/

2015-04-22 15:36:47.891526 7fde5b0cd700  1 -- 172.20.133.65:6811/16012 <== mon.1 172.20.133.69:6789/0 54 ==== mdsmap(e 23) v1 ==== 867+0+0 (3456897465 0 0) 0x5a88480 con 0x39ef6e0
2015-04-22 15:36:47.891561 7fde5b0cd700  5 mds.-1.0 handle_mds_map epoch 23 from mon.1
2015-04-22 15:36:47.891578 7fde5b0cd700 10 mds.-1.0      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table}
2015-04-22 15:36:47.891584 7fde5b0cd700 10 mds.-1.0  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
2015-04-22 15:36:47.891587 7fde5b0cd700 10 mds.0.5 map says i am 172.20.133.65:6811/16012 mds.0.5 state up:replay
2015-04-22 15:36:47.891591 7fde5b0cd700  1 mds.0.5 handle_mds_map i am now mds.0.5
2015-04-22 15:36:47.891593 7fde5b0cd700  1 mds.0.5 handle_mds_map state change up:standby --> up:replay
2015-04-22 15:36:47.891594 7fde5b0cd700 10 mds.0.5 set_want_state up:standby -> up:replay
2015-04-22 15:36:47.891597 7fde5b0cd700  1 mds.0.5 replay_start
2015-04-22 15:36:47.891599 7fde5b0cd700  7 mds.0.cache set_recovery_set 
2015-04-22 15:36:47.891601 7fde5b0cd700  1 mds.0.5  recovery set is 
2015-04-22 15:36:47.891604 7fde5b0cd700  2 mds.0.5 boot_start 0: opening inotable
2015-04-22 15:36:47.891608 7fde5b0cd700 10 mds.0.inotable: load
2015-04-22 15:36:47.894085 7fde5b0cd700 -1 mds/MDSTable.cc: In function 'void MDSTable::load(MDSInternalContextBase*)' thread 7fde5b0cd700 time 2015-04-22 15:36:47.891613
mds/MDSTable.cc: 146: FAILED assert(is_undef())

 ceph version 0.94.1-6-g8a58d83 (8a58d83b0d039d2c2be353fee9c57c4e6181b662)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x966faf]
 2: (MDSTable::load(MDSInternalContextBase*)+0x4a6) [0x7b7366]
 3: (MDS::boot_start(MDS::BootStep, int)+0x381) [0x5bfc91]
 4: (MDS::replay_start()+0x127) [0x5c0ea7]
 5: (MDS::handle_mds_map(MMDSMap*)+0x2fb7) [0x5c4107]
 6: (MDS::handle_core_message(Message*)+0x26b) [0x5c5bbb]
 7: (MDS::_dispatch(Message*)+0x2b) [0x5c626b]
 8: (MDS::ms_dispatch(Message*)+0x1e4) [0x5c7704]
 9: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0xa5a357]
 10: (DispatchQueue::entry()+0x44a) [0xa574fa]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x94facd]
 12: (()+0x7e9a) [0x7fde60197e9a]
 13: (clone()+0x6d) [0x7fde5eb5e2ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This probably is because of our recent changes to MDSTable handling.
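The failed assertion guards against calling load() on a table that has already been opened. A minimal sketch of that state machine (a hypothetical simplification; the real class lives in mds/MDSTable.{h,cc} and these names are illustrative):

```cpp
#include <cassert>

// Hypothetical simplification of MDSTable's load-state tracking.
enum TableState { STATE_UNDEF, STATE_OPENING, STATE_ACTIVE };

struct MiniTable {
  TableState state = STATE_UNDEF;

  bool is_undef() const { return state == STATE_UNDEF; }

  // load() must only ever be called on a fresh (undef) table; calling
  // it on an already-opened table means the daemon re-entered replay
  // without resetting its in-memory state.
  void load() {
    assert(is_undef());  // the assert that fired in this ticket
    state = STATE_OPENING;
  }
};
```

In this failure the daemon had already initialized its tables as active, was flapped back to standby, and then re-entered replay, hitting the assert on the second load.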

Associated revisions

Revision 0c5b9a2b (diff)
Added by John Spray about 4 years ago

mds: validate the state+rank in MDS map

Especially:
* once I have been assigned a rank, it can't be taken away without restarting the daemon.
* once I have entered standby, I can only go upwards through the states.

Fixes: #11481
Signed-off-by: John Spray <>
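The two invariants in the commit message could be checked roughly like this (a hypothetical sketch, not the real handle_mds_map code; the state values and names here are illustrative, ordered so that "upwards" means numerically increasing):

```cpp
#include <cassert>

// Illustrative subset of MDS states, in forward order.
enum MDSState { STATE_STANDBY = 0, STATE_REPLAY = 1, STATE_ACTIVE = 4 };

const int MDS_RANK_NONE = -1;

// Returns true if the transition reported by a new MDS map is legal.
bool map_transition_valid(int old_rank, int new_rank,
                          MDSState old_state, MDSState new_state) {
  // Invariant 1: once a rank is assigned, it cannot be taken away
  // without restarting the daemon.
  if (old_rank != MDS_RANK_NONE && new_rank == MDS_RANK_NONE)
    return false;
  // Invariant 2: from standby onwards, states only move forward.
  if (new_state < old_state)
    return false;
  return true;
}
```

Under these rules the transition seen in this ticket (up:active back to up:standby) would be rejected instead of acted upon.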

History

#1 Updated by Greg Farnum about 4 years ago

  • Description updated (diff)

#2 Updated by Zheng Yan about 4 years ago

I can't access http://pulpito-rdu.front.sepia.ceph.com/. Can anyone post the full mds log here?

#3 Updated by John Spray about 4 years ago

Weird! This is an MDS going from state active to state standby, then back again. In the final stage it notices that its inotable is already initialized, so it asserts out.

#4 Updated by John Spray about 4 years ago

This looks like a mon bug of some kind. After a leader election, a bunch of routed messages are being resent to the leader by another mon. These include boot beacons from various MDS instances. The daemon states in the MDSMap flap around like this:

--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4129:    172.20.133.65:6811/16012 'a-s' mds.0.4 up:active seq 10
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4111:    172.20.133.65:6808/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4123:    172.20.133.65:6809/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4126:    172.20.133.65:6810/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4129:    172.20.133.65:6811/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4129:    172.20.133.65:6811/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4129:    172.20.133.65:6811/16012 'a-s' mds.0.5 up:replay seq 1

(the '4129' GID is the one that is then crashing, as a result of being sent back into standby after having been active)

#5 Updated by John Spray about 4 years ago

The mons appear to be rather unhappy in this test run, but there are no thrashers or messenger failure settings turned on. So I wonder if this was an existing bug, and we're only seeing it now because of something nasty with the hosts or network in the new lab that's causing lots of thrashing of the monitors?

The MDSs are also getting restarted for some reason (before we see the failure condition), but there's nothing in the logs to indicate why.

Was there ever a mechanism in MDSMonitor to protect against seeing a message from a dead MDS after we see the message from a live MDS that replaces it on the same host with the same ID?

#6 Updated by Greg Farnum about 4 years ago

  • Regression set to No

For some reason the leader is not responding to all of the MDSBeacon messages it receives (but it does respond to others), and so the ones it didn't respond to are getting re-routed on each election. This is deliberate behavior on the part of the peon.

Moreover, there is a seq value in the MDSBeacon and the monitors are supposed to ignore any of them with a seq value lower than what's in the map.
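The intended filtering described above could be sketched as follows (a hypothetical illustration, not the real MDSMonitor code: the monitor tracks the highest seq seen per MDS GID and drops any resent or re-routed beacon that is not newer):

```cpp
#include <cstdint>
#include <map>

// Hypothetical stale-beacon filter: gid -> highest seq seen so far.
struct BeaconFilter {
  std::map<uint64_t, uint64_t> last_seq;

  // Returns true if the beacon should be processed, false if stale.
  bool accept(uint64_t gid, uint64_t seq) {
    auto it = last_seq.find(gid);
    if (it != last_seq.end() && seq <= it->second)
      return false;  // duplicate or re-routed old beacon: drop it
    last_seq[gid] = seq;
    return true;
  }
};
```

If this rule were applied consistently, the old standby beacons replayed after the election would have been ignored rather than flapping the map.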

Based on C_Updated and the comment in MDSMonitor::_updated, it looks to me like we're expecting the MDS to get a beacon reply during update_from_paxos, but there's absolutely nothing that would do that. It looks to me like it's just very broken and I'm not sure why we wouldn't have seen this in the past — perhaps our timeouts are just generous enough it usually doesn't cause issues. :/

#7 Updated by Greg Farnum about 4 years ago

John, can you do something so the MDS resets itself if it gets moved out of the active state? I think there might be ad-hoc mechanisms to do that in some places already, and although it should be either actually dead or else blacklisted whenever that transition happens, it's always good to be resilient.
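The safety net suggested above amounts to a simple check on each new map (a hypothetical sketch; names and state values are illustrative, not the real MDS API):

```cpp
#include <cassert>

// Illustrative subset of MDS states, in forward order.
enum MDSState { STATE_STANDBY = 0, STATE_REPLAY = 1, STATE_ACTIVE = 4 };

// If a new map moves us backwards (e.g. active -> standby), our
// in-memory state (inotable, journal, cache) is no longer safe to
// reuse for a fresh boot, so the daemon should respawn instead of
// trying to re-enter replay.
bool should_respawn(MDSState current, MDSState mapped) {
  return mapped < current;
}
```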

I've created #11590 to track the monitor beacon replies bit of this. I'm still unclear on how the monitor could have accepted any outdated MDSBeacon messages though — and also on how it's skipping all the debugging output I'd expect to see on which paths it's taking (these logs appear to be at mon debug 20).

#8 Updated by Zheng Yan about 4 years ago

  • Status changed from New to Resolved

#10 Updated by Greg Farnum almost 3 years ago

  • Component(FS) MDS added
