Bug #11481

"mds/MDSTable.cc: 146: FAILED assert(is_undef())" on standby->replay transition

Added by Greg Farnum about 4 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
04/27/2015
Due date:
% Done:

0%

Source:
other
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:

Description

http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-04-22_11:17:57-fs-hammer-testing-basic-typica/2417/

2015-04-22 15:36:47.891526 7fde5b0cd700  1 -- 172.20.133.65:6811/16012 <== mon.1 172.20.133.69:6789/0 54 ==== mdsmap(e 23) v1 ==== 867+0+0 (3456897465 0 0) 0x5a88480 con 0x39ef6e0
2015-04-22 15:36:47.891561 7fde5b0cd700  5 mds.-1.0 handle_mds_map epoch 23 from mon.1
2015-04-22 15:36:47.891578 7fde5b0cd700 10 mds.-1.0      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table}
2015-04-22 15:36:47.891584 7fde5b0cd700 10 mds.-1.0  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
2015-04-22 15:36:47.891587 7fde5b0cd700 10 mds.0.5 map says i am 172.20.133.65:6811/16012 mds.0.5 state up:replay
2015-04-22 15:36:47.891591 7fde5b0cd700  1 mds.0.5 handle_mds_map i am now mds.0.5
2015-04-22 15:36:47.891593 7fde5b0cd700  1 mds.0.5 handle_mds_map state change up:standby --> up:replay
2015-04-22 15:36:47.891594 7fde5b0cd700 10 mds.0.5 set_want_state up:standby -> up:replay
2015-04-22 15:36:47.891597 7fde5b0cd700  1 mds.0.5 replay_start
2015-04-22 15:36:47.891599 7fde5b0cd700  7 mds.0.cache set_recovery_set 
2015-04-22 15:36:47.891601 7fde5b0cd700  1 mds.0.5  recovery set is 
2015-04-22 15:36:47.891604 7fde5b0cd700  2 mds.0.5 boot_start 0: opening inotable
2015-04-22 15:36:47.891608 7fde5b0cd700 10 mds.0.inotable: load
2015-04-22 15:36:47.894085 7fde5b0cd700 -1 mds/MDSTable.cc: In function 'void MDSTable::load(MDSInternalContextBase*)' thread 7fde5b0cd700 time 2015-04-22 15:36:47.891613
mds/MDSTable.cc: 146: FAILED assert(is_undef())

 ceph version 0.94.1-6-g8a58d83 (8a58d83b0d039d2c2be353fee9c57c4e6181b662)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x966faf]
 2: (MDSTable::load(MDSInternalContextBase*)+0x4a6) [0x7b7366]
 3: (MDS::boot_start(MDS::BootStep, int)+0x381) [0x5bfc91]
 4: (MDS::replay_start()+0x127) [0x5c0ea7]
 5: (MDS::handle_mds_map(MMDSMap*)+0x2fb7) [0x5c4107]
 6: (MDS::handle_core_message(Message*)+0x26b) [0x5c5bbb]
 7: (MDS::_dispatch(Message*)+0x2b) [0x5c626b]
 8: (MDS::ms_dispatch(Message*)+0x1e4) [0x5c7704]
 9: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0xa5a357]
 10: (DispatchQueue::entry()+0x44a) [0xa574fa]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x94facd]
 12: (()+0x7e9a) [0x7fde60197e9a]
 13: (clone()+0x6d) [0x7fde5eb5e2ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This probably is because of our recent changes to MDSTable handling.
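The failed assertion guards against calling load() on a table that has already been opened. A minimal sketch of that state machine (a hypothetical simplification; the real class lives in mds/MDSTable.{h,cc} and these names are illustrative):

```cpp
#include <cassert>

// Hypothetical simplification of MDSTable's load-state tracking.
enum TableState { STATE_UNDEF, STATE_OPENING, STATE_ACTIVE };

struct MiniTable {
  TableState state = STATE_UNDEF;

  bool is_undef() const { return state == STATE_UNDEF; }

  // load() must only ever be called on a fresh (undef) table; calling
  // it on an already-opened table means the daemon re-entered replay
  // without resetting its in-memory state.
  void load() {
    assert(is_undef());  // the assert that fired in this ticket
    state = STATE_OPENING;
  }
};
```

In this failure the daemon had already initialized its tables as active, was flapped back to standby, and then re-entered replay, hitting the assert on the second load.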

Associated revisions

Revision 0c5b9a2b (diff)
Added by John Spray about 4 years ago

mds: validate the state+rank in MDS map

Especially:
* once I have been assigned a rank, it can't be taken away without restarting the daemon.
* once I have entered standby, I can only go upwards through the states.

Fixes: #11481
Signed-off-by: John Spray <>
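The two invariants in the commit message could be checked roughly like this (a hypothetical sketch, not the real handle_mds_map code; the state values and names here are illustrative, ordered so that "upwards" means numerically increasing):

```cpp
#include <cassert>

// Illustrative subset of MDS states, in forward order.
enum MDSState { STATE_STANDBY = 0, STATE_REPLAY = 1, STATE_ACTIVE = 4 };

const int MDS_RANK_NONE = -1;

// Returns true if the transition reported by a new MDS map is legal.
bool map_transition_valid(int old_rank, int new_rank,
                          MDSState old_state, MDSState new_state) {
  // Invariant 1: once a rank is assigned, it cannot be taken away
  // without restarting the daemon.
  if (old_rank != MDS_RANK_NONE && new_rank == MDS_RANK_NONE)
    return false;
  // Invariant 2: from standby onwards, states only move forward.
  if (new_state < old_state)
    return false;
  return true;
}
```

Under these rules the transition seen in this ticket (up:active back to up:standby) would be rejected instead of acted upon.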

History

#1 Updated by Greg Farnum about 4 years ago

  • Description updated (diff)

#2 Updated by Zheng Yan about 4 years ago

I can't access http://pulpito-rdu.front.sepia.ceph.com/. Can anyone post the full mds log here?

#3 Updated by John Spray about 4 years ago

Weird! This is an MDS going from state active to state standby, then back again. In the final stage it notices that its inotable is already initialized, so it asserts out.

#4 Updated by John Spray about 4 years ago

This looks like a mon bug of some kind. After a leader election, a bunch of routed messages are being resent to the leader by another mon. These include boot beacons from various MDS instances. The daemon states in the MDSMap flap around like this:

--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4129:    172.20.133.65:6811/16012 'a-s' mds.0.4 up:active seq 10
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4111:    172.20.133.65:6808/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4123:    172.20.133.65:6809/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4126:    172.20.133.65:6810/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4129:    172.20.133.65:6811/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4129:    172.20.133.65:6811/16012 'a-s' mds.-1.0 up:standby seq 1
--
4107:    172.20.133.69:6808/20631 'a' mds.-1.0 up:standby seq 1
4129:    172.20.133.65:6811/16012 'a-s' mds.0.5 up:replay seq 1

(the '4129' GID is the one that is then crashing, as a result of being sent back into standby after having been active)

#5 Updated by John Spray about 4 years ago

The mons appear to be rather unhappy in this test run, but there are no thrashers or messenger failure settings turned on. So I wonder if this was an existing bug, and we're only seeing it now because of something nasty with the hosts or network in the new lab that's causing lots of thrashing of the monitors?

The MDSs are also getting restarted for some reason (before we see the failure condition), but there's nothing in the logs to indicate why.

Was there ever a mechanism in MDSMonitor to protect against seeing a message from a dead MDS after we see the message from a live MDS that replaces it on the same host with the same ID?

#6 Updated by Greg Farnum about 4 years ago

  • Regression set to No

For some reason the leader is not responding to all of the MDSBeacon messages it receives (but it does respond to others), and so the ones it didn't respond to are getting re-routed on each election. This is deliberate behavior on the part of the peon.

Moreover, there is a seq value in the MDSBeacon and the monitors are supposed to ignore any of them with a seq value lower than what's in the map.
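The intended filtering described above could be sketched as follows (a hypothetical illustration, not the real MDSMonitor code: the monitor tracks the highest seq seen per MDS GID and drops any resent or re-routed beacon that is not newer):

```cpp
#include <cstdint>
#include <map>

// Hypothetical stale-beacon filter: gid -> highest seq seen so far.
struct BeaconFilter {
  std::map<uint64_t, uint64_t> last_seq;

  // Returns true if the beacon should be processed, false if stale.
  bool accept(uint64_t gid, uint64_t seq) {
    auto it = last_seq.find(gid);
    if (it != last_seq.end() && seq <= it->second)
      return false;  // duplicate or re-routed old beacon: drop it
    last_seq[gid] = seq;
    return true;
  }
};
```

If this rule were applied consistently, the old standby beacons replayed after the election would have been ignored rather than flapping the map.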

Based on C_Updated and the comment in MDSMonitor::_updated, it looks to me like we're expecting the MDS to get a beacon reply during update_from_paxos, but there's absolutely nothing that would do that. It looks to me like it's just very broken and I'm not sure why we wouldn't have seen this in the past — perhaps our timeouts are just generous enough it usually doesn't cause issues. :/

#7 Updated by Greg Farnum about 4 years ago

John, can you do something so the MDS resets itself if it gets moved out of the active state? I think there might be ad-hoc mechanisms to do that in some places already, and although it should be either actually dead or else blacklisted whenever that transition happens, it's always good to be resilient.
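The safety net suggested above amounts to a simple check on each new map (a hypothetical sketch; names and state values are illustrative, not the real MDS API):

```cpp
#include <cassert>

// Illustrative subset of MDS states, in forward order.
enum MDSState { STATE_STANDBY = 0, STATE_REPLAY = 1, STATE_ACTIVE = 4 };

// If a new map moves us backwards (e.g. active -> standby), our
// in-memory state (inotable, journal, cache) is no longer safe to
// reuse for a fresh boot, so the daemon should respawn instead of
// trying to re-enter replay.
bool should_respawn(MDSState current, MDSState mapped) {
  return mapped < current;
}
```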

I've created #11590 to track the monitor beacon replies bit of this. I'm still unclear on how the monitor could have accepted any outdated MDSBeacon messages though — and also on how it's skipping all the debugging output I'd expect to see on which paths it's taking (these logs appear to be at mon debug 20).

#8 Updated by Zheng Yan about 4 years ago

  • Status changed from New to Resolved

#10 Updated by Greg Farnum almost 3 years ago

  • Component(FS) MDS added
