Bug #4390

closed

mds: zapping named mds causes client assertion

Added by Sam Lang about 11 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hit the following assertion on the client during backtrace testing:

../../src/mds/MDSMap.h: In function 'const entity_inst_t MDSMap::get_inst(int)' thread 7f565b911700 time 2013-03-07 17:32:14.584546
../../src/mds/MDSMap.h: 466: FAILED assert(up.count(m))
ceph version 0.56-1060-gf907468 (f907468bbadf129a66c9bf07b854eec2beca1a2d)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x7f565e2ef101]
2: (MDSMap::get_inst(int)+0x51) [0x7f565e11f837]
3: (Client::send_cap(Inode*, int, Cap*, int, int, int, int)+0xa48) [0x7f565e0df554]
4: (Client::check_caps(Inode*, bool)+0xc3c) [0x7f565e0e02b4]
5: (Client::tick()+0x41b) [0x7f565e0ef20d]
6: (C_C_Tick::finish(int)+0x1f) [0x7f565e12724f]
7: (SafeTimer::timer_thread()+0x36b) [0x7f565e2dea61]
8: (SafeTimerThread::entry()+0x1c) [0x7f565e2dff14]
9: (Thread::_entry_func(void*)+0x23) [0x7f565e2dbef5]
10: (()+0x7e9a) [0x7f5673658e9a]
11: (clone()+0x6d) [0x7f5672a6bcbd]

The problem seems to be in the unique name enforcement code (2e112333). A beacon from the new mds zaps the old mds from the mdsmap up list, but the new mds isn't added to the up list until the next tick. This leaves a window in which the mdsmap can have no members in up. If that instance of the map is sent to the client, the client hits the above assertion.
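
The suspected sequence can be sketched in isolation. The following is a minimal toy model, not Ceph code: `ToyMdsMap`, `zap_by_name`, `add_up`, and `find_inst` are hypothetical names invented for illustration, and the real MDSMap and beacon handling are more involved. It only shows how a lookup made between the zap and the later add finds nothing, and how a lookup that checks membership first avoids the hard assert. It is not meant to represent the fix that was applied for this bug.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for the MDS map. This is NOT the real Ceph MDSMap class.
struct ToyMdsMap {
    // rank -> daemon name, for ranks currently considered "up"
    std::map<int, std::string> up;

    // Step 1 of the suspected race: a beacon from a new daemon with the
    // same name removes the old daemon's entry from `up`.
    void zap_by_name(const std::string& name) {
        for (auto it = up.begin(); it != up.end();) {
            if (it->second == name)
                it = up.erase(it);
            else
                ++it;
        }
    }

    // Step 2: the new daemon is added to `up`. Per the report, this only
    // happens on the next tick, after the map may already have been sent.
    void add_up(int rank, const std::string& name) { up[rank] = name; }

    // Guarded lookup: returns nullptr when the rank is not up, instead of
    // asserting the way the real get_inst() does in the backtrace.
    const std::string* find_inst(int rank) const {
        auto it = up.find(rank);
        return it == up.end() ? nullptr : &it->second;
    }
};

// Replays the sequence and reports whether a client-side lookup of rank 0
// would succeed inside the window between step 1 and step 2.
bool lookup_succeeds_in_window() {
    ToyMdsMap m;
    m.add_up(0, "a");    // old daemon "a" holds rank 0
    m.zap_by_name("a");  // beacon from the new "a" zaps the old one
    // The map published at this point has nothing in `up`.
    return m.find_inst(0) != nullptr;
}
```

In this model `lookup_succeeds_in_window()` returns false, which corresponds to the state in which the client's `up.count(m)` check fails. Whether the better remedy is to publish the map only after the new mds is in up, or to have the client tolerate a missing rank, is a design question this sketch does not settle.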
