Bug #18646
mds: rejoin_import_cap FAILED assert(session)
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2017-01-17T01:22:49.274 INFO:tasks.ceph.mds.b.mira101.stderr:/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6678-gda73c09/rpm/el7/BUILD/ceph-11.1.0-6678-gda73c09/src/mds/MDCache.cc: In function 'Capability* MDCache::rejoin_import_cap(CInode*, client_t, const cap_reconnect_t&, mds_rank_t)' thread 7f28bb8fd700 time 2017-01-17 01:22:49.270835
2017-01-17T01:22:49.274 INFO:tasks.ceph.mds.b.mira101.stderr:/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6678-gda73c09/rpm/el7/BUILD/ceph-11.1.0-6678-gda73c09/src/mds/MDCache.cc: 5555: FAILED assert(session)
2017-01-17T01:22:49.274 INFO:tasks.ceph.mds.b.mira101.stderr: ceph version 11.1.0-6678-gda73c09 (da73c09995c9be5fca8d078223e0e9f3d071b2ab)
2017-01-17T01:22:49.274 INFO:tasks.ceph.mds.b.mira101.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7e) [0x7f28c1db3b0e]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 2: (MDCache::rejoin_import_cap(CInode*, client_t, cap_reconnect_t const&, int)+0x23d) [0x5632a76f2ead]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 3: (MDCache::handle_cache_rejoin_weak(MMDSCacheRejoin*)+0x1991) [0x5632a772dd11]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 4: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x24b) [0x5632a77322ab]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 5: (MDCache::dispatch(Message*)+0xa5) [0x5632a7737685]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 6: (MDSRank::handle_deferrable_message(Message*)+0x5bc) [0x5632a762db2c]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 7: (MDSRank::_dispatch(Message*, bool)+0x20c) [0x5632a76372bc]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 8: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x5632a7638485]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 9: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x5632a7625d03]
2017-01-17T01:22:49.275 INFO:tasks.ceph.mds.b.mira101.stderr: 10: (DispatchQueue::entry()+0x7a2) [0x7f28c1e0fff2]
2017-01-17T01:22:49.276 INFO:tasks.ceph.mds.b.mira101.stderr: 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f28c1ea046d]
2017-01-17T01:22:49.276 INFO:tasks.ceph.mds.b.mira101.stderr: 12: (()+0x7dc5) [0x7f28c06b6dc5]
2017-01-17T01:22:49.276 INFO:tasks.ceph.mds.b.mira101.stderr: 13: (clone()+0x6d) [0x7f28bf58c73d]
To me this looks like it is caused by a stopping MDS that has already removed a client session but then, because another MDS failed, receives an MMDSCacheRejoin message that references the removed client. The session lookup in rejoin_import_cap then comes back empty and the assert fires:
Session *session = mds->sessionmap.get_session(entity_name_t::CLIENT(client.v));
assert(session);
[I think we could also see this if a client hasn't contacted an MDS which is importing caps (without any MDS failures). Is that reasonable?]
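To make the suspected failure mode concrete, here is a minimal standalone sketch. It does not use the real Ceph classes: Session, SessionMap, Capability, import_cap_asserting and import_cap_checked are hypothetical stand-ins, and the checked variant is only one possible way to tolerate a missing session, not necessarily how this was resolved in the MDS.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the MDS types; not the real Ceph classes.
struct Session {
    int64_t client_id;
};

struct Capability {
    int64_t client_id;
};

struct SessionMap {
    std::map<int64_t, Session> sessions;

    // Returns nullptr when the client has no session, e.g. because a
    // stopping MDS has already removed it.
    Session* get_session(int64_t client) {
        auto it = sessions.find(client);
        return it == sessions.end() ? nullptr : &it->second;
    }
};

// Mirrors the pattern in the report: the lookup result is asserted,
// so a rejoin that names a removed client aborts the process.
Capability import_cap_asserting(SessionMap& sm, int64_t client) {
    Session* session = sm.get_session(client);
    assert(session);
    return Capability{session->client_id};
}

// Assumed alternative: report the missing session to the caller so it
// can skip or defer the cap import instead of aborting.
bool import_cap_checked(SessionMap& sm, int64_t client, Capability* out) {
    Session* session = sm.get_session(client);
    if (!session) {
        return false;
    }
    *out = Capability{session->client_id};
    return true;
}
```

With a session present both variants behave the same; once the session has been erased, import_cap_asserting trips the assert, while import_cap_checked returns false and leaves the decision to the caller.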