Bug #41187 (closed)
src/mds/Server.cc: 958: FAILED assert(in->snaprealm)
Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We saw this on a cluster where two out of three MDS servers needed to be rebooted. After the reboot both MDS daemons dumped core, while the remaining MDS kept running.
2019-07-17 09:33:01.655344 7fc31c040700 -1 /home/abuild/rpmbuild/BUILD/ceph-12.2.8-467-g080f2248ff/src/mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7fc31c040700 time 2019-07-17 09:33:01.653041
/home/abuild/rpmbuild/BUILD/ceph-12.2.8-467-g080f2248ff/src/mds/Server.cc: 958: FAILED assert(in->snaprealm)

 ceph version 12.2.8-467-g080f2248ff (080f2248ff90306f7b2c50b2ddd8da094e5ca3a7) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x555635e18c0e]
 2: (Server::handle_client_reconnect(MClientReconnect*)+0x1b67) [0x555635b61287]
 3: (Server::dispatch(Message*)+0x305) [0x555635b86ef5]
 4: (MDSRank::handle_deferrable_message(Message*)+0x7ec) [0x555635aff72c]
 5: (MDSRank::_dispatch(Message*, bool)+0x1d3) [0x555635b0d4e3]
 6: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x555635b0e335]
 7: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x555635af7053]
 8: (DispatchQueue::entry()+0x78b) [0x5556360f3deb]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x555635e9936d]
 10: (()+0x8724) [0x7fc321508724]
 11: (clone()+0x6d) [0x7fc32057de8d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Clients were ceph-fuse; snapshots were not used (as reported by the user).
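For readers unfamiliar with the failing check: the backtrace shows the assertion fired inside Server::handle_client_reconnect(), i.e. while the restarted MDS was processing cap reconnects from clients and expected every reconnected inode to already belong to a snap realm. The sketch below is a deliberately simplified, hypothetical model of that invariant (the real CInode/SnapRealm structures in src/mds/ are far more involved); it is only meant to illustrate what condition `assert(in->snaprealm)` was guarding.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the real MDS types (CInode, SnapRealm).
struct SnapRealm {};

struct Inode {
    // Null when no snap realm has been opened/attached for this inode.
    std::shared_ptr<SnapRealm> snaprealm;
};

// Models the invariant enforced at Server.cc:958 in this luminous build:
// during client reconnect, every inode a client claims caps on must have
// a snaprealm. Returns false instead of aborting so it can be probed;
// the real code did `assert(in->snaprealm)` and crashed when it was null.
bool reconnect_invariant_holds(const Inode& in) {
    return in.snaprealm != nullptr;
}
```

In the crash above, an inode evidently reached the reconnect path with a null snaprealm, so the process aborted rather than continuing with the inconsistent state.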
I realize this is a rather old luminous build, but I noticed that
commit 34682e447522b94068425b761772d6a52634477c
Author: Yan, Zheng <zyan@redhat.com>
Date:   Mon Jul 31 21:12:25 2017 +0800

    mds: rollback snaprealms when rolling back slave request
removed that particular assertion in mimic, so the fix might already be in newer luminous point releases.
Unfortunately this cluster has since been scrapped. Any insight into what can cause this?
Updated by Patrick Donnelly over 4 years ago
- Status changed from New to Rejected
Sorry Jan, snapshots are not stable in Luminous and we don't spend time looking at snapshot-related failures for that release.