Bug #39026

Updated by Patrick Donnelly almost 2 years ago

On version 12.2.10


2019-03-11 18:21:16.251278 7fe2cc325700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/mds/ In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7fe2cc325700 time 2019-03-11 18:21:16.248739
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/mds/ 948: FAILED assert(session)

Normally, after the reconnect timeout, we kill every session for which a reconnect message has not been received and handled. However, some reconnect messages had been received but not yet handled when the relevant session was killed. When such a message was finally handled, the session lookup returned null and assert(session) failed. That is why the MDS crashed.
Once the client knows its session was killed, it no longer waits for the ack to its reconnect message, so dropping the reconnect message may be the proper fix.
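
The idea of dropping a late reconnect message can be sketched as below. This is a simplified, standalone model and not the actual Ceph code: MiniServer, Session, ReconnectMsg, kill_session and the dropped counter are all hypothetical names used only for illustration. It only shows the shape of the change, which is to return early when the session lookup fails instead of asserting.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the MDS session and reconnect message types.
struct Session {
  std::string client;
  bool reconnected = false;
};

struct ReconnectMsg {
  std::uint64_t client_id;
};

class MiniServer {
public:
  void add_session(std::uint64_t id, std::string name) {
    sessions_[id].client = std::move(name);
  }

  // Models the reconnect timeout killing a session that has not reconnected.
  void kill_session(std::uint64_t id) { sessions_.erase(id); }

  // Returns true if the message was handled, false if it was dropped.
  bool handle_client_reconnect(const ReconnectMsg& m) {
    auto it = sessions_.find(m.client_id);
    if (it == sessions_.end()) {
      // The session was killed while this message was still queued.
      // Drop the message instead of asserting; the client is assumed
      // not to be waiting for an ack any more.
      ++dropped_;
      return false;
    }
    it->second.reconnected = true;
    return true;
  }

  bool is_reconnected(std::uint64_t id) const {
    auto it = sessions_.find(id);
    return it != sessions_.end() && it->second.reconnected;
  }

  std::uint64_t dropped() const { return dropped_; }

private:
  std::map<std::uint64_t, Session> sessions_;
  std::uint64_t dropped_ = 0;
};
```

In this model, a reconnect message for a live session is processed as before, while one that arrives for a killed session is counted and ignored rather than crashing the process.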