Bug #39026

mds: crash during mds restart

Added by shen hang 3 months ago. Updated about 20 hours ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
Due date:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
nautilus,mimic,luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:

Description

On version 12.2.10

2019-03-11 18:21:16.251278 7fe2cc325700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7fe2cc325700 time 2019-03-11 18:21:16.248739
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/mds/Server.cc: 948: FAILED assert(session)

Normally, once the reconnect timeout expires, the MDS kills every session whose reconnect message has not yet been received and handled. However, some reconnect messages may have been received but not yet handled when the relevant session is killed. When such a message is finally handled, the session is null, so assert(session) fails and the MDS crashes.
Once the client knows its session was killed, it no longer waits for the ack to its reconnect message, so dropping the reconnect message may be the proper fix.
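
The proposed handling could be sketched roughly as below. This is only an illustration of the idea, not the actual patch: SessionTable, Session, ReconnectMsg and ReconnectResult are hypothetical stand-ins, not the real types used in src/mds/Server.cc. The point is that a reconnect message arriving for an already-killed session is dropped instead of tripping an assert.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for illustration; these are NOT Ceph's real classes.
struct Session {
  explicit Session(uint64_t id) : client_id(id) {}
  uint64_t client_id;
  bool reconnected = false;
};

struct ReconnectMsg {
  uint64_t client_id;
};

enum class ReconnectResult { Handled, DroppedNoSession };

class SessionTable {
 public:
  void open(uint64_t id) { sessions_.emplace(id, Session(id)); }
  // Models the MDS killing a session after the reconnect timeout.
  void kill(uint64_t id) { sessions_.erase(id); }
  // Returns nullptr when the session no longer exists.
  Session* find(uint64_t id) {
    auto it = sessions_.find(id);
    return it == sessions_.end() ? nullptr : &it->second;
  }

 private:
  std::map<uint64_t, Session> sessions_;
};

// Handles a queued reconnect message. If the session was killed while the
// message was waiting, the message is dropped rather than asserting.
ReconnectResult handle_client_reconnect(SessionTable& table,
                                        const ReconnectMsg& m) {
  Session* session = table.find(m.client_id);
  if (!session) {
    // Session already killed: the client is not waiting for an ack.
    return ReconnectResult::DroppedNoSession;
  }
  session->reconnected = true;
  return ReconnectResult::Handled;
}
```

In this sketch, a message for a live session is handled as usual, while a message for a killed session returns DroppedNoSession and leaves the table untouched.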


Related issues

Copied to fs - Backport #39191: luminous: mds: crash during mds restart Resolved
Copied to fs - Backport #39192: nautilus: mds: crash during mds restart Resolved
Copied to fs - Backport #39193: mimic: mds: crash during mds restart Resolved

History

#2 Updated by Patrick Donnelly 3 months ago

  • Subject changed from mds:we encountered crash when mds restart. to mds: crash during mds restart
  • Status changed from New to Need Review
  • Assignee set to shen hang
  • Target version set to v15.0.0
  • Start date deleted (03/29/2019)
  • Source set to Community (dev)
  • Tags deleted (cephfs mds )
  • Backport set to nautilus,mimic,luminous
  • Pull request ID set to 27256

#3 Updated by Patrick Donnelly 3 months ago

  • Project changed from Ceph to fs
  • Component(FS) MDS added

#4 Updated by Patrick Donnelly 3 months ago

  • Description updated (diff)
  • Status changed from Need Review to Pending Backport
  • Affected Versions v12.2.10 added

#5 Updated by Nathan Cutler 3 months ago

#6 Updated by Nathan Cutler 3 months ago

#7 Updated by Nathan Cutler 3 months ago

#8 Updated by Nathan Cutler about 20 hours ago

  • Status changed from Pending Backport to Resolved
