Bug #39026: mds: crash during mds restart - CephFS - Ceph

Bug #39026

closed

mds: crash during mds restart

Added by shen hang about 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assignee: shen hang
Category:
Target version: Ceph - v15.0.0
% Done:
Source: Community (dev)
Tags:
Backport: nautilus,mimic,luminous
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions: Ceph - v12.2.10
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID: 27256
Crash signature (v1):
Crash signature (v2):

Description

On version 12.2.10

2019-03-11 18:21:16.251278 7fe2cc325700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/mds/Server.cc: In function 'void Server::handle_client_reconnect(MClientReconnect*)' thread 7fe2cc325700 time 2019-03-11 18:21:16.248739
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.10/rpm/el7/BUILD/ceph-12.2.10/src/mds/Server.cc: 948: FAILED assert(session)

Normally, once the reconnect timeout expires, the MDS kills every session whose reconnect message has not been received and handled. However, a reconnect message can be received but still be waiting to be handled when its session is killed. When that message is finally handled, the session is null and assert(session) fails, which is why the MDS crashed.
Once the client knows that its session was killed, it no longer waits for an ack to the reconnect message, so dropping the reconnect message may be the proper fix.
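
The race described above can be modeled in a few lines. This is only a sketch under stated assumptions: ToyServer, Session, ReconnectMsg, and ReconnectResult are hypothetical stand-ins invented for illustration, not the real MDS classes and not the code from the pull request. It shows a handler that drops a reconnect message whose session is already gone instead of asserting on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the MDS types; these are NOT the Ceph classes.
struct Session {
  std::uint64_t client_id;
  bool reconnected;
};

struct ReconnectMsg {
  std::uint64_t client_id;
};

enum class ReconnectResult { Handled, DroppedNoSession };

class ToyServer {
public:
  void add_session(std::uint64_t id) {
    sessions_[id] = Session{id, false};
  }

  bool has_session(std::uint64_t id) const {
    return sessions_.count(id) != 0;
  }

  // Models the reconnect timeout: kill every session that has not
  // reconnected yet. Returns the number of sessions killed.
  std::size_t kill_unreconnected() {
    std::size_t killed = 0;
    for (auto it = sessions_.begin(); it != sessions_.end();) {
      if (!it->second.reconnected) {
        it = sessions_.erase(it);
        ++killed;
      } else {
        ++it;
      }
    }
    return killed;
  }

  // A message queued before the timeout may be handled after its session
  // was killed. Instead of asserting that the session exists, drop it.
  ReconnectResult handle_client_reconnect(const ReconnectMsg& m) {
    auto it = sessions_.find(m.client_id);
    if (it == sessions_.end()) {
      return ReconnectResult::DroppedNoSession;
    }
    it->second.reconnected = true;
    return ReconnectResult::Handled;
  }

private:
  std::map<std::uint64_t, Session> sessions_;
};
```

In this model, a message for a client whose session was removed by kill_unreconnected() yields DroppedNoSession rather than aborting the process, which mirrors the behavior proposed in the description.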

Related issues 4 (0 open — 4 closed)

Updated by shen hang about 5 years ago

https://github.com/ceph/ceph/pull/27256

Updated by Patrick Donnelly about 5 years ago

Subject changed from mds:we encountered crash when mds restart. to mds: crash during mds restart
Status changed from New to Fix Under Review
Assignee set to shen hang
Target version set to v15.0.0
Start date deleted (~~03/29/2019~~)
Source set to Community (dev)
Tags deleted (~~cephfs mds~~)
Backport set to nautilus,mimic,luminous
Pull request ID set to 27256

Updated by Patrick Donnelly about 5 years ago

Project changed from Ceph to CephFS
Component(FS) MDS added

Updated by Patrick Donnelly about 5 years ago

Description updated (diff)
Status changed from Fix Under Review to Pending Backport
Affected Versions v12.2.10 added

Updated by Nathan Cutler about 5 years ago

Copied to Backport #39191: luminous: mds: crash during mds restart added

Updated by Nathan Cutler about 5 years ago

Copied to Backport #39192: nautilus: mds: crash during mds restart added

Updated by Nathan Cutler about 5 years ago

Copied to Backport #39193: mimic: mds: crash during mds restart added

Updated by Nathan Cutler almost 5 years ago

Status changed from Pending Backport to Resolved

Updated by Nathan Cutler almost 5 years ago

Related to Bug #40588: mimic: mds: msg weren't destroyed before handle_client_reconnect returned, if the reconnect msg was from non-existent session added
