Bug #23452

mds: assertion in MDSRank::validate_sessions

Added by John Spray about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: fsck/damage handling
Target version:
Start date: 03/23/2018
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:

Description

MDSRank::validate_sessions is meant to make the MDS more resilient by killing any client sessions whose prealloc_inos are inconsistent with the inotable. We only touch this path (and see this crash) if the metadata is already inconsistent.
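
As a rough illustration of the consistency check described above, the sketch below flags sessions whose preallocated inode numbers overlap the set that an inode table still considers free. The names `ToySession` and `find_inconsistent_sessions` are hypothetical stand-ins, not the actual Ceph data structures, and the sketch assumes that "inconsistent" means a preallocated inode is also marked free.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a client session; not the real Ceph Session class.
struct ToySession {
    std::string name;
    std::set<std::uint64_t> prealloc_inos;  // inode numbers reserved for this client
};

// Returns the names of sessions holding a preallocated inode that the
// (toy) inode table still lists as free. Assumption: that overlap is what
// "inconsistent with the inotable" means.
std::vector<std::string> find_inconsistent_sessions(
    const std::vector<ToySession>& sessions,
    const std::set<std::uint64_t>& inotable_free) {
    std::vector<std::string> victims;
    for (const auto& s : sessions) {
        for (std::uint64_t ino : s.prealloc_inos) {
            if (inotable_free.count(ino) != 0) {
                victims.push_back(s.name);
                break;  // one bad inode is enough to flag the session
            }
        }
    }
    return victims;
}
```

In this toy model, the sessions returned are the ones that would be handed to a kill routine, which is the step that goes wrong in the report.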

However, it is called in the MDS_BOOT_REPLAY_DONE path, which runs while the MDS is still in the replay state. When it tries to kill a session (which involves writing to mdlog), the MDS asserts out. The backtrace below is from the ceph-users thread "[ceph-users] MDS Bug/Problem" today:

>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x555879e88942]
>  2: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x567) [0x555879ddd4e7]
>  3: (Server::journal_close_session(Session*, int, Context*)+0x963) [0x555879b8d603]
>  4: (Server::kill_session(Session*, Context*)+0x1fd) [0x555879b8e56d]
>  5: (MDSRank::validate_sessions()+0x2dc) [0x555879b4c2dc]
>  6: (MDSRank::boot_start(MDSRank::BootStep, int)+0xc08) [0x555879b4d228]
>  7: (MDSInternalContextBase::complete(int)+0x18b) [0x555879dc56db]
>  8: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::sub_finish(MDSInternalContextBase*, int)+0x127) [0x555879b66627]
>  9: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::complete(int)+0x21) [0x555879b668d1]
>  10: (MDLog::_replay_thread()+0x43c) [0x555879ddac8c]
>  11: (MDLog::ReplayThread::entry()+0xd) [0x555879b56fcd]
>  12: (()+0x76ba) [0x7fd57f3f06ba]
>  13: (clone()+0x6d) [0x7fd57e45c41d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
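
The failure mode in the backtrace can be modelled with a toy sketch: a journal that refuses writes during replay, and a session kill that wants to write a close event to it. This is not the Ceph code and not necessarily how the issue was fixed. `ToyLog`, `ToyRank`, and the idea of deferring the kill until the journal is writeable are assumptions made purely for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

enum class ToyState { Replay, Active };

// Toy journal: writing is only legal once replay has finished.
// The MDS in the report asserts at this point; the sketch throws instead
// so that the condition can be observed by the caller.
struct ToyLog {
    ToyState state = ToyState::Replay;
    std::vector<std::string> entries;

    void submit_entry(const std::string& e) {
        if (state == ToyState::Replay) {
            throw std::logic_error("journal write attempted during replay");
        }
        entries.push_back(e);
    }
};

struct ToyRank {
    ToyLog log;
    std::vector<std::string> deferred_kills;

    // Unsafe variant: journals immediately, mirroring the crashing path.
    void kill_session_now(const std::string& session) {
        log.submit_entry("close " + session);
    }

    // Hypothetical mitigation: remember the victim while still in replay.
    void kill_session_deferred(const std::string& session) {
        if (log.state == ToyState::Replay) {
            deferred_kills.push_back(session);
        } else {
            kill_session_now(session);
        }
    }

    // Called once replay is done and the journal accepts writes.
    void finish_replay() {
        log.state = ToyState::Active;
        for (const auto& s : deferred_kills) {
            kill_session_now(s);
        }
        deferred_kills.clear();
    }
};
```

Calling `kill_session_now` while the toy log is in `Replay` throws, which corresponds to frames 2 through 5 of the backtrace. The deferred variant only journals after `finish_replay` has switched the state.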

Related issues

Copied to fs - Backport #23637: luminous: mds: assertion in MDSRank::validate_sessions (Resolved)

History

#1 Updated by Zheng Yan about 1 year ago

  • Status changed from New to Need Review

#2 Updated by Patrick Donnelly about 1 year ago

  • Subject changed from Assertion in MDSRank::validate_sessions to mds: assertion in MDSRank::validate_sessions
  • Assignee set to Zheng Yan
  • Target version set to v13.0.0
  • Source set to Community (user)
  • Tags set to crash
  • Backport set to luminous
  • Component(FS) MDS added

#3 Updated by Patrick Donnelly about 1 year ago

  • Status changed from Need Review to Pending Backport
  • Tags deleted (crash)
  • Labels (FS) crash added

#4 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #23637: luminous: mds: assertion in MDSRank::validate_sessions added

#6 Updated by Nathan Cutler about 1 year ago

  • Status changed from Pending Backport to Resolved
