Bug #23452

closed

mds: assertion in MDSRank::validate_sessions

Added by John Spray about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: fsck/damage handling
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

MDSRank::validate_sessions is meant to make the MDS more resilient by killing any client session whose prealloc_inos are inconsistent with the inotable. We only reach this path (and hit this crash) when the metadata is already inconsistent.
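As a rough illustration of the consistency check described above, the following toy Python model flags sessions whose preallocated inode numbers the inode table still lists as free. It assumes plain sets of integers in place of Ceph's interval sets and assumes that "inconsistent" means "still marked free in the inotable"; `Session` and `find_inconsistent_sessions` are hypothetical names, and this is not the MDS implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical stand-in for an MDS client session.
    client_id: int
    prealloc_inos: set[int] = field(default_factory=set)


def find_inconsistent_sessions(sessions: list[Session], free_inos: set[int]) -> dict[int, set[int]]:
    """Map client_id -> inos the session holds but the inode table lists as free."""
    bad: dict[int, set[int]] = {}
    for s in sessions:
        # Preallocated inos were handed out by the inode table, so none
        # of them should still appear in its free set (assumption of this model).
        overlap = s.prealloc_inos & free_inos
        if overlap:
            bad[s.client_id] = overlap
    return bad
```

For example, with sessions holding `{100, 101}` and `{200}` and a free set of `{101, 300}`, only the first client is flagged, with bad ino `101`.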

However, it is called on the MDS_BOOT_REPLAY_DONE path, which runs while the MDS is still in the replay state. Killing a session involves writing to the mdlog, so the MDS asserts out (reported in the ceph-users thread "[ceph-users] MDS Bug/Problem" today):

>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x555879e88942]
>  2: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x567) [0x555879ddd4e7]
>  3: (Server::journal_close_session(Session*, int, Context*)+0x963) [0x555879b8d603]
>  4: (Server::kill_session(Session*, Context*)+0x1fd) [0x555879b8e56d]
>  5: (MDSRank::validate_sessions()+0x2dc) [0x555879b4c2dc]
>  6: (MDSRank::boot_start(MDSRank::BootStep, int)+0xc08) [0x555879b4d228]
>  7: (MDSInternalContextBase::complete(int)+0x18b) [0x555879dc56db]
>  8: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::sub_finish(MDSInternalContextBase*, int)+0x127) [0x555879b66627]
>  9: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::complete(int)+0x21) [0x555879b668d1]
>  10: (MDLog::_replay_thread()+0x43c) [0x555879ddac8c]
>  11: (MDLog::ReplayThread::entry()+0xd) [0x555879b56fcd]
>  12: (()+0x76ba) [0x7fd57f3f06ba]
>  13: (clone()+0x6d) [0x7fd57e45c41d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
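The failure mode in the backtrace can be modeled with a toy state machine: closing a session writes a journal entry, and journal writes are refused while the MDS is replaying. The sketch below collapses the MDS states to just `up:replay` and `up:active`; `ToyMDS`, `kill_or_defer`, and `become_active` are hypothetical names, and deferring the kill is only one illustrative way to avoid the assertion, not necessarily the fix that was merged for this bug.

```python
from enum import Enum


class State(Enum):
    # Only two of the real MDS states are modeled here.
    REPLAY = "up:replay"
    ACTIVE = "up:active"


class ToyMDS:
    """Hypothetical model; not Ceph's MDSRank."""

    def __init__(self) -> None:
        self.state = State.REPLAY
        self.journal: list = []         # entries written to the toy mdlog
        self.deferred_kills: list = []  # client ids waiting for a writable log

    def submit_entry(self, event) -> None:
        # Stands in for the assertion in MDLog::_submit_entry, which aborts
        # the daemon; this model raises an exception instead.
        if self.state == State.REPLAY:
            raise RuntimeError("cannot submit mdlog entry during replay")
        self.journal.append(event)

    def kill_session(self, client_id: int) -> None:
        # Closing a session is journaled, so it needs a writable mdlog.
        self.submit_entry(("close_session", client_id))

    def kill_or_defer(self, client_id: int) -> None:
        # Hypothetical guard: queue the kill while replaying instead of asserting.
        if self.state == State.REPLAY:
            self.deferred_kills.append(client_id)
        else:
            self.kill_session(client_id)

    def become_active(self) -> None:
        self.state = State.ACTIVE
        # Drain the queue now that journal writes are allowed.
        for client_id in self.deferred_kills:
            self.kill_session(client_id)
        self.deferred_kills.clear()
```

Calling `kill_session` directly in the replay state raises, mirroring the assertion in `MDLog::_submit_entry`; `kill_or_defer` queues the client instead, and the close is journaled once `become_active` runs.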

Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #23637: luminous: mds: assertion in MDSRank::validate_sessions (Resolved, Prashant D)

#1

Updated by Zheng Yan about 6 years ago

  • Status changed from New to Fix Under Review
#2

Updated by Patrick Donnelly about 6 years ago

  • Subject changed from Assertion in MDSRank::validate_sessions to mds: assertion in MDSRank::validate_sessions
  • Assignee set to Zheng Yan
  • Target version set to v13.0.0
  • Source set to Community (user)
  • Tags set to crash
  • Backport set to luminous
  • Component(FS) MDS added
#3

Updated by Patrick Donnelly about 6 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Tags deleted (crash)
  • Labels (FS) crash added
#4

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #23637: luminous: mds: assertion in MDSRank::validate_sessions added
#6

Updated by Nathan Cutler almost 6 years ago

  • Status changed from Pending Backport to Resolved