Bug #54760

crash: void CDir::try_remove_dentries_for_stray(): assert(dn->get_linkage()->is_null())

Added by Telemetry Bot 9 months ago. Updated about 2 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Telemetry
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):

36ea59d6dccbdffb38c85f02556cb3d9c0187609bb4e3d3567e5179063578bb9
748efe27891d82ed2cd877df4f84c86adc1ae85de2ec26017e1c10e6d76ae41c
d0e130ed06fdb3167377559bd5f14974737198bc15d9dcaa0baaa296a5b9e5f5


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=3a7fca349de2f63168745e7a9831f9322c967df7e9a371bc732c51fa10342f8a

Assert condition: dn->get_linkage()->is_null()
Assert function: void CDir::try_remove_dentries_for_stray()

Sanitized backtrace:

    CDir::try_remove_dentries_for_stray()
    MDCache::clear_dirty_bits_for_stray(CInode*)
    StrayManager::_eval_stray(CDentry*)
    StrayManager::eval_stray(CDentry*)
    Server::_unlink_local_finish(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*, unsigned long)
    MDSContext::complete(int)
    MDSIOContextBase::complete(int)
    MDSLogContextBase::complete(int)
    Finisher::finisher_thread_entry()

Crash dump sample:
{
    "assert_condition": "dn->get_linkage()->is_null()",
    "assert_file": "mds/CDir.cc",
    "assert_func": "void CDir::try_remove_dentries_for_stray()",
    "assert_line": 769,
    "assert_msg": "mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f4f3c09e700 time 2022-01-11T03:44:37.359449-0600\nmds/CDir.cc: 769: FAILED ceph_assert(dn->get_linkage()->is_null())",
    "assert_thread_name": "MR_Finisher",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f4f4a2de980]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19c) [0x7f4f4a98611e]",
        "(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f4f4a9862a8]",
        "(CDir::try_remove_dentries_for_stray()+0x34c) [0x556abac39b8c]",
        "(MDCache::clear_dirty_bits_for_stray(CInode*)+0x113) [0x556abab34e63]",
        "(StrayManager::_eval_stray(CDentry*)+0x650) [0x556abab95c00]",
        "(StrayManager::eval_stray(CDentry*)+0x1f) [0x556abab9611f]",
        "(Server::_unlink_local_finish(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*, unsigned long)+0x313) [0x556abaa61833]",
        "(MDSContext::complete(int)+0x52) [0x556abacefab2]",
        "(MDSIOContextBase::complete(int)+0x51c) [0x556abacf024c]",
        "(MDSLogContextBase::complete(int)+0x40) [0x556abacf0630]",
        "(Finisher::finisher_thread_entry()+0x195) [0x7f4f4a9e7265]",
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4f4a2d36db]",
        "clone()"
    ],
    "ceph_version": "16.2.7",
    "crash_id": "2022-01-11T09:44:37.364725Z_f2474b08-81e6-420a-8332-82dca32237bb",
    "entity_name": "mds.f6b66becb969a44093445d3ee66fe274f97a9dce",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "18.04.6 LTS (Bionic Beaver)",
    "os_version_id": "18.04",
    "process_name": "ceph-mds",
    "stack_sig": "d0e130ed06fdb3167377559bd5f14974737198bc15d9dcaa0baaa296a5b9e5f5",
    "timestamp": "2022-01-11T09:44:37.364725Z",
    "utsname_machine": "x86_64",
    "utsname_release": "4.15.0-162-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#170-Ubuntu SMP Mon Oct 18 11:38:05 UTC 2021"
}

History

#1 Updated by Telemetry Bot 9 months ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v14.2.15, v14.2.19, v16.2.7 added

#2 Updated by Venky Shankar 7 months ago

  • Project changed from RADOS to CephFS
  • Target version set to v18.0.0
  • Backport set to quincy, pacific
  • Crash signature (v1) updated (diff)
  • Component(FS) MDS added

#3 Updated by Venky Shankar 6 months ago

  • Category set to Correctness/Safety
  • Assignee set to Venky Shankar

#4 Updated by Venky Shankar 3 months ago

This looks like a race between an unlink and an openc (open with O_CREAT) in the MDS. The unlink RPC projects the old and the new (stray) dentry linkages; the projected linkage for the old dentry is a null linkage. The projected linkages are "popped" after journaling and sending the early reply to the client (Server::_unlink_local_finish()). An openc from another client that arrives after the early reply will use the null dentry and project it with a new inode. At that point, the old dentry is no longer null, which trips the check in CDir::try_remove_dentries_for_stray():

-> Server::_unlink_local_finish()
  -> MDCache::notify_stray()
    -> StrayManager::eval_stray()
      -> StrayManager::_eval_stray()
        -> MDCache::clear_dirty_bits_for_stray()
          -> CDir::try_remove_dentries_for_stray()
            -> ceph_assert(dn->get_linkage()->is_null())
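
To make the interleaving concrete, here is a minimal, self-contained sketch of the race as described above. It is not Ceph code: Dentry, project(), pop_projected() and stray_check_passes() are hypothetical stand-ins, the inode numbers are made up, and the real MDS runs these steps across journaling callbacks rather than in one function. The returned boolean stands in for the condition checked by ceph_assert(dn->get_linkage()->is_null()).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical, heavily simplified model of a dentry; not Ceph's CDentry.
struct Dentry {
    // Committed linkage: an inode number, or std::nullopt for a null dentry.
    std::optional<std::uint64_t> linkage;
    // Linkages that have been projected but not yet applied.
    std::deque<std::optional<std::uint64_t>> projected;

    bool is_null() const { return !linkage.has_value(); }

    // Queue a future linkage (what a request does before journaling).
    void project(std::optional<std::uint64_t> next) { projected.push_back(next); }

    // Apply the oldest projected linkage (what the finish callback does).
    void pop_projected() {
        assert(!projected.empty());
        linkage = projected.front();
        projected.pop_front();
    }
};

// Returns the value of the condition that the stray check asserts on.
// openc_interleaves selects whether another client's openc slips in
// between the unlink's pop and the stray evaluation.
bool stray_check_passes(bool openc_interleaves) {
    Dentry dn;
    dn.linkage = 100;            // dentry initially links to inode 100 (made-up number)

    dn.project(std::nullopt);    // unlink: project a null linkage for the old dentry
    dn.pop_projected();          // after journaling and the early reply: dentry is now null

    if (openc_interleaves) {
        dn.project(200);         // openc from another client reuses the null dentry
        dn.pop_projected();      // ...and links it to a new inode (made-up number)
    }

    // Stand-in for ceph_assert(dn->get_linkage()->is_null()).
    return dn.is_null();
}
```

In this model, stray_check_passes(false) returns true (the dentry is still null when the stray is evaluated), while stray_check_passes(true) returns false, which corresponds to the failed assertion in the crash report.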

#5 Updated by Venky Shankar 3 months ago

I think https://github.com/ceph/ceph/pull/46331 would mitigate this issue; however, the unlink and openc are from different clients in this case.

#6 Updated by Venky Shankar about 2 months ago

  • Status changed from New to Closed

Venky Shankar wrote:

I think https://github.com/ceph/ceph/pull/46331 would mitigate this issue; however, the unlink and openc are from different clients in this case.

PR merged.
