Bug #54760
Closed
crash: void CDir::try_remove_dentries_for_stray(): assert(dn->get_linkage()->is_null())
0%
36ea59d6dccbdffb38c85f02556cb3d9c0187609bb4e3d3567e5179063578bb9
748efe27891d82ed2cd877df4f84c86adc1ae85de2ec26017e1c10e6d76ae41c
d0e130ed06fdb3167377559bd5f14974737198bc15d9dcaa0baaa296a5b9e5f5
Description
Assert condition: dn->get_linkage()->is_null()
Assert function: void CDir::try_remove_dentries_for_stray()
Sanitized backtrace:
CDir::try_remove_dentries_for_stray()
MDCache::clear_dirty_bits_for_stray(CInode*)
StrayManager::_eval_stray(CDentry*)
StrayManager::eval_stray(CDentry*)
Server::_unlink_local_finish(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*, unsigned long)
MDSContext::complete(int)
MDSIOContextBase::complete(int)
MDSLogContextBase::complete(int)
Finisher::finisher_thread_entry()
Crash dump sample:
{
    "assert_condition": "dn->get_linkage()->is_null()",
    "assert_file": "mds/CDir.cc",
    "assert_func": "void CDir::try_remove_dentries_for_stray()",
    "assert_line": 769,
    "assert_msg": "mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f4f3c09e700 time 2022-01-11T03:44:37.359449-0600\nmds/CDir.cc: 769: FAILED ceph_assert(dn->get_linkage()->is_null())",
    "assert_thread_name": "MR_Finisher",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f4f4a2de980]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19c) [0x7f4f4a98611e]",
        "(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f4f4a9862a8]",
        "(CDir::try_remove_dentries_for_stray()+0x34c) [0x556abac39b8c]",
        "(MDCache::clear_dirty_bits_for_stray(CInode*)+0x113) [0x556abab34e63]",
        "(StrayManager::_eval_stray(CDentry*)+0x650) [0x556abab95c00]",
        "(StrayManager::eval_stray(CDentry*)+0x1f) [0x556abab9611f]",
        "(Server::_unlink_local_finish(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*, unsigned long)+0x313) [0x556abaa61833]",
        "(MDSContext::complete(int)+0x52) [0x556abacefab2]",
        "(MDSIOContextBase::complete(int)+0x51c) [0x556abacf024c]",
        "(MDSLogContextBase::complete(int)+0x40) [0x556abacf0630]",
        "(Finisher::finisher_thread_entry()+0x195) [0x7f4f4a9e7265]",
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f4f4a2d36db]",
        "clone()"
    ],
    "ceph_version": "16.2.7",
    "crash_id": "2022-01-11T09:44:37.364725Z_f2474b08-81e6-420a-8332-82dca32237bb",
    "entity_name": "mds.f6b66becb969a44093445d3ee66fe274f97a9dce",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "18.04.6 LTS (Bionic Beaver)",
    "os_version_id": "18.04",
    "process_name": "ceph-mds",
    "stack_sig": "d0e130ed06fdb3167377559bd5f14974737198bc15d9dcaa0baaa296a5b9e5f5",
    "timestamp": "2022-01-11T09:44:37.364725Z",
    "utsname_machine": "x86_64",
    "utsname_release": "4.15.0-162-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#170-Ubuntu SMP Mon Oct 18 11:38:05 UTC 2021"
}
Updated by Telemetry Bot about 2 years ago
Updated by Venky Shankar almost 2 years ago
- Project changed from RADOS to CephFS
- Target version set to v18.0.0
- Backport set to quincy, pacific
- Crash signature (v1) updated (diff)
- Component(FS) MDS added
Updated by Venky Shankar almost 2 years ago
- Category set to Correctness/Safety
- Assignee set to Venky Shankar
Updated by Venky Shankar over 1 year ago
This looks like a race between an unlink and an openc (open w/ O_CREAT) in the MDS. The unlink RPC projects the old and the new (stray) dentry linkages. The projected linkage for the old dentry would be a null dentry. The projected linkages are "popped" after journaling and sending the early reply to the client (Server::_unlink_local_finish()). An openc from another client after the early reply will use the null dentry and project it with a new inode. At this point, the old dentry is not null anymore, which trips the check in CDir::try_remove_dentries_for_stray():
-> Server::_unlink_local_finish() -> MDCache::notify_stray() -> StrayManager::eval_stray() -> StrayManager::_eval_stray() -> MDCache::clear_dirty_bits_for_stray() -> CDir::try_remove_dentries_for_stray() -> ceph_assert(dn->get_linkage()->is_null())
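To make the ordering easier to follow, here is a minimal toy model of the sequence described above. It is not Ceph code: ToyDentry, stray_check_passes() and simulate_unlink() are hypothetical names, the inode numbers are made up, and the openc's projection and commit are collapsed into one step. It only shows how a dentry that the unlink left null can be non-null again by the time the stray evaluation checks it. It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Toy stand-ins only; these are NOT Ceph's CDentry/CDir types.
using InodeId = unsigned long;

struct ToyDentry {
    std::optional<InodeId> linkage;                // committed linkage; empty means "null dentry"
    std::deque<std::optional<InodeId>> projected;  // linkages projected but not yet popped

    bool is_null() const { return !linkage.has_value(); }

    // Record a future linkage without committing it.
    void project(std::optional<InodeId> next) { projected.push_back(next); }

    // Commit the oldest projected linkage.
    void pop_projected() {
        assert(!projected.empty());
        linkage = projected.front();
        projected.pop_front();
    }
};

// Stands in for the condition asserted in CDir::try_remove_dentries_for_stray().
// Returns the result instead of aborting so both orderings can be observed.
bool stray_check_passes(const ToyDentry& dn) { return dn.is_null(); }

// Runs the unlink sequence. When openc_races is true, another client's openc
// reuses the null dentry before the stray evaluation runs.
bool simulate_unlink(bool openc_races) {
    ToyDentry dn;
    dn.linkage = 100UL;          // file currently linked to (made-up) inode 100

    dn.project(std::nullopt);    // unlink projects a null linkage for the old dentry
    // ... journaling and the early reply to the client would happen here ...
    dn.pop_projected();          // unlink finish: the dentry is now null

    if (openc_races) {
        dn.project(200UL);       // openc from a second client links a new inode
        dn.pop_projected();      // toy simplification: committed immediately
    }

    return stray_check_passes(dn);  // the point where the real MDS asserts
}
```

Calling simulate_unlink(false) models the ordering where nothing reuses the dentry, so the check holds; simulate_unlink(true) models the interleaving from the comment above, where the check fails.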
Updated by Venky Shankar over 1 year ago
I think https://github.com/ceph/ceph/pull/46331 would mitigate this issue. However, the unlink and openc are from different clients in this case.
Updated by Venky Shankar over 1 year ago
- Status changed from New to Closed
Venky Shankar wrote:
I think https://github.com/ceph/ceph/pull/46331 would mitigate this issue. However, the unlink and openc are from different clients in this case.
PR merged.