Bug #56633

mds: crash during construction of internal request

Added by Patrick Donnelly 5 months ago. Updated 4 months ago.

Status: Need More Info
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Support
Tags:
Backport: quincy,pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Jul 07 18:05:32 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: debug 2022-07-07T23:05:32.021+0000 7fee4122f700  0 mds.1.cache discover_reply not yet active(|still rejoining), delaying
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: /builddir/build/BUILD/ceph-16.2.7/src/mds/MDCache.cc: In function 'MDRequestRef MDCache::request_start_internal(int)' thread 7fee3d227700 time 2022-07-07T23:05:47.047615+0000
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: /builddir/build/BUILD/ceph-16.2.7/src/mds/MDCache.cc: 9466: FAILED ceph_assert(active_requests.count(mdr->reqid) == 0)
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7fee49c5acfe]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  2: /usr/lib64/ceph/libceph-common.so.2(+0x276f18) [0x7fee49c5af18]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  3: (MDCache::request_start_internal(int)+0x25d) [0x555e9811f1ad]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  4: (Migrator::export_dir(CDir*, int)+0xf8b) [0x555e982053fb]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  5: (Migrator::export_empty_import(CDir*)+0x6fa) [0x555e982062ba]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  6: (MDCache::trim(unsigned long)+0x390) [0x555e98116ab0]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  7: (MDCache::upkeep_main()+0x8aa) [0x555e9814bc7a]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  8: /lib64/libstdc++.so.6(+0xc2ba3) [0x7fee4807aba3]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  9: /lib64/libpthread.so.0(+0x817a) [0x7fee48c3e17a]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  10: clone()
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: *** Caught signal (Aborted) **

Unfortunately, that's all the debugging we have, presently.


Subtasks

Bug #57044: mds: add some debug logs for "crash during construction of internal request" (Fix Under Review, Xiubo Li)


Related issues

Copied to CephFS - Bug #57044: mds: add some debug logs for "crash during construction of internal request" Fix Under Review

History

#1 Updated by Greg Farnum 4 months ago

  • Assignee set to Xiubo Li

Xiubo volunteered yesterday and said in standup today that he has started work on this.

#2 Updated by Xiubo Li 4 months ago

  • Status changed from New to Need More Info

I couldn't reproduce it locally. From reading the code, I also couldn't figure out how two internal requests could conflict, except if the seq number overflowed while one old request was still stuck, which seems impossible.
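
The overflow scenario can be illustrated with a toy model. This is only a sketch, not Ceph code: `ToyRegistry`, `issue_tid`, `start_internal` and `collides_after` are hypothetical names, and the counter is deliberately 8 bits wide so that wraparound happens quickly. The real counter is presumably far wider, which is why this path looks unreachable in practice.

```cpp
#include <cstdint>
#include <set>

// Toy model (hypothetical names, not Ceph code): request ids come from a
// monotonically increasing counter, so two live requests can only share an
// id if the counter wraps while an old request is still registered.
struct ToyRegistry {
    std::uint8_t last_tid = 0;      // deliberately tiny so wraparound is cheap to show
    std::set<std::uint8_t> active;  // ids of requests that have not finished yet

    // Unsigned wraparound is well defined: 255 is followed by 0.
    std::uint8_t issue_tid() { return ++last_tid; }

    // Returns false in the situation where the real assertion would fail.
    bool start_internal() { return active.insert(issue_tid()).second; }

    void finish(std::uint8_t tid) { active.erase(tid); }
};

// Start one request that never finishes, then churn through up to `n`
// short-lived requests. Returns true if any start collides with a live id.
bool collides_after(unsigned n) {
    ToyRegistry reg;
    reg.start_internal();           // tid 1, stuck forever
    for (unsigned i = 0; i < n; ++i) {
        if (!reg.start_internal()) {
            return true;            // the counter wrapped back onto the stuck id
        }
        reg.finish(reg.last_tid);   // the short-lived request completes at once
    }
    return false;
}
```

In this model a collision needs a full lap of the counter (256 further requests after the stuck one), so with a much wider counter the same collision would need an implausibly large number of requests.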

So for now, [1] adds more debug logging to find out which two internal requests conflict.

[1] https://github.com/ceph/ceph/pull/47384
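
As a rough illustration of the kind of logging meant here, the sketch below keeps a short description for each active request and, on a duplicate id, reports both the existing and the incoming request rather than only aborting. All names (`ReqId`, `ActiveRequests`, `try_register`) and the message format are hypothetical; this is not the code in the pull request.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical request id: issuing MDS rank plus a per-rank sequence number.
struct ReqId {
    int rank;
    std::uint64_t tid;
    bool operator<(const ReqId& o) const {
        return std::tie(rank, tid) < std::tie(o.rank, o.tid);
    }
};

// Hypothetical registry of in-flight requests, keyed by id.
struct ActiveRequests {
    std::map<ReqId, std::string> by_id;  // id -> description of the request

    // Returns an empty string on success. On a duplicate id, leaves the
    // existing entry untouched and returns a message naming both requests,
    // which is the information missing from the original crash log.
    std::string try_register(const ReqId& id, const std::string& what) {
        auto [it, inserted] = by_id.emplace(id, what);
        if (inserted) {
            return {};
        }
        std::ostringstream msg;
        msg << "reqid mds." << id.rank << ":" << id.tid
            << " already active: existing=[" << it->second
            << "] new=[" << what << "]";
        return msg.str();
    }
};
```

A caller would log the returned message before asserting, so the next crash report identifies the pair of requests involved.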

#3 Updated by Xiubo Li 4 months ago

  • Copied to Bug #57044: mds: add some debug logs for "crash during construction of internal request" added
