Bug #56633

open

mds: crash during construction of internal request

Added by Patrick Donnelly over 1 year ago. Updated 7 months ago.

Status:
Need More Info
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Support
Tags:
Backport:
quincy,pacific
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Jul 07 18:05:32 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: debug 2022-07-07T23:05:32.021+0000 7fee4122f700  0 mds.1.cache discover_reply not yet active(|still rejoining), delaying
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: /builddir/build/BUILD/ceph-16.2.7/src/mds/MDCache.cc: In function 'MDRequestRef MDCache::request_start_internal(int)' thread 7fee3d227700 time 2022-07-07T23:05:47.047615+0000
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: /builddir/build/BUILD/ceph-16.2.7/src/mds/MDCache.cc: 9466: FAILED ceph_assert(active_requests.count(mdr->reqid) == 0)
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7fee49c5acfe]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  2: /usr/lib64/ceph/libceph-common.so.2(+0x276f18) [0x7fee49c5af18]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  3: (MDCache::request_start_internal(int)+0x25d) [0x555e9811f1ad]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  4: (Migrator::export_dir(CDir*, int)+0xf8b) [0x555e982053fb]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  5: (Migrator::export_empty_import(CDir*)+0x6fa) [0x555e982062ba]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  6: (MDCache::trim(unsigned long)+0x390) [0x555e98116ab0]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  7: (MDCache::upkeep_main()+0x8aa) [0x555e9814bc7a]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  8: /lib64/libstdc++.so.6(+0xc2ba3) [0x7fee4807aba3]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  9: /lib64/libpthread.so.0(+0x817a) [0x7fee48c3e17a]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]:  10: clone()
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: *** Caught signal (Aborted) **

Unfortunately, that's all the debugging we have, presently.


Subtasks 1 (0 open, 1 closed)

Bug #57044: mds: add some debug logs for "crash during construction of internal request" (Resolved, Xiubo Li)

Related issues 1 (0 open, 1 closed)

Copied to CephFS - Bug #57044: mds: add some debug logs for "crash during construction of internal request" (Resolved, Xiubo Li)

Actions #1

Updated by Greg Farnum over 1 year ago

  • Assignee set to Xiubo Li

Xiubo volunteered yesterday and said in standup today that he has started work on this.

Actions #2

Updated by Xiubo Li over 1 year ago

  • Status changed from New to Need More Info

I couldn't reproduce it locally. From reading the code, the only case I can find in which two internal requests would conflict is when the seq number overflows while an old request is still stuck, which seems impossible.

So for now, [1] adds more debug logging to show which two internal requests conflict.

[1] https://github.com/ceph/ceph/pull/47384
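
To illustrate the overflow scenario described above, here is a minimal sketch of the invariant behind the failed assert. It is a toy model, not the MDCache code: `ReqId`, `RequestTable`, `start_internal`, `finish` and `collides_after_wrap` are hypothetical names, and the sequence counter is deliberately narrowed to 8 bits so that a wrap-around can be reached in a few iterations. It throws where the MDS would hit the `ceph_assert`.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <utility>

// Hypothetical, simplified request id: (MDS rank, sequence number).
// The 8-bit sequence is an assumption made only so the wrap is cheap to show.
using ReqId = std::pair<int, std::uint8_t>;

class RequestTable {
public:
    // Issue the next sequence number and register the request as active.
    // Throws if that id is still active, which is the condition the
    // assert in the trace guards against.
    ReqId start_internal(int rank) {
        ReqId id{rank, ++last_seq_};  // unsigned increment wraps 255 -> 0
        if (active_.count(id) != 0)
            throw std::logic_error("reqid already active");
        active_.insert(id);
        return id;
    }

    // Mark a request as completed so its id is no longer active.
    void finish(const ReqId& id) { active_.erase(id); }

    std::size_t num_active() const { return active_.size(); }

private:
    std::uint8_t last_seq_ = 0;
    std::set<ReqId> active_;
};

// Returns true if a request left active ("stuck") collides with a new
// request once the sequence counter has wrapped all the way around.
bool collides_after_wrap() {
    RequestTable table;
    table.start_internal(1);                    // seq 1, never finished
    for (int i = 0; i < 255; ++i)               // seqs 2..255, then 0
        table.finish(table.start_internal(1));
    try {
        table.start_internal(1);                // counter reaches seq 1 again
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this model a collision needs both conditions at once: one request that never finishes and a full cycle of the counter. With a much wider counter the second condition is far harder to reach, which is in line with the comment that the case seems impossible.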

Actions #3

Updated by Xiubo Li over 1 year ago

  • Copied to Bug #57044: mds: add some debug logs for "crash during construction of internal request" added
Actions #4

Updated by Patrick Donnelly 7 months ago

  • Target version deleted (v18.0.0)