Bug #56633
mds: crash during construction of internal request
% Done: 100%
Source: Support
Tags:
Backport: quincy,pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Jul 07 18:05:32 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: debug 2022-07-07T23:05:32.021+0000 7fee4122f700 0 mds.1.cache discover_reply not yet active(|still rejoining), delaying
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: /builddir/build/BUILD/ceph-16.2.7/src/mds/MDCache.cc: In function 'MDRequestRef MDCache::request_start_internal(int)' thread 7fee3d227700 time 2022-07-07T23:05:47.047615+0000
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: /builddir/build/BUILD/ceph-16.2.7/src/mds/MDCache.cc: 9466: FAILED ceph_assert(active_requests.count(mdr->reqid) == 0)
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7fee49c5acfe]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 2: /usr/lib64/ceph/libceph-common.so.2(+0x276f18) [0x7fee49c5af18]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 3: (MDCache::request_start_internal(int)+0x25d) [0x555e9811f1ad]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 4: (Migrator::export_dir(CDir*, int)+0xf8b) [0x555e982053fb]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 5: (Migrator::export_empty_import(CDir*)+0x6fa) [0x555e982062ba]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 6: (MDCache::trim(unsigned long)+0x390) [0x555e98116ab0]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 7: (MDCache::upkeep_main()+0x8aa) [0x555e9814bc7a]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 8: /lib64/libstdc++.so.6(+0xc2ba3) [0x7fee4807aba3]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 9: /lib64/libpthread.so.0(+0x817a) [0x7fee48c3e17a]
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: 10: clone()
Jul 07 18:05:47 lwtxe04hpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04hpapd1i-lldiqm[3746615]: *** Caught signal (Aborted) **
Unfortunately, that is all the debugging information we have at present.
Updated by Greg Farnum over 1 year ago
- Assignee set to Xiubo Li
Xiubo volunteered yesterday and said he's started work on this in standup today.
Updated by Xiubo Li over 1 year ago
- Status changed from New to Need More Info
I couldn't reproduce it locally. From reading the code, the only case I can find in which two internal requests would conflict is if the seq number overflowed while an old request was still stuck, which seems impossible.
So for now I am adding more debug logging in [1] to find out which two internal requests conflict.
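For readers following along, here is a toy sketch of the overflow scenario described above. It is not the Ceph code: `ToyCache`, `ReqId`, the `"mds.1"` entity string, and the deliberately tiny 8-bit counter are all assumptions made for illustration. The only thing taken from the trace is the invariant checked by the failed assertion, that a newly built request id must not already be in the active-request table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a request id: (issuing entity, sequence number).
struct ReqId {
    std::string entity;
    std::uint64_t tid;
    bool operator<(const ReqId& o) const {
        return entity != o.entity ? entity < o.entity : tid < o.tid;
    }
};

// Toy model of a table of in-flight requests (not the real MDCache).
class ToyCache {
public:
    // Allocate the next sequence number and register the request.
    // Throws at the point where the real code would hit
    // ceph_assert(active_requests.count(mdr->reqid) == 0).
    ReqId request_start_internal(int op) {
        ReqId id{"mds.1", next_seq_++};  // 8-bit counter wraps after 255
        if (active_.count(id) != 0) {
            throw std::logic_error("duplicate reqid in active requests");
        }
        active_[id] = op;
        return id;
    }

    // Remove a completed request from the table.
    void request_finish(const ReqId& id) { active_.erase(id); }

    std::size_t in_flight() const { return active_.size(); }

private:
    std::uint8_t next_seq_ = 0;    // assumption: tiny so wraparound is observable
    std::map<ReqId, int> active_;  // reqid -> op code
};
```

In this model a collision needs two things at once: one request that never finishes, and enough later requests for the counter to wrap back to its id. With a 64-bit counter the second condition is practically unreachable, which matches the "seems impossible" assessment and is why logging the two conflicting requests is the next step.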
Updated by Xiubo Li over 1 year ago
- Copied to Bug #57044: mds: add some debug logs for "crash during construction of internal request" added