Bug #49507
closed
qa: mds removed because trimming for too long with valgrind
% Done: 0%
Source: Q/A
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2021-02-26T14:10:20.478 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== execve(0x18546740(/proc/self/exe), 0x18546670, 0x133e13c0) failed, errno 2
2021-02-26T14:10:20.478 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== EXEC FAILED: I can't recover from execve() failing, so I'm dying.
2021-02-26T14:10:20.479 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== Add more stringent tests in PRE(sys_execve), or work out how to recover.
From: /ceph/teuthology-archive/pdonnell-2021-02-26_05:55:31-fs-wip-pdonnell-testing-20210226.035303-distro-basic-smithi/5916017/teuthology.log
The MDS calls execve because the mons removed it due to missed heartbeats. The MDS was spending a long time trimming its cache:
2021-02-26T14:09:58.938+0000 1854b700 7 mds.0.cache trim_lru trimming 8253 items from LRU size=8252 mid=5114 pintail=14 pinned=946
...
2021-02-26T14:10:19.478+0000 1854b700 7 mds.0.cache trim_lru trimmed 8246 items
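We have the MDS heartbeat config specified only for the MDS, which is why the mons were still using the default heartbeat grace of 15s. The trim shown above took about 20.5s (14:09:58.938 to 14:10:19.478), which is longer than that grace.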
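The comparison between the trim duration and the mons' grace can be checked directly from the two trim_lru timestamps quoted in the description. The following is a minimal sketch, not part of the fix: the names TRIM_START, TRIM_END, MON_GRACE_SECONDS, parse_ts and trim_duration_seconds are hypothetical and introduced only for illustration, and the 15s value is the default grace stated above. It shows only that this single trim outlasted the grace, not the exact moment the mons removed the MDS.

```python
from datetime import datetime

# Timestamps copied from the two trim_lru log lines in the description.
TRIM_START = "2021-02-26T14:09:58.938+0000"
TRIM_END = "2021-02-26T14:10:19.478+0000"

# Default heartbeat grace the mons were using, as stated in the description.
MON_GRACE_SECONDS = 15.0


def parse_ts(ts: str) -> datetime:
    """Parse an MDS log timestamp such as 2021-02-26T14:09:58.938+0000."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def trim_duration_seconds(start: str, end: str) -> float:
    """Return the elapsed wall-clock seconds between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


duration = trim_duration_seconds(TRIM_START, TRIM_END)
exceeds_grace = duration > MON_GRACE_SECONDS
print(f"trim took {duration:.2f}s; longer than {MON_GRACE_SECONDS:.0f}s grace: {exceeds_grace}")
```

The same two helpers could be pointed at any other pair of trim_lru "trimming" / "trimmed" lines from teuthology.log to see whether a given trim would have outlasted the grace in effect.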
Updated by Patrick Donnelly about 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 39724
Updated by Patrick Donnelly about 3 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from pacific,octopus to pacific
Updated by Backport Bot about 3 years ago
- Copied to Backport #49610: pacific: qa: mds removed because trimming for too long with valgrind added
Updated by Loïc Dachary about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".