Bug #49507
closed
qa: mds removed because trimming for too long with valgrind
% Done:
0%
Source:
Q/A
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
qa-suite
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2021-02-26T14:10:20.478 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== execve(0x18546740(/proc/self/exe), 0x18546670, 0x133e13c0) failed, errno 2
2021-02-26T14:10:20.478 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== EXEC FAILED: I can't recover from execve() failing, so I'm dying.
2021-02-26T14:10:20.479 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== Add more stringent tests in PRE(sys_execve), or work out how to recover.
From: /ceph/teuthology-archive/pdonnell-2021-02-26_05:55:31-fs-wip-pdonnell-testing-20210226.035303-distro-basic-smithi/5916017/teuthology.log
The MDS calls execve (to respawn itself) because the mons removed it due to missed heartbeats. Under valgrind, the MDS was spending a long time trimming its cache:
2021-02-26T14:09:58.938+0000 1854b700 7 mds.0.cache trim_lru trimming 8253 items from LRU size=8252 mid=5114 pintail=14 pinned=946
...
2021-02-26T14:10:19.478+0000 1854b700 7 mds.0.cache trim_lru trimmed 8246 items
The MDS heartbeat config is specified only for the MDS, not for the mons, which is why the mons were still using the default heartbeat grace of 15s.
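As a rough check of the timing, the sketch below computes the interval between the two trim_lru log lines quoted above and compares it with the 15s grace the mons were using. It assumes, for illustration only, that the MDS sent no heartbeats for the whole interval; the quoted log does not confirm that. The names trim_duration_seconds and MON_GRACE_SECONDS are hypothetical helpers, not Ceph identifiers.

```python
from datetime import datetime

# Timestamps copied from the two trim_lru log lines in the description.
TRIM_START = "2021-02-26T14:09:58.938+0000"
TRIM_END = "2021-02-26T14:10:19.478+0000"

# Grace period the mons were using, per the description (15s default).
MON_GRACE_SECONDS = 15.0


def trim_duration_seconds(start: str, end: str) -> float:
    """Return the elapsed seconds between two MDS log timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


duration = trim_duration_seconds(TRIM_START, TRIM_END)
print(f"trim took {duration:.2f}s, grace is {MON_GRACE_SECONDS:.0f}s")
print("exceeds grace:", duration > MON_GRACE_SECONDS)
```

The single trim pass spans about 20.5s, which is longer than the 15s grace, consistent with the mons removing the MDS.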