Bug #49507

qa: mds removed because trimming for too long with valgrind

Added by Patrick Donnelly about 2 months ago. Updated 19 days ago.

Status: Resolved
Priority: Urgent
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa, qa-failure
Pull request ID: 39724
Crash signature (v1):
Crash signature (v2):

Description

2021-02-26T14:10:20.478 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== execve(0x18546740(/proc/self/exe), 0x18546670, 0x133e13c0) failed, errno 2
2021-02-26T14:10:20.478 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== EXEC FAILED: I can't recover from execve() failing, so I'm dying.
2021-02-26T14:10:20.479 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== Add more stringent tests in PRE(sys_execve), or work out how to recover.

From: /ceph/teuthology-archive/pdonnell-2021-02-26_05:55:31-fs-wip-pdonnell-testing-20210226.035303-distro-basic-smithi/5916017/teuthology.log

The MDS calls execve (to respawn itself) because the mons removed it after it missed heartbeats. The MDS had been spending a long time trimming its cache:

2021-02-26T14:09:58.938+0000 1854b700  7 mds.0.cache trim_lru trimming 8253 items from LRU size=8252 mid=5114 pintail=14 pinned=946
...
2021-02-26T14:10:19.478+0000 1854b700  7 mds.0.cache trim_lru trimmed 8246 items
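The duration of the trim can be checked directly from the two trim_lru log lines. This is a minimal sketch, assuming the two lines bracket a single trim pass (the "..." elides the lines between them) and comparing against the 15s default heartbeat grace stated in this report:

```python
from datetime import datetime

# Timestamps copied from the two trim_lru log lines; the log's "+0000"
# offset is written as "+00:00" so datetime.fromisoformat accepts it.
start = datetime.fromisoformat("2021-02-26T14:09:58.938+00:00")
end = datetime.fromisoformat("2021-02-26T14:10:19.478+00:00")

elapsed = (end - start).total_seconds()
grace = 15.0  # default heartbeat grace the mons were using, per this report

print(f"trim_lru took {elapsed:.3f} s; grace is {grace:.0f} s")
print("exceeds grace:", elapsed > grace)
```

The trim alone took about 20.5s, which is longer than the 15s grace the mons applied.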

We only specify the MDS heartbeat config for the MDS itself, which is why the mons were using the default heartbeat grace of 15s.
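Why a setting scoped to the MDS is invisible to the mons can be illustrated with a simplified, hypothetical model of section-based configuration lookup: a daemon reads its own type's section, then [global], then a built-in default. The option name `heartbeat_grace` and the 60s value are illustrative assumptions, not the real Ceph option name or the value used by the QA suite, and real Ceph lookup has more levels (e.g. per-daemon sections) than modeled here.

```python
from typing import Dict

# Hypothetical option name; the 15 s default matches the one cited in this report.
DEFAULTS: Dict[str, float] = {"heartbeat_grace": 15.0}


def lookup(conf: Dict[str, Dict[str, float]], daemon_type: str, option: str) -> float:
    """Simplified model: check the daemon's own section, then [global], then the default."""
    for section in (daemon_type, "global"):
        if option in conf.get(section, {}):
            return conf[section][option]
    return DEFAULTS[option]


# Grace raised only in the MDS section (the situation described above)...
mds_only = {"mds": {"heartbeat_grace": 60.0}}
# ...versus raised in [global], where the mons would see it too.
global_conf = {"global": {"heartbeat_grace": 60.0}}

print(lookup(mds_only, "mds", "heartbeat_grace"))     # 60.0
print(lookup(mds_only, "mon", "heartbeat_grace"))     # 15.0
print(lookup(global_conf, "mon", "heartbeat_grace"))  # 60.0
```

In the MDS-only case the MDS tolerates a long stall, but the mon judging its heartbeats still applies the 15s default.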


Related issues

Copied to CephFS - Backport #49610: pacific: qa: mds removed because trimming for too long with valgrind Resolved

History

#1 Updated by Patrick Donnelly about 2 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 39724

#2 Updated by Patrick Donnelly about 2 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from pacific,octopus to pacific

#3 Updated by Backport Bot about 2 months ago

  • Copied to Backport #49610: pacific: qa: mds removed because trimming for too long with valgrind added

#4 Updated by Loïc Dachary 19 days ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
