Bug #49507

closed

qa: mds removed because trimming for too long with valgrind

Added by Patrick Donnelly about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-02-26T14:10:20.478 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== execve(0x18546740(/proc/self/exe), 0x18546670, 0x133e13c0) failed, errno 2
2021-02-26T14:10:20.478 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== EXEC FAILED: I can't recover from execve() failing, so I'm dying.
2021-02-26T14:10:20.479 INFO:tasks.ceph.mds.e.smithi105.stderr:==00:01:07:23.378 37140== Add more stringent tests in PRE(sys_execve), or work out how to recover.

From: /ceph/teuthology-archive/pdonnell-2021-02-26_05:55:31-fs-wip-pdonnell-testing-20210226.035303-distro-basic-smithi/5916017/teuthology.log

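The MDS calls execve because the mons removed it after missed heartbeats, and under valgrind that execve fails (errno 2, as in the log above). The MDS missed its heartbeats because it was spending a long time trimming: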

2021-02-26T14:09:58.938+0000 1854b700  7 mds.0.cache trim_lru trimming 8253 items from LRU size=8252 mid=5114 pintail=14 pinned=946
...
2021-02-26T14:10:19.478+0000 1854b700  7 mds.0.cache trim_lru trimmed 8246 items

The MDS heartbeat config is only specified for the MDS, so the mons were still using the default heartbeat grace of 15s. The trim above took about 20s, which is longer than that default.
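
As a quick sanity check on the timing, the sketch below takes the two trim_lru timestamps quoted above and compares the elapsed time against the 15s default grace. It assumes those two log lines bracket the whole trim; the constant and function names are made up for this example and are not Ceph identifiers.

```python
from datetime import datetime

# Timestamps copied from the two trim_lru log lines quoted above.
TRIM_START = "2021-02-26T14:09:58.938+0000"
TRIM_END = "2021-02-26T14:10:19.478+0000"

# Default grace quoted in this report; the constant name is hypothetical.
DEFAULT_GRACE_SECONDS = 15.0


def parse_ts(stamp: str) -> datetime:
    """Parse a timestamp in the format used by the MDS log lines above."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")


def trim_duration_seconds(start: str, end: str) -> float:
    """Return the seconds elapsed between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


duration = trim_duration_seconds(TRIM_START, TRIM_END)
exceeds = duration > DEFAULT_GRACE_SECONDS
print(f"trim took {duration:.2f}s; exceeds {DEFAULT_GRACE_SECONDS:.0f}s grace: {exceeds}")
```

With these inputs the trim comes out at roughly 20.5s, so a mon still using the 15s default would consider the MDS late before trimming finished.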


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #49610: pacific: qa: mds removed because trimming for too long with valgrind (Resolved, Patrick Donnelly)
#1

Updated by Patrick Donnelly about 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 39724
#2

Updated by Patrick Donnelly about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from pacific,octopus to pacific
#3

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49610: pacific: qa: mds removed because trimming for too long with valgrind added
#4

Updated by Loïc Dachary about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
