Bug #45835

closed

mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory

Added by Dan van der Ster almost 4 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We just upgraded from mimic v13.2.6 to nautilus v14.2.9, and the single active MDS ran out of memory during the rejoin step.
The MDS cache size was 4GB, but the OOM occurred on 16GB and even 32GB VMs.

`mds_wipe_sessions = true` did not help.

To recover the cluster I ran `rados -p cephfs_metadata_pool rm mds0_openfiles.0`, after which the MDS could activate quickly.
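
For anyone repeating this workaround, the removal can be wrapped in a small dry-run helper so the exact command is visible before anything is deleted. This is only a sketch: the `run` and `remove_openfiles_object` helpers and the `POOL` and `DRY_RUN` variables are hypothetical names introduced here, the pool name is the one from this report and will differ on other clusters, and the object name pattern `mds<rank>_openfiles.<index>` is inferred from the single object named above. The report does not say what state the MDS daemon was in when the object was removed, and the sketch does not address that.

```shell
#!/bin/sh
# Sketch of the workaround above, with a dry-run guard.
# Assumptions (not from the report): helper names, POOL/DRY_RUN variables,
# and the generalisation of the object name to other ranks and indexes.

POOL="${POOL:-cephfs_metadata_pool}"   # metadata pool name used in this report
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove one open file table object for the given MDS rank and object index.
remove_openfiles_object() {
    rank="$1"
    index="$2"
    run rados -p "$POOL" rm "mds${rank}_openfiles.${index}"
}

# Rank 0, object 0, as in the report. With the defaults this prints:
#   rados -p cephfs_metadata_pool rm mds0_openfiles.0
remove_openfiles_object 0 0
```

Setting `DRY_RUN=0` makes the helper actually call `rados`; keeping the default lets the command be reviewed first.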


Related issues: 2 (0 open, 2 closed)

Copied to CephFS - Backport #47608: octopus: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory (Resolved, Zheng Yan)
Copied to CephFS - Backport #47609: nautilus: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory (Rejected)
Actions #1

Updated by Patrick Donnelly almost 4 years ago

  • Subject changed from OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory to mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory
  • Status changed from New to Triaged
  • Assignee set to Zheng Yan
  • Target version set to v16.0.0
  • Source set to Community (dev)
  • Backport set to octopus,nautilus
Actions #2

Updated by Zheng Yan almost 4 years ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 36089
Actions #3

Updated by Dan van der Ster over 3 years ago

The fix was merged. Something needed to start the backports process?

Actions #4

Updated by Patrick Donnelly over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #5

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47608: octopus: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory added
Actions #6

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47609: nautilus: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory added
Actions #7

Updated by Nathan Cutler over 3 years ago

Dan van der Ster wrote:

The fix was merged. Something needed to start the backports process?

@Dan, the "backporting process" has started, but it might take some time to finish, because the changeset looks pretty challenging to backport.

Actions #8

Updated by Loïc Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
