Bug #45835: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory - CephFS - Ceph

Actions

Copy link

Bug #45835

closed

mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory

Added by Dan van der Ster almost 4 years ago. Updated almost 3 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Zheng Yan

Category:

Target version:

Ceph - v16.0.0

% Done:

Source:

Community (dev)

Tags:

Backport:

octopus,nautilus

Regression:

Severity:

2 - major

Reviewed:

Affected Versions:

Ceph - v14.2.9

ceph-qa-suite:

Component(FS):

MDS

Labels (FS):

Pull request ID:

36089

Crash signature (v1):

Crash signature (v2):

Description

We just upgraded from mimic v13.2.6 to nautilus v14.2.9 and the single active MDS was going out-of-memory during the rejoin step.
MDS cache size was 4GB, but the OOM occured on 16GB and even 32GB VMs.

`mds_wipe_sessions = true` did not help.

To recover the cluster I did `rados -p cephfs_metadata_pool rm mds0_openfiles.0` then the MDS could activate quickly.

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by Patrick Donnelly almost 4 years ago

Subject changed from OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory to mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory
Status changed from New to Triaged
Assignee set to Zheng Yan
Target version set to v16.0.0
Source set to Community (dev)
Backport set to octopus,nautilus

Actions

Copy link

Updated by Zheng Yan almost 4 years ago

Status changed from Triaged to Fix Under Review
Pull request ID set to 36089

Actions

Copy link

Updated by Dan van der Ster over 3 years ago

The fix was merged. Something needed to start the backports process?

Actions

Copy link

Updated by Patrick Donnelly over 3 years ago

Status changed from Fix Under Review to Pending Backport

Actions

Copy link

Updated by Nathan Cutler over 3 years ago

Copied to Backport #47608: octopus: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory added

Actions

Copy link

Updated by Nathan Cutler over 3 years ago

Copied to Backport #47609: nautilus: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory added

Actions

Copy link

Updated by Nathan Cutler over 3 years ago

Dan van der Ster wrote:

The fix was merged. Something needed to start the backports process?

@Dan, the "backporting process" has started, but it might take some time to finish, because the changeset looks pretty challenging to backport.

Actions

Copy link

Updated by Loïc Dachary almost 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » CephFS

Custom queries

Bug #45835

mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory

Updated by Patrick Donnelly almost 4 years ago

Updated by Zheng Yan almost 4 years ago

Updated by Dan van der Ster over 3 years ago

Updated by Patrick Donnelly over 3 years ago

Updated by Nathan Cutler over 3 years ago

Updated by Nathan Cutler over 3 years ago

Updated by Nathan Cutler over 3 years ago

Updated by Loïc Dachary almost 3 years ago