Bug #45835
mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory
Description
We just upgraded from mimic v13.2.6 to nautilus v14.2.9 and the single active MDS was going out-of-memory during the rejoin step.
MDS cache size was 4GB, but the OOM occurred on 16GB and even 32GB VMs.
`mds_wipe_sessions = true` did not help.
To recover the cluster, I ran `rados -p cephfs_metadata_pool rm mds0_openfiles.0`, after which the MDS could activate quickly.
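For anyone repeating this workaround, here is a minimal sketch that saves a copy of the object before removing it. It only prints the `rados` commands so they can be reviewed first. The helper name `openfiles_cleanup_cmds` and the `.bak` file name are hypothetical; the pool and object names are the ones from this report and will differ on other clusters. It assumes the open file table for the rank is held in the single object `mds<rank>_openfiles.0`, as it was here.

```shell
#!/bin/sh
# Sketch only: print the commands to back up and then remove the
# open file table object for one MDS rank. Nothing is executed.
# Assumption: the table is stored in a single object named
# mds<rank>_openfiles.0, as in this report.
openfiles_cleanup_cmds() {
  pool="$1"   # metadata pool name, e.g. cephfs_metadata_pool
  rank="$2"   # MDS rank, e.g. 0
  obj="mds${rank}_openfiles.0"
  # Save a local copy first so the object can be inspected later.
  echo "rados -p ${pool} get ${obj} ./${obj}.bak"
  # Then remove the object, as done in this report.
  echo "rados -p ${pool} rm ${obj}"
}

openfiles_cleanup_cmds cephfs_metadata_pool 0
```

Review the printed commands and run them by hand once they match your pool and rank.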
Updated by Patrick Donnelly almost 4 years ago
- Subject changed from OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory to mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory
- Status changed from New to Triaged
- Assignee set to Zheng Yan
- Target version set to v16.0.0
- Source set to Community (dev)
- Backport set to octopus,nautilus
Updated by Zheng Yan almost 4 years ago
- Status changed from Triaged to Fix Under Review
- Pull request ID set to 36089
Updated by Dan van der Ster over 3 years ago
The fix was merged. Something needed to start the backports process?
Updated by Patrick Donnelly over 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47608: octopus: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory added
Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47609: nautilus: mds: OpenFileTable::prefetch_inodes during rejoin can cause out-of-memory added
Updated by Nathan Cutler over 3 years ago
Dan van der Ster wrote:
The fix was merged. Something needed to start the backports process?
@Dan, the "backporting process" has started, but it might take some time to finish, because the changeset looks pretty challenging to backport.
Updated by Loïc Dachary almost 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".