Bug #49617

mds: race of fetching large dirfrag

Added by Erqi Chen about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): task(medium)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When a dirfrag contains more than 'mds_dir_keys_per_op' items, MDS needs to send multiple 'omap-get-vals' requests to fetch the dirfrag completely.
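
To illustrate why several requests are needed, here is a minimal sketch of a paginated fetch over a toy in-memory omap. It is not Ceph code: the helper names `omap_get_vals` and `fetch_dirfrag`, the `start_after`/`max_return` parameters of the toy helper, and the key names are assumptions made for the example, with `keys_per_op` standing in for 'mds_dir_keys_per_op'.

```python
# Toy model (not Ceph code): the dirfrag's omap is a sorted key/value map
# that can only be read in pages of at most `max_return` entries.

def omap_get_vals(omap, start_after, max_return):
    """Return up to max_return (key, value) pairs with key > start_after,
    plus a flag telling whether more keys remain after this page."""
    keys = sorted(k for k in omap if k > start_after)
    page = [(k, omap[k]) for k in keys[:max_return]]
    more = len(keys) > max_return
    return page, more

def fetch_dirfrag(omap, keys_per_op):
    """Fetch every entry, one request per page. Returns (entries, request_count)."""
    entries = {}
    start_after = ""
    requests = 0
    while True:
        page, more = omap_get_vals(omap, start_after, keys_per_op)
        requests += 1
        entries.update(page)
        if not more:
            return entries, requests
        # Resume after the last key of the page just received.
        start_after = page[-1][0]

# Hypothetical dirfrag with 10 entries and a page size of 4.
omap = {f"file{i:03d}_head": f"inode{i}" for i in range(10)}
entries, requests = fetch_dirfrag(omap, keys_per_op=4)
print(len(entries), requests)  # 10 3
```

The fetch is not atomic: between any two pages, the on-disk omap can change.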

There is a race if MDS commits the dirfrag in the middle of these 'omap-get-vals' requests. For example:
- MDS starts fetching a dirfrag and sends the first 'omap-get-vals' request to the OSD.
- MDS starts committing the dirfrag, which removes the key that corresponds to null dentry 'X'.
- MDS receives the 'omap-get-vals' reply. The returned omap is incomplete, but it contains the key/value pair that corresponds to dentry 'X'. MDS sends another 'omap-get-vals' request to fetch the rest of the omap.
- The dirfrag commit completes. MDS marks null dentry 'X' clean and removes it from its cache.
- MDS receives the second 'omap-get-vals' reply. The returned omap is now complete. MDS calls CDir::omap_fetched(), which re-adds dentry 'X' to its cache.
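
The interleaving above can be replayed with a small simulation. This is a simplified model, not the MDS implementation: the `omap_fetched` function below is only a stand-in for CDir::omap_fetched() that adds every fetched entry the cache lacks, and the keys 'A', 'X', 'Z' and the page size of 2 are assumptions chosen so that the first reply is incomplete yet contains 'X'.

```python
# Toy replay of the race (not Ceph code).

def omap_get_vals(omap, start_after, max_return):
    """Return one page of entries with key > start_after and a 'more' flag."""
    keys = sorted(k for k in omap if k > start_after)
    page = {k: omap[k] for k in keys[:max_return]}
    return page, len(keys) > max_return

def omap_fetched(cache, fetched):
    """Simplified stand-in for CDir::omap_fetched(): add entries the cache lacks."""
    for name, inode in fetched.items():
        cache.setdefault(name, inode)

omap = {"A": "inoA", "X": "inoX", "Z": "inoZ"}  # on-disk state, still holds 'X'
cache = {"X": None}                             # 'X' is a dirty null dentry
fetched = {}

# First request is served before the commit reaches the omap.
page, more = omap_get_vals(omap, "", 2)
fetched.update(page)          # incomplete reply, contains 'X'

# The commit removes 'X' on disk; once it completes, the clean null
# dentry is dropped from the cache.
del omap["X"]
del cache["X"]

# Second request fetches the rest; the fetch is now complete.
page, more = omap_get_vals(omap, max(fetched), 2)
fetched.update(page)
omap_fetched(cache, fetched)

# Dentries present in the cache but no longer on disk.
stale = sorted(set(cache) - set(omap))
print(stale)  # ['X']
```

The stale entry comes entirely from the first page, which was read before the commit and applied after it.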

A possible fix is to re-fetch from the beginning if the dirfrag gets committed in the middle of the 'omap-get-vals' requests.
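
A minimal sketch of the restart idea, under stated assumptions: it is not the code from the actual pull request. The `DirFrag` class, its `committed_version` counter, and the `on_reply` hook (used only to inject a commit between replies) are hypothetical names introduced for the example.

```python
# Toy model of "restart the fetch if a commit happened mid-fetch" (not Ceph code).

class DirFrag:
    def __init__(self, omap):
        self.omap = dict(omap)
        self.committed_version = 0  # bumped on every commit

    def commit_remove(self, key):
        """Commit that removes one key from the on-disk omap."""
        self.omap.pop(key, None)
        self.committed_version += 1

    def omap_get_vals(self, start_after, max_return):
        keys = sorted(k for k in self.omap if k > start_after)
        page = {k: self.omap[k] for k in keys[:max_return]}
        return page, len(keys) > max_return

def fetch_with_restart(frag, keys_per_op, on_reply=None):
    """Fetch all entries; discard partial results and start over whenever
    the dirfrag was committed while the fetch was in flight."""
    restarts = 0
    while True:
        version_at_start = frag.committed_version
        fetched, start_after, more = {}, "", True
        while more and frag.committed_version == version_at_start:
            page, more = frag.omap_get_vals(start_after, keys_per_op)
            fetched.update(page)
            if page:
                start_after = max(page)
            if on_reply:
                on_reply(frag)  # test hook: simulate a concurrent commit
        if frag.committed_version == version_at_start:
            return fetched, restarts
        restarts += 1

frag = DirFrag({"A": "inoA", "X": "inoX", "Z": "inoZ"})
pending = ["X"]

def commit_once(f):
    # Remove 'X' right after the first reply, then do nothing.
    if pending:
        f.commit_remove(pending.pop())

fetched, restarts = fetch_with_restart(frag, 2, on_reply=commit_once)
print(sorted(fetched), restarts)  # ['A', 'Z'] 1
```

Restarting trades an extra round of requests for a result that reflects a single committed state of the dirfrag.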


Related issues

Copied to CephFS - Backport #49851: octopus: mds: race of fetching large dirfrag Resolved
Copied to CephFS - Backport #49852: pacific: mds: race of fetching large dirfrag Resolved
Copied to CephFS - Backport #49853: nautilus: mds: race of fetching large dirfrag Resolved

History

#1 Updated by Patrick Donnelly about 3 years ago

  • Status changed from New to Triaged
  • Assignee set to Xiubo Li
  • Target version set to v17.0.0
  • Source set to Community (dev)
  • Backport set to pacific,octopus,nautilus
  • Labels (FS) task(medium) added

#2 Updated by Zheng Yan about 3 years ago

  • Pull request ID set to 49617

#3 Updated by Zheng Yan about 3 years ago

  • Status changed from Triaged to Fix Under Review

#4 Updated by Zheng Yan about 3 years ago

  • Pull request ID changed from 49617 to 39848

#5 Updated by Patrick Donnelly about 3 years ago

  • Assignee changed from Xiubo Li to Erqi Chen

#6 Updated by Patrick Donnelly about 3 years ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Backport Bot about 3 years ago

  • Copied to Backport #49851: octopus: mds: race of fetching large dirfrag added

#8 Updated by Backport Bot about 3 years ago

  • Copied to Backport #49852: pacific: mds: race of fetching large dirfrag added

#9 Updated by Backport Bot about 3 years ago

  • Copied to Backport #49853: nautilus: mds: race of fetching large dirfrag added

#10 Updated by Loïc Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
