Bug #49617

closed

mds: race of fetching large dirfrag

Added by Erqi Chen about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): task(medium)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When a dirfrag contains more than 'mds_dir_keys_per_op' items, the MDS needs to send multiple 'omap-get-vals' requests to fetch the dirfrag completely.
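
To illustrate why several requests are needed, here is a minimal Python sketch of a paged fetch. It is not Ceph code: `omap_get_vals` and `fetch_dirfrag` are hypothetical stand-ins, the omap is modeled as a plain dict, and `keys_per_op` plays the role of 'mds_dir_keys_per_op'.

```python
from typing import Dict, List, Optional, Tuple


def omap_get_vals(
    omap: Dict[str, str], start_after: Optional[str], max_keys: int
) -> Tuple[List[Tuple[str, str]], bool]:
    """Hypothetical stand-in for one 'omap-get-vals' request.

    Returns up to max_keys entries, in key order, that sort after
    start_after, plus a flag saying whether more entries remain.
    """
    keys = sorted(k for k in omap if start_after is None or k > start_after)
    page = keys[:max_keys]
    more = len(keys) > len(page)
    return [(k, omap[k]) for k in page], more


def fetch_dirfrag(omap: Dict[str, str], keys_per_op: int) -> Tuple[Dict[str, str], int]:
    """Fetch the whole omap page by page; also return the request count."""
    entries: Dict[str, str] = {}
    requests = 0
    start_after: Optional[str] = None
    while True:
        page, more = omap_get_vals(omap, start_after, keys_per_op)
        requests += 1
        entries.update(page)
        if not more:
            return entries, requests
        # Continue from the last key seen in this page.
        start_after = page[-1][0]
```

In this model, a dirfrag with five entries and `keys_per_op=2` takes three requests, and anything that modifies the omap between two of those requests can leave the assembled result inconsistent.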

There is a race if the MDS commits the dirfrag in the middle of these 'omap-get-vals' requests. For example:
- The MDS starts fetching a dirfrag and sends an 'omap-get-vals' request to the OSD.
- The MDS starts committing the dirfrag, removing the key that corresponds to null dentry 'X'.
- The MDS receives the omap-get-vals reply. The returned omap is incomplete, but it contains the key-value pair that corresponds to dentry 'X'. The MDS sends another 'omap-get-vals' request to fetch the rest of the omap.
- The dirfrag commit completes. The MDS marks null dentry 'X' clean and removes it from its cache.
- The MDS receives the next omap-get-vals reply. The returned omap is now complete. The MDS calls CDir::omap_fetched(), which re-adds dentry 'X' to its cache.

A possible fix is to re-fetch the dirfrag from the beginning if it gets committed in the middle of the omap-get-vals requests.
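
The race and the restart-on-commit idea can be reproduced in a small, single-threaded Python model. This is only a sketch under stated assumptions, not the actual MDS code: the `DirFrag` class, its `committed_version` counter, the `between_pages` hook (which injects a commit between two page reads) and the `refetch_on_commit` flag are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def omap_get_vals(
    omap: Dict[str, str], start_after: Optional[str], max_keys: int
) -> Tuple[List[Tuple[str, str]], bool]:
    """Hypothetical stand-in for one 'omap-get-vals' request."""
    keys = sorted(k for k in omap if start_after is None or k > start_after)
    page = keys[:max_keys]
    return [(k, omap[k]) for k in page], len(keys) > len(page)


class DirFrag:
    """Toy model: 'omap' is the on-disk state, 'cache' the MDS view."""

    def __init__(self, omap: Dict[str, str], keys_per_op: int) -> None:
        self.omap = dict(omap)
        self.cache: Dict[str, str] = {}
        self.keys_per_op = keys_per_op
        self.committed_version = 0  # bumped on every commit (assumption)

    def commit_remove(self, key: str) -> None:
        """Commit that drops a null dentry from disk and from the cache."""
        self.omap.pop(key, None)
        self.cache.pop(key, None)
        self.committed_version += 1

    def fetch(
        self,
        between_pages: Optional[Callable[["DirFrag"], None]] = None,
        refetch_on_commit: bool = False,
    ) -> None:
        while True:
            start_version = self.committed_version
            fetched: Dict[str, str] = {}
            start_after: Optional[str] = None
            while True:
                page, more = omap_get_vals(self.omap, start_after, self.keys_per_op)
                fetched.update(page)
                if not more:
                    break
                start_after = page[-1][0]
                if between_pages is not None:
                    # Simulate a commit landing between two requests (once).
                    hook, between_pages = between_pages, None
                    hook(self)
            if refetch_on_commit and self.committed_version != start_version:
                # A commit raced with the fetch: discard and start over.
                continue
            # Counterpart of omap_fetched(): install everything in the cache.
            self.cache.update(fetched)
            return
```

Calling `fetch(between_pages=lambda d: d.commit_remove("x"))` on a dirfrag whose first page contains "x" leaves the removed dentry in the cache, mirroring the sequence above. Passing `refetch_on_commit=True` as well makes the fetch notice the version change and restart, so "x" stays out of the cache.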


Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #49851: octopus: mds: race of fetching large dirfrag (Resolved, Nathan Cutler)
Copied to CephFS - Backport #49852: pacific: mds: race of fetching large dirfrag (Resolved, singuliere _)
Copied to CephFS - Backport #49853: nautilus: mds: race of fetching large dirfrag (Resolved, Nathan Cutler)