Bug #49617
mds: race of fetching large dirfrag
Description
When a dirfrag contains more than 'mds_dir_keys_per_op' items, the MDS must send multiple 'omap-get-vals' requests to fetch the dirfrag completely.
There is a race if the MDS commits the dirfrag in the middle of these 'omap-get-vals' requests. For example:
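To make the multi-request fetch concrete, here is a minimal Python sketch of paging through an omap in batches. It is only a model, not Ceph code: `omap_get_vals`, `fetch_dirfrag`, and the `keys_per_op` parameter are hypothetical names standing in for the OSD read and the 'mds_dir_keys_per_op' limit.

```python
def omap_get_vals(omap, start_after, max_return):
    """Model of one read: up to max_return sorted pairs with key > start_after."""
    keys = sorted(k for k in omap if start_after is None or k > start_after)
    batch = [(k, omap[k]) for k in keys[:max_return]]
    more = len(keys) > max_return  # True if another request is needed
    return batch, more


def fetch_dirfrag(omap, keys_per_op):
    """Issue reads until the whole omap has been returned."""
    fetched = {}
    last_key = None
    requests = 0
    while True:
        batch, more = omap_get_vals(omap, last_key, keys_per_op)
        requests += 1
        fetched.update(batch)
        if not more:
            return fetched, requests
        last_key = batch[-1][0]  # resume after the last key seen


# Hypothetical dirfrag with 10 entries and a limit of 4 keys per request.
omap = {f"dentry_{i:02d}": i for i in range(10)}
fetched, requests = fetch_dirfrag(omap, keys_per_op=4)
```

With 10 entries and 4 keys per request, the fetch takes three requests, and anything that changes the omap between them can leave the combined result inconsistent.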
- The MDS starts fetching a dirfrag and sends an 'omap-get-vals' request to the OSD.
- The MDS starts committing the dirfrag, removing the key that corresponds to null dentry 'X'.
- The MDS receives the 'omap-get-vals' reply. The returned omap is incomplete, but it contains the key/value pair that corresponds to dentry 'X'. The MDS sends another 'omap-get-vals' request to fetch the rest of the omap.
- The dirfrag commit completes. The MDS marks null dentry 'X' clean and removes it from its cache.
- The MDS receives the next 'omap-get-vals' reply. The returned omap is now complete. The MDS calls CDir::omap_fetched(), which re-adds dentry 'X' to its cache.
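The interleaving above can be replayed in a small Python simulation. This is an illustrative model only: the dictionaries `osd_omap` and `cache` and the helpers `read_batch` and `replay_race` are hypothetical stand-ins for the on-disk omap, the MDS dentry cache, and the fetch/commit paths. A null dentry is modelled as a cache entry whose value is `None`.

```python
def read_batch(omap, start_after, limit):
    """Model of one 'omap-get-vals' read, evaluated at the time it is served."""
    keys = sorted(k for k in omap if start_after is None or k > start_after)
    return [(k, omap[k]) for k in keys[:limit]], len(keys) > limit


def replay_race():
    osd_omap = {"a": "ino1", "x": "ino2", "y": "ino3", "z": "ino4"}
    cache = {"x": None}  # dirty null dentry 'X': unlinked, not yet committed
    fetched = {}

    # Step 1: the first read is served before the commit reaches the OSD.
    batch, more = read_batch(osd_omap, None, 2)  # returns 'a' and 'x'

    # Step 2: the commit removes the key for null dentry 'X'.
    del osd_omap["x"]

    # Step 3: the MDS handles the first (incomplete) reply and asks for more.
    fetched.update(batch)
    assert more

    # Step 4: the commit finishes; the clean null dentry leaves the cache.
    del cache["x"]

    # Step 5: the second reply completes the fetch.
    batch, more = read_batch(osd_omap, "x", 2)  # returns 'y' and 'z'
    fetched.update(batch)

    # Model of omap_fetched(): add every fetched dentry missing from the cache.
    for name, ino in fetched.items():
        cache.setdefault(name, ino)
    return osd_omap, cache


osd_omap, cache = replay_race()
```

After the replay, `cache` contains 'x' even though `osd_omap` no longer does, which is the stale dentry described in the last step.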
One possible fix is to restart the fetch from the beginning if the dirfrag is committed in the middle of the 'omap-get-vals' requests.
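A minimal Python sketch of that idea follows, assuming the dirfrag keeps a counter that is bumped on every commit. It is not the merged patch: the `Dirfrag` class, its `committed_version` counter, and the `between_batches` hook (used here only to inject a commit mid-fetch) are hypothetical names chosen for illustration.

```python
def read_batch(omap, start_after, limit):
    """Model of one 'omap-get-vals' read."""
    keys = sorted(k for k in omap if start_after is None or k > start_after)
    return [(k, omap[k]) for k in keys[:limit]], len(keys) > limit


class Dirfrag:
    def __init__(self, omap, cache):
        self.osd_omap = dict(omap)
        self.cache = dict(cache)
        self.committed_version = 0  # assumed counter, bumped on each commit

    def commit_remove(self, name):
        """Model of a commit that drops a null dentry from disk and cache."""
        self.osd_omap.pop(name, None)
        self.cache.pop(name, None)
        self.committed_version += 1

    def fetch(self, keys_per_op, between_batches=None):
        """Fetch all keys; discard partial results if a commit intervened."""
        restarts = 0
        while True:
            start_version = self.committed_version
            fetched, last_key, more = {}, None, True
            while more and self.committed_version == start_version:
                batch, more = read_batch(self.osd_omap, last_key, keys_per_op)
                fetched.update(batch)
                if batch:
                    last_key = batch[-1][0]
                if more and between_batches:
                    between_batches(self)  # a commit may happen here
            if self.committed_version == start_version:
                break  # no commit during this pass, result is consistent
            restarts += 1  # throw away the partial omap and start over
        for name, ino in fetched.items():
            self.cache.setdefault(name, ino)
        return restarts


pending = ["x"]

def commit_once(dirfrag):
    # Inject exactly one commit, after the first batch has been read.
    if pending:
        dirfrag.commit_remove(pending.pop())

d = Dirfrag({"a": "ino1", "x": "ino2", "y": "ino3", "z": "ino4"}, {"x": None})
restarts = d.fetch(keys_per_op=2, between_batches=commit_once)
```

In this run the commit lands after the first batch, so the fetch restarts once and dentry 'x' is not re-added to the cache. Restarting is simpler than patching up the partial result, at the cost of re-reading keys that were already fetched.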
Related issues
History
#1 Updated by Patrick Donnelly about 3 years ago
- Status changed from New to Triaged
- Assignee set to Xiubo Li
- Target version set to v17.0.0
- Source set to Community (dev)
- Backport set to pacific,octopus,nautilus
- Labels (FS) task(medium) added
#2 Updated by Zheng Yan about 3 years ago
- Pull request ID set to 49617
#3 Updated by Zheng Yan about 3 years ago
- Status changed from Triaged to Fix Under Review
#4 Updated by Zheng Yan about 3 years ago
- Pull request ID changed from 49617 to 39848
#5 Updated by Patrick Donnelly about 3 years ago
- Assignee changed from Xiubo Li to Erqi Chen
#6 Updated by Patrick Donnelly about 3 years ago
- Status changed from Fix Under Review to Pending Backport
#7 Updated by Backport Bot about 3 years ago
- Copied to Backport #49851: octopus: mds: race of fetching large dirfrag added
#8 Updated by Backport Bot about 3 years ago
- Copied to Backport #49852: pacific: mds: race of fetching large dirfrag added
#9 Updated by Backport Bot about 3 years ago
- Copied to Backport #49853: nautilus: mds: race of fetching large dirfrag added
#10 Updated by Loïc Dachary almost 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".