Bug #15399
MDS incarnation get lost after remove filesystem
Description
If we remove a filesystem and then create a new filesystem using the old data/metadata pools, the OSDs may drop requests from the new filesystem's MDS because they think the requests are duplicates.
A related issue is that MDSes in different filesystems can have the same client ID, which is a little weird.
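A minimal sketch of the failure mode described above, as a toy model rather than Ceph code. It assumes the OSD recognizes duplicates by a request ID made of the client name, the client incarnation, and a per-client transaction ID, and that the incarnation counter restarts from the same value when the filesystem is recreated. ReqId, ToyOSD, and the "mds.0" name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReqId:
    """Hypothetical request identity used by the toy OSD for dedup."""
    client: str       # rank-based client name, e.g. "mds.0" (assumed)
    incarnation: int  # client incarnation number
    tid: int          # per-client transaction ID


@dataclass
class ToyOSD:
    """Toy OSD that remembers every request ID it has applied."""
    seen: set = field(default_factory=set)

    def submit(self, client: str, incarnation: int, tid: int) -> str:
        reqid = ReqId(client, incarnation, tid)
        if reqid in self.seen:
            # Same (client, incarnation, tid) as an earlier request.
            return "dropped-as-duplicate"
        self.seen.add(reqid)
        return "applied"


osd = ToyOSD()

# Old filesystem: rank 0 runs with incarnation 1 and issues tids 1..3.
old_results = [osd.submit("mds.0", 1, tid) for tid in (1, 2, 3)]

# The filesystem is removed and a new one is created on the same pools.
# Assumption: the incarnation counter is lost and restarts at 1, so the
# new MDS reuses request IDs the OSD has already recorded.
new_results = [osd.submit("mds.0", 1, tid) for tid in (1, 2, 3)]
```

In this model every request from the new filesystem's MDS collides with one already recorded for the old filesystem, which matches the symptom reported here.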
History
#1 Updated by John Spray almost 8 years ago
Here's a reproducer for the incarnation issue:
https://github.com/ceph/ceph-qa-suite/tree/wip-15399
I note that when OSDs construct their objecters they just use the OSDMap epoch as the incarnation. I wonder why the MDS has a per-rank incarnation counter at all? Perhaps we can just remove it and use the MDSMap epoch instead.
#2 Updated by Greg Farnum almost 8 years ago
- MDSes A and B come up during the same epoch
- A becomes active and B becomes standby
- A fails
- B starts replaying operations
Then B needs to have an incarnation which differs from A's. The OSDs are each a distinct daemon entity (unlike the MDSes).
There may be ways to simplify it, though!
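The scenario above can be sketched as a toy model, not Ceph code. It assumes both daemons present the same rank-based identity ("mds.0", a hypothetical name) once they hold the rank, and that the incarnation is naively taken from the epoch in which the daemon came up. The function name and the epoch value 10 are illustrative only.

```python
def boot_epoch_incarnation(boot_epoch: int) -> int:
    """Naive scheme: the incarnation is the epoch the daemon came up in."""
    return boot_epoch


# MDSes A and B come up during the same epoch (10 is an arbitrary example).
inc_a = boot_epoch_incarnation(10)
inc_b = boot_epoch_incarnation(10)

# A is active as rank 0 and issues its first request.
reqid_a = ("mds.0", inc_a, 1)

# A fails, B takes over rank 0 and issues its own first request.
reqid_b = ("mds.0", inc_b, 1)

# Under this naive scheme the two request IDs are indistinguishable.
collision = reqid_a == reqid_b
```

Distinct OSD daemons would never share the first element of that tuple, which is why the same scheme is unproblematic on the OSD side.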
#3 Updated by Greg Farnum almost 8 years ago
So we could probably reset our network connections with an incarnation based on the last MDSMap where our role changed... I think that should work; maybe it's what you meant.
#4 Updated by John Spray almost 8 years ago
We only call objecter->set_client_incarnation(incarnation); in MDSRank::init (after we've been assigned an active role).
So the epoch should be sufficient (it's always incremented when a rank assignment changes), as long as we remember to set it when a standby-replay MDS is promoted (as well as when MDSRank is initialized).
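A rough sketch of that idea, as a toy model rather than the actual MDS code. It assumes the map epoch is bumped on every rank assignment change, and that the daemon records the current epoch as its incarnation both when its rank is initialized and when it is promoted from standby replay. ToyMDSMap, ToyMDS, and their method names are hypothetical.

```python
class ToyMDSMap:
    """Toy map whose epoch increases on every rank assignment change."""

    def __init__(self) -> None:
        self.epoch = 1
        self.rank_holder = {}

    def assign(self, rank: int, daemon_name: str) -> int:
        # Any change of rank assignment produces a new epoch.
        self.epoch += 1
        self.rank_holder[rank] = daemon_name
        return self.epoch


class ToyMDS:
    """Toy daemon that derives its incarnation from the map epoch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.incarnation = None

    def on_rank_init(self, epoch: int) -> None:
        # Counterpart of setting the incarnation when the rank is initialized.
        self.incarnation = epoch

    def on_promoted_from_standby_replay(self, epoch: int) -> None:
        # The case that is easy to forget: promotion must set it as well.
        self.incarnation = epoch


mdsmap = ToyMDSMap()
a, b = ToyMDS("A"), ToyMDS("B")

# A is assigned rank 0 and initializes its rank.
a.on_rank_init(mdsmap.assign(0, a.name))

# A fails; B, which was following in standby replay, is promoted to rank 0.
b.on_promoted_from_standby_replay(mdsmap.assign(0, b.name))
```

Because each assignment yields a fresh epoch, B ends up with a strictly larger incarnation than A even if both daemons started during the same epoch.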
#5 Updated by Zheng Yan almost 8 years ago
I think using the MDSMap epoch as the incarnation is a good idea.
#6 Updated by Greg Farnum almost 8 years ago
- Status changed from New to In Progress
- Assignee set to John Spray
For Jewel: https://github.com/ceph/ceph/pull/8484
But a more comprehensive fix (one that works with pools shared between FSes) is still in progress.
#7 Updated by John Spray almost 8 years ago
- Status changed from In Progress to Fix Under Review
#8 Updated by John Spray almost 8 years ago
- Status changed from Fix Under Review to Pending Backport
#9 Updated by Nathan Cutler almost 8 years ago
- Backport set to jewel
#10 Updated by Nathan Cutler almost 8 years ago
- Copied to Backport #15732: jewel: MDS incarnation get lost after remove filesystem added
#11 Updated by Greg Farnum almost 8 years ago
- Status changed from Pending Backport to Resolved