Bug #13166
closed
MDS: standby-replay does not change client_incarnation properly
Description
2015-09-17 17:40:57.876236 7fbfb374f700 10 mds.b-s-a handle_mds_map: handling map as rank 0
2015-09-17 17:40:57.876251 7fbfb374f700 1 mds.0.0 handle_mds_map i am now mds.4109.0replaying mds.0.0
2015-09-17 17:40:57.876254 7fbfb374f700 1 mds.0.0 handle_mds_map state change up:boot --> up:standby-replay
2015-09-17 17:40:57.876261 7fbfb374f700 10 mds.beacon.b-s-a set_want_state: up:standby -> up:standby-replay
2015-09-17 17:40:57.876265 7fbfb374f700 1 mds.0.0 replay_start
2015-09-17 17:41:23.463233 7fbfb374f700 5 mds.b-s-a handle_mds_map epoch 8 from mon.2
2015-09-17 17:41:23.463265 7fbfb374f700 10 mds.b-s-a my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table}
2015-09-17 17:41:23.463274 7fbfb374f700 10 mds.b-s-a mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
2015-09-17 17:41:23.463280 7fbfb374f700 10 mds.b-s-a peer mds gid 4116 removed from map
2015-09-17 17:41:23.463284 7fbfb374f700 1 -- 10.214.132.17:6812/28040 mark_down 10.214.134.104:6812/17763 -- pipe dne
2015-09-17 17:41:23.463293 7fbfb374f700 10 mds.b-s-a map says i am 10.214.132.17:6812/28040 mds.0.2 state up:replay
2015-09-17 17:41:23.463300 7fbfb374f700 10 mds.b-s-a handle_mds_map: handling map as rank 0
2015-09-17 17:41:23.463303 7fbfb374f700 1 mds.0.0 handle_mds_map i am now mds.0.0
2015-09-17 17:41:23.463306 7fbfb374f700 1 mds.0.0 handle_mds_map state change up:standby-replay --> up:replay
2015-09-17 17:41:23.463316 7fbfb374f700 10 mds.beacon.b-s-a set_want_state: up:standby-replay -> up:replay
2015-09-17 17:41:23.463324 7fbfb374f700 10 mds.0.0 Monitor activated us! Deactivating replay loop
I think we must have broken this handling when splitting up MDS into Rank and Daemon.
Updated by Greg Farnum over 8 years ago
- Subject changed from mds: damaged journal to MDS: standby-replay does not change client_incarnation properly
- Description updated (diff)
- Category set to 47
- Status changed from New to 12
Updated by Greg Farnum over 8 years ago
Hmm, I don't think we should actually be doing operations as mds.0.0 when we're a standby for the real mds.0.0 either! That is liable to confuse things as well.
Updated by Greg Farnum over 8 years ago
Obvious fix is to have MDSRank check the incarnation and update, but I want us to look more deeply at how the replaying works. I think we used to have different IDs entirely when replaying, rather than sending stuff off while pretending to be the active MDS. :/
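For illustration only, the "check the incarnation and update" idea can be sketched as a toy model. This is not Ceph code: `RankState`, `handle_map`, and their parameters are hypothetical names invented for the sketch, and the real MDSRank logic is C++ and considerably more involved. The sketch assumes the map carries the incarnation for our rank (the log above shows the map saying mds.0.2 while the daemon keeps logging as mds.0.0).

```python
from dataclasses import dataclass


@dataclass
class RankState:
    """Hypothetical stand-in for the per-rank state a daemon keeps."""
    rank: int
    incarnation: int = 0

    def name(self) -> str:
        # Mirrors the "mds.<rank>.<incarnation>" form seen in the log.
        return f"mds.{self.rank}.{self.incarnation}"

    def handle_map(self, map_rank: int, map_incarnation: int) -> str:
        # Assumed fix: adopt the map's incarnation whenever it differs
        # from ours, instead of keeping the value we started with.
        if map_rank == self.rank and map_incarnation != self.incarnation:
            self.incarnation = map_incarnation
        return self.name()


state = RankState(rank=0)
print(state.name())            # identity before the map update
print(state.handle_map(0, 2))  # identity after adopting the map's incarnation
```

With this check the toy daemon would start identifying itself with the map's incarnation after the takeover. It does not address the separate question of which ID a standby should use while replaying.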
Updated by Greg Farnum over 8 years ago
If we need more logs, I copied the standby MDS log to ubuntu-2015-09-17_16:55:52-fs-greg-fs-testing---basic-multi/1061724/ceph-mds.b-s-a.log.
Updated by Zheng Yan over 8 years ago
- Status changed from 12 to Fix Under Review
Updated by Greg Farnum over 8 years ago
Zheng, can you dig up a firefly test run and make sure the behavior of standby-replay daemons there is the same as it is with this branch? (In particular, the rank and incarnation it's telling OSDs it is.)
Updated by Zheng Yan over 8 years ago
For firefly, the standby-replay MDS also uses 0 as client_incarnation; its ID is MDS.x.0.