Bug #22626
mds: sessionmap version mismatch when replaying ESessions
Description
We are running Ceph 10.2.10, into which we backported this PR a long time ago: https://github.com/ceph/ceph/commit/a49726e10ef23be124d92872470fd258a1938d9e#diff-23bc98a965757649a7e2d936e1eb7092
Our cluster had been running well with multiple active MDS daemons for a long time, until we recently hit the following crash and had to reset the journal to get the MDS to start again.
2018-01-07 15:45:21.353734 7f4e7e6d2700 10 mds.0.sessionmap _load_finish loaded version 450929328
2018-01-07 15:45:21.356874 7f4e7e6d2700 10 mds.0.sessionmap _load_finish: continue omap load from 'client.156699143'
2018-01-07 15:45:21.359749 7f4e7e6d2700 10 MDSIOContextBase::complete: 12C_IO_SM_Load
2018-01-07 15:45:21.360497 7f4e7e6d2700 10 mds.0.sessionmap _load_finish: omap load complete
2018-01-07 15:45:21.360528 7f4e7e6d2700 10 mds.0.sessionmap _load_finish: v 450929328, 1360 sessions
...
2018-01-07 15:45:24.134781 7f4e7c6ce700 10 mds.0.journal ESession.replay inotable 909867 < 909868 remove
2018-01-07 15:45:24.134783 7f4e7c6ce700 10 mds.0.inotable: replay_release_ids [1001a2e4dd7~134,1001a2f320f~1f5]
2018-01-07 15:45:24.134787 7f4e7c6ce700 10 mds.0.log _replay 8133887349887~198 / 8133887835152 2018-01-07 14:37:24.997607: ESession client.138079503 x.x.x.x:0/679872855 close cmapv 450928332
2018-01-07 15:45:24.134790 7f4e7c6ce700 10 mds.0.journal ESession.replay sessionmap 450929328 >= 450928332, noop
2018-01-07 15:45:24.134792 7f4e7c6ce700 10 mds.0.log _replay 8133887350105~198 / 8133887835152 2018-01-07 14:37:24.997614: ESession client.155696931 x.x.x.x:0/4292818376 close cmapv 450928333
2018-01-07 15:45:24.134796 7f4e7c6ce700 10 mds.0.journal ESession.replay sessionmap 450929328 >= 450928333, noop
2018-01-07 15:45:24.134797 7f4e7c6ce700 10 mds.0.log _replay 8133887350323~198 / 8133887835152 2018-01-07 14:37:24.997618: ESession client.155699679 x.x.x.x:0/375524559 close cmapv 450928334
2018-01-07 15:45:24.134801 7f4e7c6ce700 10 mds.0.journal ESession.replay sessionmap 450929328 >= 450928334, noop
2018-01-07 15:45:24.134803 7f4e7c6ce700 10 mds.0.log _replay 8133887350541~198 / 8133887835152 2018-01-07 14:37:24.997622: ESession client.156692858 x.x.x.x:0/383900987 close cmapv 450928335
2018-01-07 15:45:24.134810 7f4e7c6ce700 10 mds.0.journal ESession.replay sessionmap 450929328 >= 450928335, noop
2018-01-07 15:45:24.135005 7f4e7c6ce700 10 mds.0.log _replay 8133887350759~155947 / 8133887835152 2018-01-07 14:37:25.553532: ESessions 1019 opens cmapv 450929354
2018-01-07 15:45:24.135010 7f4e7c6ce700 10 mds.0.journal ESessions.replay sessionmap 450929328 < 450929354
2018-01-07 15:45:24.135273 7f4e7c6ce700 10 mds.0.journal ESessions.replay after open_sessions sessionmap 450930347 cmapv 450929354
2018-01-07 15:45:24.136229 7f4e7c6ce700 -1 mds/journal.cc: In function 'virtual void ESessions::replay(MDSRank*)' thread 7f4e7c6ce700 time 2018-01-07 15:45:24.135277
mds/journal.cc: 1850: FAILED assert(mds->sessionmap.get_version() == cmapv)

 ceph version 10.2.10-102-g0b468fc (0b468fcd815759a473385384686d6a3ee6063f41)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f4e8a3301ab]
 2: (ESessions::replay(MDSRank*)+0x9f6) [0x7f4e8a210b66]
 3: (MDLog::_replay_thread()+0x5df) [0x7f4e8a1a6eaf]
 4: (MDLog::ReplayThread::entry()+0xd) [0x7f4e89f6ed4d]
 5: (()+0x7df3) [0x7f4e8912adf3]
 6: (clone()+0x6d) [0x7f4e87bf81bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
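The log messages suggest the rule applied to each ESession/ESessions event during replay: if the loaded sessionmap version is already >= the event's cmapv, the event is skipped ("noop"); otherwise the event is applied and the resulting version must equal cmapv, which is the assertion that fails at mds/journal.cc:1850. The Python sketch below models only that decision, using version numbers copied from the log. The function `replay_session_event` and its `version_after_apply` parameter are hypothetical illustrations, not Ceph code or APIs.

```python
def replay_session_event(sessionmap_version, cmapv, version_after_apply):
    """Hypothetical model of the replay check visible in the log.

    sessionmap_version: version loaded from the on-disk sessionmap
    cmapv: sessionmap version recorded in the journal event
    version_after_apply: version the in-memory map reaches after the event
        is applied (passed in, since how it advances is what went wrong)
    Returns the resulting version, or raises AssertionError on a mismatch,
    mirroring FAILED assert(mds->sessionmap.get_version() == cmapv).
    """
    if sessionmap_version >= cmapv:
        # Event is already reflected in the loaded sessionmap: skip it.
        return sessionmap_version
    if version_after_apply != cmapv:
        raise AssertionError(
            f"sessionmap {version_after_apply} != cmapv {cmapv}")
    return version_after_apply


loaded = 450929328  # "_load_finish: v 450929328"

# The ESession "close" events carry older cmapv values, so they are no-ops.
for close_cmapv in (450928332, 450928333, 450928334, 450928335):
    assert replay_session_event(loaded, close_cmapv, None) == loaded

# The ESessions event (1019 opens, cmapv 450929354) ends at 450930347.
try:
    replay_session_event(loaded, 450929354, 450930347)
except AssertionError as err:
    print(err)  # sessionmap 450930347 != cmapv 450929354
```

Passing the post-apply version in as a parameter keeps the sketch neutral about how open_sessions advances the version, which is exactly the behavior in question.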
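We had seen this assertion fail before, when the sessionmap version was increased only once by open_sessions; that is why we backported the above PR into 10.2.x. This time, however, as the log shows, the sessionmap version after open_sessions (450930347) is larger than cmapv (450929354): it advanced by 1019 from the loaded version 450929328, once per opened session.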
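A quick arithmetic check with the numbers from the log shows that neither increment policy lands on cmapv here. This is a sketch under the assumption that open_sessions advances the version either once per call or once per opened session; the helpers `bump_once` and `bump_per_session` are hypothetical names for illustration, not Ceph functions.

```python
LOADED = 450929328        # "_load_finish: v 450929328"
CMAPV = 450929354         # "ESessions 1019 opens cmapv 450929354"
OPENS = 1019              # sessions opened by that ESessions event
LOGGED_AFTER = 450930347  # "after open_sessions sessionmap 450930347"


def bump_once(version, n_sessions):
    """Hypothetical policy: one increment per open_sessions call.

    n_sessions is unused; it is kept so both policies share a signature.
    """
    return version + 1


def bump_per_session(version, n_sessions):
    """Hypothetical policy: one increment per opened session."""
    return version + n_sessions


once = bump_once(LOADED, OPENS)
per_session = bump_per_session(LOADED, OPENS)
print(once, per_session, CMAPV - LOADED)  # 450929329 450930347 26
```

The per-session result equals the value logged after open_sessions, which is consistent with the description above. Either way the result differs from cmapv, which sits 26 versions above the loaded sessionmap, not 1 or 1019.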
Updated by Patrick Donnelly over 6 years ago
- Status changed from New to Rejected
Zhang, we are not accepting bugs for multi-MDS clusters on Jewel. You can still seek help or advice on the ceph-users mailing list if you like.
We would recommend upgrading to Luminous if that is possible.
Updated by Patrick Donnelly about 5 years ago
- Category deleted (90)
- Labels (FS) multimds added