Bug #35848
MDSMonitor: lookup of gid in prepare_beacon that has been removed will cause exception
Status: Resolved
Priority: Urgent
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: other
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2018-09-07 06:28:53.829359 7fe856397700 1 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 fail_mds_gid 4864 mds.ceph-sshreeka-1536308179377-node6-mds role 0
2018-09-07 06:28:53.829589 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 prepare_beacon pending map now:
2018-09-07 06:28:53.829601 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 preprocess_beacon mdsbeacon(4885/ceph-sshreeka-1536308179377-node6-mds up:boot seq 15 v166) v7 from mds.? 172.16.115.21:6800/3695643259 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2}
2018-09-07 06:28:53.829639 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 preprocess_beacon mdsbeacon(4885/ceph-sshreeka-1536308179377-node6-mds up:boot seq 16 v166) v7 from mds.? 172.16.115.21:6800/3695643259 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2}
2018-09-07 06:28:53.829658 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 preprocess_beacon mdsbeacon(4885/ceph-sshreeka-1536308179377-node6-mds up:boot seq 17 v166) v7 from mds.? 172.16.115.21:6800/3695643259 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2}
2018-09-07 06:28:53.829667 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 preprocess_beacon mdsbeacon(4885/ceph-sshreeka-1536308179377-node6-mds up:boot seq 2 v166) v7 from mds.? 172.16.115.21:6800/3695643259 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2}
2018-09-07 06:28:53.829677 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 preprocess_beacon mdsbeacon(4885/ceph-sshreeka-1536308179377-node6-mds up:boot seq 3 v166) v7 from mds.? 172.16.115.21:6800/3695643259 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2}
2018-09-07 06:28:53.829694 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 preprocess_beacon mdsbeacon(4885/ceph-sshreeka-1536308179377-node6-mds up:boot seq 4 v166) v7 from mds.? 172.16.115.21:6800/3695643259 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2}
2018-09-07 06:28:53.832094 7fe856397700 4 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 filesystem_command prefix='mds fail'
2018-09-07 06:28:53.832105 7fe856397700 1 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 gid_from_arg: rank/GID 0 not a existent rank or GID
2018-09-07 06:28:53.832107 7fe856397700 4 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 prepare_command done, r=0
2018-09-07 06:28:53.832138 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 preprocess_beacon mdsbeacon(4864/ceph-sshreeka-1536308179377-node6-mds down:damaged seq 9 v166) v7 from mds.0 172.16.115.21:6800/2937610127 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2}
2018-09-07 06:28:53.832158 7fe856397700 5 mon.ceph-sshreeka-1536308179377-node14-monmgr@1(leader).mds e166 _note_beacon mdsbeacon(4864/ceph-sshreeka-1536308179377-node6-mds down:damaged seq 9 v166) v7 noting time
2018-09-07 06:28:53.839276 7fe856397700 -1 *** Caught signal (Aborted) **
 in thread 7fe856397700 thread_name:fn_monstore
 ceph version 12.2.4-42.1.hotfix.nvidia.el7cp (4a72ecd06cdc5a049945b166073ce39fbe631308) luminous (stable)
 1: (()+0x931071) [0x5611702d4071]
 2: (()+0xf680) [0x7fe86390e680]
 3: (gsignal()+0x37) [0x7fe860c49207]
 4: (abort()+0x148) [0x7fe860c4a8f8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fe8615587d5]
 6: (()+0x5e746) [0x7fe861556746]
 7: (()+0x5e773) [0x7fe861556773]
 8: (()+0x5e993) [0x7fe861556993]
 9: (std::__throw_out_of_range(char const*)+0x77) [0x7fe8615ab857]
 10: (FSMap::get_info_gid(mds_gid_t) const+0xfc) [0x56116ff5e1ac]
 11: (MDSMonitor::prepare_beacon(boost::intrusive_ptr<MonOpRequest>)+0x77d) [0x56116ff5190d]
 12: (MDSMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x257) [0x56116ff58d97]
 13: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xaf8) [0x56116feb49d8]
 14: (PaxosService::C_RetryMessage::_finish(int)+0x5e) [0x56116fdee3fe]
 15: (Context::complete(int)+0x9) [0x56116fd9b7b9]
 16: (void finish_contexts<Context>(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xac) [0x56116fda514c]
 17: (Paxos::finish_round()+0x11e) [0x56116fea5a0e]
 18: (Paxos::commit_finish()+0x71d) [0x56116fea6b0d]
 19: (C_Committed::finish(int)+0x31) [0x56116feae961]
 20: (Context::complete(int)+0x9) [0x56116fd9b7b9]
 21: (MonitorDBStore::C_DoTransaction::finish(int)+0xa7) [0x56116feadb57]
 22: (Context::complete(int)+0x9) [0x56116fd9b7b9]
 23: (Finisher::finisher_thread_entry()+0x198) [0x56116ffd0558]
 24: (()+0x7dd5) [0x7fe863906dd5]
 25: (clone()+0x6d) [0x7fe860d11b3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Problem introduced by this change: https://github.com/ceph/ceph/commit/624efc64323f99b2e843f376879c1080276e036f#diff-6c4f848e4bb0fe57e9c0f9bc67b14beaL354
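Since that change, beacons are no longer dropped when their gid has already been removed from the pending_fsmap. In the log above, fail_mds_gid removes gid 4864, a later beacon from gid 4864 still reaches prepare_beacon, and FSMap::get_info_gid throws std::out_of_range, which aborts the monitor. We need a new check in prepare_beacon that operates on pending_fsmap.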
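The kind of check proposed here can be illustrated with a toy model. This is only a sketch and not the actual Ceph fix: ToyFSMap, ToyInfo, toy_gid_t and toy_prepare_beacon are hypothetical stand-ins, and the only behaviour taken from the report is that looking up a removed gid throws std::out_of_range (frames 9 and 10 of the backtrace).

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are NOT the real Ceph types.
using toy_gid_t = std::uint64_t;

struct ToyInfo {
  std::string name;
  std::string state;
};

class ToyFSMap {
public:
  void insert(toy_gid_t gid, ToyInfo info) { infos_[gid] = std::move(info); }
  void erase(toy_gid_t gid) { infos_.erase(gid); }

  bool gid_exists(toy_gid_t gid) const { return infos_.count(gid) > 0; }

  // std::map::at throws std::out_of_range when the key is missing, which
  // mirrors the exception seen in the backtrace.
  const ToyInfo& get_info_gid(toy_gid_t gid) const { return infos_.at(gid); }

private:
  std::map<toy_gid_t, ToyInfo> infos_;
};

// Assumed shape of the guard: check the pending map before the lookup and
// drop the beacon if the gid is gone. Returns true if the beacon was applied.
bool toy_prepare_beacon(ToyFSMap& pending, toy_gid_t gid,
                        const std::string& new_state) {
  if (!pending.gid_exists(gid)) {
    return false;  // gid was removed (e.g. by a failure); ignore the beacon
  }
  ToyInfo info = pending.get_info_gid(gid);  // safe: existence checked above
  info.state = new_state;
  pending.insert(gid, std::move(info));
  return true;
}
```

With this guard, a beacon for a gid that was erased from the pending map is dropped instead of reaching the throwing lookup; without it, the same call sequence ends in an uncaught std::out_of_range.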