Bug #36035
closed
mds: MDCache.cc: 11673: abort()
Description
2018-09-16T08:41:01.302 INFO:tasks.ceph.mds.i.smithi155.stderr:/build/ceph-14.0.0-3252-g561ad6d/src/mds/MDCache.cc: 11673: abort()
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr:
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: ceph version 14.0.0-3252-g561ad6d (561ad6d7a7950727f2a31290c28698fcd1355c37) nautilus (dev)
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x82) [0x7f8ea4ac8140]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 2: (MDCache::handle_fragment_notify(boost::intrusive_ptr<MMDSFragmentNotify const> const&)+0x380) [0x5d1a90]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 3: (MDCache::dispatch(boost::intrusive_ptr<Message const> const&)+0x147) [0x5f75e7]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 4: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message const> const&)+0x171) [0x4df551]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 5: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x68b) [0x4e940b]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 6: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x15) [0x4e9bd5]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 7: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xff) [0x4d727f]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 8: (DispatchQueue::entry()+0xe6a) [0x7f8ea4ca6d7a]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f8ea4d3da1d]
2018-09-16T08:41:01.303 INFO:tasks.ceph.mds.i.smithi155.stderr: 10: (()+0x76ba) [0x7f8ea438d6ba]
2018-09-16T08:41:01.304 INFO:tasks.ceph.mds.i.smithi155.stderr: 11: (clone()+0x6d) [0x7f8ea3bb641d]
From: /ceph/teuthology-archive/pdonnell-2018-09-13_04:59:57-multimds-wip-pdonnell-testing-20180913.024004-distro-basic-smithi/3014469/teuthology.log
Unfortunately, no coredumps or MDS logs are available.
Updated by Patrick Donnelly over 5 years ago
Another: /ceph/teuthology-archive/pdonnell-2018-10-09_01:07:48-multimds-wip-pdonnell-testing-20181008.224656-distro-basic-smithi/3119047/teuthology.log
Updated by Zheng Yan over 5 years ago
I reproduced this locally.
Dirfrag A is a subtree root; its parent inode is inode A. The auth MDS of dirfrag A is mds.a, and the auth MDS of inode A is mds.b. Both dirfrag A and inode A are replicated to mds.c. The following sequence of events can trigger the crash:
1. mds.a finishes fragmenting dirfrag A and sends fragment_notify to mds.c.
2. mds.b wants to readlock the fragtreelock of inode A, so it sends lock(a=sync idft...) to mds.a (lock state is mix->sync).
3. mds.a receives the lock sync message and sends lock(a=syncack idft...) to mds.b.
4. mds.b receives the lock syncack message and calls Locker::scatter_writebehind().
5. mds.b sends lock(a=sync idft...) to mds.a and mds.c (triggered by scatter_writebehind_finish()).
6. mds.c receives the lock ack message from mds.b; the fragtreelock of inode A moves to the SYNC state.
7. mds.c trims inode A.
8. mds.c receives the fragment_notify message (sent in step 1).
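The ordering problem in steps 6 to 8 can be illustrated with a small toy model. This is only a sketch and not Ceph code: `ReplicaCache`, `replay`, and the event names are hypothetical, and the model assumes that the abort corresponds to mds.c no longer holding inode A when fragment_notify arrives, which is what steps 7 and 8 suggest. It also does not model why the trim becomes possible; it only replays events in a given order.

```python
class ReplicaCache:
    """Hypothetical stand-in for the replica state on mds.c (not Ceph's MDCache)."""

    def __init__(self):
        self.inodes = {"A"}                 # inode A is replicated to mds.c
        self.fragtreelock = {"A": "MIX"}    # toy lock state per inode

    def handle_lock_msg(self, ino):
        # Step 6: the lock message from mds.b moves fragtreelock to SYNC.
        if ino in self.inodes:
            self.fragtreelock[ino] = "SYNC"

    def trim(self, ino):
        # Step 7: the replica drops inode A from its cache.
        self.inodes.discard(ino)
        self.fragtreelock.pop(ino, None)

    def handle_fragment_notify(self, ino):
        # Step 8: the notify from mds.a refers to an inode that may be gone.
        # The LookupError models the abort() (assumption, see lead-in).
        if ino not in self.inodes:
            raise LookupError(f"fragment_notify for trimmed inode {ino}")


def replay(events, ino="A"):
    """Apply events to a fresh replica in the given order and return it."""
    cache = ReplicaCache()
    handlers = {
        "lock_msg": cache.handle_lock_msg,
        "trim": cache.trim,
        "fragment_notify": cache.handle_fragment_notify,
    }
    for name in events:
        handlers[name](ino)
    return cache


if __name__ == "__main__":
    # Notify from mds.a arrives before the lock message from mds.b: no error.
    replay(["fragment_notify", "lock_msg", "trim"])

    # Order described in this report: the notify arrives after the trim.
    try:
        replay(["lock_msg", "trim", "fragment_notify"])
    except LookupError as err:
        print("crash path:", err)
```

The two senders (mds.a and mds.b) reach mds.c independently, so in this model nothing prevents the second ordering; any fix has to either tolerate the missing inode or keep the replica from being trimmed while a notify is outstanding.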
Updated by Zheng Yan over 5 years ago
- Status changed from New to Fix Under Review
Updated by Patrick Donnelly over 5 years ago
In Mimic: /ceph/teuthology-archive/yuriw-2018-10-18_15:37:57-multimds-wip-yuri4-testing-2018-10-17-2308-mimic-testing-basic-smithi/3158009/teuthology.log
Updated by Patrick Donnelly over 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Assignee set to Zheng Yan
- Pull request ID set to 24580
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37480: mimic: mds: MDCache.cc: 11673: abort() added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37481: luminous: mds: MDCache.cc: 11673: abort() added
Updated by Nathan Cutler about 5 years ago
- Status changed from Pending Backport to Resolved