Bug #1417 (closed)
mds: failed assert on xlock
Description
mds/SimpleLock.h: 494: FAILED assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE || is_locallock() || state == LOCK_LOCK)
ceph version 0.33-205-geb8925a (commit:eb8925a730e735624562dad67894dc373079b934)
1: (Locker::xlock_finish(SimpleLock*, Mutation*, bool*)+0x5f4) [0x5e1624]
2: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x5d) [0x5e95cd]
3: (Locker::drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x51) [0x5e9811]
4: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x139) [0x4fdc69]
5: (C_MDS_inode_update_finish::finish(int)+0x1dc) [0x53ae2c]
6: (Context::complete(int)+0xa) [0x48e64a]
7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xda) [0x6d8eca]
8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20e) [0x6d006e]
9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x9a3) [0x6b1f63]
10: (MDS::handle_core_message(Message*)+0x85f) [0x4ad5cf]
11: (MDS::_dispatch(Message*)+0x2c) [0x4ad66c]
12: (MDS::ms_dispatch(Message*)+0x71) [0x4aeef1]
13: (SimpleMessenger::dispatch_entry()+0x879) [0x709629]
14: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x48872c]
15: (()+0x7971) [0x7fb1097cb971]
16: (clone()+0x6d) [0x7fb10825f92d]
It looks to be a problem whose roots are earlier, in an incorrect switch from xlockdone to prexlock, I think.
Updated by Greg Farnum over 12 years ago
Okay:
1) dispatch client1 request, gets xlock on filelock (lock_xlock)
2) early_reply to client1 request, which calls set_xlock_done on filelock (lock_xlock_done)
3) dispatch client2 request, try to get ifile xlock (lock_xlock_done -> lock_lock_xlock)
4) Wait on client2 request, since we can't get xlock
...inconsequential bits...
5) request_finish client1 request
6) put_xlock on ifile: ASSERT because lock is in disallowed state
Obviously we can fix the assert by just adding lock_lock_xlock to the allowed states. But that is super icky, since client1 still really holds the xlock, yet client2 is allowed to change its state because client1 has half put it away. I'm not quite sure why this split exists to begin with, though. Will investigate and discuss.
Updated by Greg Farnum over 12 years ago
- Status changed from New to 7
Testing that fix I worked out with Sage.
Updated by Greg Farnum over 12 years ago
- Status changed from 7 to Resolved
Well, I hit a path_traverse bug instead. I'm going to mark this particular one as resolved unless it pops up again.
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.34)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.