Bug #1417: mds: failed assert on xlock

Added by Greg Farnum over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal

Description

mds/SimpleLock.h: 494: FAILED assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE || is_locallock() || state == LOCK_LOCK)
 ceph version 0.33-205-geb8925a (commit:eb8925a730e735624562dad67894dc373079b934)
 1: (Locker::xlock_finish(SimpleLock*, Mutation*, bool*)+0x5f4) [0x5e1624]
 2: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x5d) [0x5e95cd]
 3: (Locker::drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x51) [0x5e9811]
 4: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x139) [0x4fdc69]
 5: (C_MDS_inode_update_finish::finish(int)+0x1dc) [0x53ae2c]
 6: (Context::complete(int)+0xa) [0x48e64a]
 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xda) [0x6d8eca]
 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20e) [0x6d006e]
 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x9a3) [0x6b1f63]
 10: (MDS::handle_core_message(Message*)+0x85f) [0x4ad5cf]
 11: (MDS::_dispatch(Message*)+0x2c) [0x4ad66c]
 12: (MDS::ms_dispatch(Message*)+0x71) [0x4aeef1]
 13: (SimpleMessenger::dispatch_entry()+0x879) [0x709629]
 14: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x48872c]
 15: (()+0x7971) [0x7fb1097cb971]
 16: (clone()+0x6d) [0x7fb10825f92d]

It looks like a problem whose roots are earlier, in an incorrect switch from xlockdone to prexlock, I think.

#1 - Updated by Greg Farnum over 12 years ago

Okay:
1) dispatch client1 request, gets xlock on filelock (lock_xlock)
2) early_reply to client1 request, which calls set_xlock_done on filelock (lock_xlock_done)
3) dispatch client2 request, try to get ifile xlock (lock_xlock_done -> lock_lock_xlock)
4) Wait on client2 request, since we can't get xlock
...inconsequential bits...
5) request_finish client1 request
6) put_xlock on ifile: ASSERT because lock is in disallowed state

Obviously we can fix the assert by just adding lock_lock_xlock to the allowed states. But that is super icky, since client1 still really has the xlock, but client2 is allowed to change its state because client1 has half put it away. I'm not quite sure why there's this split to begin with, though. Will investigate and discuss.
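
To make the sequence above concrete, here is a minimal, hypothetical C++ sketch of those state transitions. The ToyLock type, its method names, and the reduced set of enum values are invented for illustration; this is not the actual SimpleLock implementation (the real assert also allows is_locallock(), and the real transitions go through Locker). Running it aborts in put_xlock() with the same kind of disallowed-state failure as in the report.

 // toy_xlock_race.cpp -- hypothetical, simplified model of the race above;
 // not the actual Ceph SimpleLock/Locker code.
 #include <cassert>
 #include <iostream>

 enum State {
   LOCK_LOCK,
   LOCK_XLOCK,
   LOCK_XLOCKDONE,
   LOCK_LOCK_XLOCK,   // intermediate state entered by the second xlock attempt
 };

 struct ToyLock {
   State state = LOCK_LOCK;

   // step 1: client1's request acquires the xlock
   void xlock_start()      { state = LOCK_XLOCK; }

   // step 2: early_reply marks the xlock "done", but client1 still holds it
   void set_xlock_done()   { state = LOCK_XLOCKDONE; }

   // step 3: client2's request tries to xlock; from xlockdone the lock moves
   // into the intermediate lock_lock_xlock state and client2 waits
   void xlock_try_second() { state = LOCK_LOCK_XLOCK; }

   // steps 5-6: client1's request_finish drops its locks; the state check
   // here mirrors the failed assert in the report, and lock_lock_xlock is
   // not in the allowed set
   void put_xlock() {
     assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE ||
            state == LOCK_LOCK);
     state = LOCK_LOCK;
   }
 };

 int main() {
   ToyLock filelock;
   filelock.xlock_start();       // 1) client1 gets the xlock
   filelock.set_xlock_done();    // 2) early_reply -> xlockdone
   filelock.xlock_try_second();  // 3) client2 attempts xlock -> lock_lock_xlock
   filelock.put_xlock();         // 5-6) client1 finishes -> assert fires
   std::cout << "not reached\n";
 }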

#2 - Updated by Greg Farnum over 12 years ago

  • Status changed from New to 7

Testing that fix I worked out with Sage.

#3 - Updated by Greg Farnum over 12 years ago

  • Status changed from 7 to Resolved

Well, I hit a path_traverse bug instead. I'm going to mark this particular one as resolved unless it pops up again.

#4 - Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.34)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
