Bug #1417: mds: failed assert on xlock

Added by Greg Farnum over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal

Description

mds/SimpleLock.h: 494: FAILED assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE || is_locallock() || state == LOCK_LOCK)
 ceph version 0.33-205-geb8925a (commit:eb8925a730e735624562dad67894dc373079b934)
 1: (Locker::xlock_finish(SimpleLock*, Mutation*, bool*)+0x5f4) [0x5e1624]
 2: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x5d) [0x5e95cd]
 3: (Locker::drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x51) [0x5e9811]
 4: (Server::reply_request(MDRequest*, MClientReply*, CInode*, CDentry*)+0x139) [0x4fdc69]
 5: (C_MDS_inode_update_finish::finish(int)+0x1dc) [0x53ae2c]
 6: (Context::complete(int)+0xa) [0x48e64a]
 7: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xda) [0x6d8eca]
 8: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x20e) [0x6d006e]
 9: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x9a3) [0x6b1f63]
 10: (MDS::handle_core_message(Message*)+0x85f) [0x4ad5cf]
 11: (MDS::_dispatch(Message*)+0x2c) [0x4ad66c]
 12: (MDS::ms_dispatch(Message*)+0x71) [0x4aeef1]
 13: (SimpleMessenger::dispatch_entry()+0x879) [0x709629]
 14: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x48872c]
 15: (()+0x7971) [0x7fb1097cb971]
 16: (clone()+0x6d) [0x7fb10825f92d]

It looks like a problem whose roots are earlier, in an incorrect switch from xlockdone to prexlock, I think.

#1 - Updated by Greg Farnum over 12 years ago

Okay:
1) dispatch client1 request, gets xlock on filelock (lock_xlock)
2) early_reply to client1 request, which calls set_xlock_done on filelock (lock_xlock_done)
3) dispatch client2 request, try to get ifile xlock (lock_xlock_done -> lock_lock_xlock)
4) Wait on client2 request, since we can't get xlock
...inconsequential bits...
5) request_finish client1 request
6) put_xlock on ifile: ASSERT because lock is in disallowed state

Obviously we can fix the assert by just adding lock_lock_xlock to the allowed states. But that is super icky, since client1 still really has the xlock, but client2 is allowed to change its state because client1 has half put it away. I'm not quite sure why there's this split to begin with, though. Will investigate and discuss.
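
To make the sequence above concrete, here is a minimal, hypothetical C++ sketch of those state transitions. The ToyLock type, its method names, and the reduced set of enum values are invented for illustration; this is not the actual SimpleLock implementation (the real assert also allows is_locallock(), and the real transitions go through Locker). Running it aborts in put_xlock() with the same kind of disallowed-state failure as in the report.

 // toy_xlock_race.cpp -- hypothetical, simplified model of the race above;
 // not the actual Ceph SimpleLock/Locker code.
 #include <cassert>
 #include <iostream>

 enum State {
   LOCK_LOCK,
   LOCK_XLOCK,
   LOCK_XLOCKDONE,
   LOCK_LOCK_XLOCK,   // intermediate state entered by the second xlock attempt
 };

 struct ToyLock {
   State state = LOCK_LOCK;

   // step 1: client1's request acquires the xlock
   void xlock_start()      { state = LOCK_XLOCK; }

   // step 2: early_reply marks the xlock "done", but client1 still holds it
   void set_xlock_done()   { state = LOCK_XLOCKDONE; }

   // step 3: client2's request tries to xlock; from xlockdone the lock moves
   // into the intermediate lock_lock_xlock state and client2 waits
   void xlock_try_second() { state = LOCK_LOCK_XLOCK; }

   // steps 5-6: client1's request_finish drops its locks; the state check
   // here mirrors the failed assert in the report, and lock_lock_xlock is
   // not in the allowed set
   void put_xlock() {
     assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE ||
            state == LOCK_LOCK);
     state = LOCK_LOCK;
   }
 };

 int main() {
   ToyLock filelock;
   filelock.xlock_start();       // 1) client1 gets the xlock
   filelock.set_xlock_done();    // 2) early_reply -> xlockdone
   filelock.xlock_try_second();  // 3) client2 attempts xlock -> lock_lock_xlock
   filelock.put_xlock();         // 5-6) client1 finishes -> assert fires
   std::cout << "not reached\n";
 }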

#2 - Updated by Greg Farnum over 12 years ago

  • Status changed from New to 7

Testing that fix I worked out with Sage.

#3 - Updated by Greg Farnum over 12 years ago

  • Status changed from 7 to Resolved

Well, I hit a path_traverse bug instead. I'm going to mark this particular one as resolved unless it pops up again.

#4 - Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.34)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
