Bug #1425
mds: stuck in prexlock
Status: Closed
Description
See mds.a.log on sepia78.
- setattr request starts locking
- auth_pins the objects we are auth for
- rdlocks the parent dirs, but does not auth_pin them (we are not auth)
- starts an xlock on the target inode (lock -> prexlock, gather pending)
- a parent dir migrates onto our node
- the gather completes and the request goes back through acquire_locks
- acquire_locks sees the rdlock on the parent, needs to auth_pin it, finds ambiguous auth, and drops its locks + auth_pins
-> the target lock is left in prexlock (an unstable state) with its associated auth_pin, and nothing moves it out of that state.
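The sequence above can be replayed in a tiny model to show why the lock gets stranded. This is a hypothetical, heavily simplified sketch: `Object`, `Lock`, `Request`, `LockState` and `replay_race` are illustrative stand-ins, not Ceph's real classes. It assumes that an unstable lock state holds its own auth_pin on the parent object and that drop_locks() only releases locks the request actually holds.

```cpp
#include <cassert>
#include <set>

// Hypothetical model; names do not match Ceph's SimpleLock/MDRequest.
enum class LockState { Sync, PreXlock, Xlock };

// In this model only Sync counts as stable.
bool is_stable(LockState s) { return s == LockState::Sync; }

struct Object {
  int auth_pins = 0;  // pins from requests plus pins held by unstable locks
};

struct Lock {
  Object* parent;
  LockState state = LockState::Sync;

  explicit Lock(Object* p) : parent(p) {}

  // Entering an unstable state pins the parent; returning to a stable one unpins it.
  void set_state(LockState s) {
    bool was_stable = is_stable(state);
    state = s;
    if (was_stable && !is_stable(s)) parent->auth_pins++;
    if (!was_stable && is_stable(s)) parent->auth_pins--;
  }
};

struct Request {
  std::set<Lock*> xlocks;         // locks this request actually holds
  std::set<Object*> auth_pinned;  // objects this request pinned itself
};

// drop_locks as modelled here: only held locks and the request's own pins are
// released. A lock still in PreXlock is not in `xlocks`, so it is skipped.
void drop_locks(Request& mdr) {
  for (Lock* l : mdr.xlocks) l->set_state(LockState::Sync);
  mdr.xlocks.clear();
  for (Object* o : mdr.auth_pinned) o->auth_pins--;
  mdr.auth_pinned.clear();
}

// Replays the steps from the description and returns the lock's final state;
// `target.auth_pins` shows the pin left behind.
LockState replay_race(Object& target) {
  Lock xl(&target);
  Request mdr;
  target.auth_pins++;                 // request auth_pins the target
  mdr.auth_pinned.insert(&target);
  xl.set_state(LockState::PreXlock);  // start xlock, gather pending
  // ...parent migrates in, gather completes, acquire_locks hits ambiguous
  // auth on the parent and backs out:
  drop_locks(mdr);
  return xl.state;
}
```

In this model the request's own pin is released, but the pin taken on entry to PreXlock is not, because nothing owns the half-taken lock.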
The problem is that a transition to prexlock does not reliably result in someone holding the lock, or in the lock being kicked back to a stable state if the acquisition fails. All the other lock types are taken from a stable state, so they do not have this problem (although they may be left in a non-optimal state, since we don't eval() them in this scenario either).
Maybe a taking_lock pointer in MDRequest that drop_locks() eval()s and cleans up? (That should cover the request_cancel() callers too.)
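The taking_lock idea can be sketched in the same kind of simplified model. This is a hypothetical illustration, not the fix that was applied: `Object`, `Lock`, `Request`, `eval()` and `xlock_holders` are assumptions, and `eval()` here simply returns an unheld, unstable lock to Sync. The key point it shows is that drop_locks() gets a handle on a lock the request moved into an unstable state but never acquired.

```cpp
#include <cassert>
#include <set>

// Hypothetical sketch; names and signatures are illustrative, not Ceph's.
enum class LockState { Sync, PreXlock, Xlock };

// In this model only Sync counts as stable.
bool is_stable(LockState s) { return s == LockState::Sync; }

struct Object { int auth_pins = 0; };

struct Lock {
  Object* parent;
  LockState state = LockState::Sync;
  int xlock_holders = 0;

  explicit Lock(Object* p) : parent(p) {}

  // Unstable states hold an auth_pin on the parent object.
  void set_state(LockState s) {
    bool was_stable = is_stable(state);
    state = s;
    if (was_stable && !is_stable(s)) parent->auth_pins++;
    if (!was_stable && is_stable(s)) parent->auth_pins--;
  }
};

// Simplified eval(): an unstable lock that nobody holds goes back to Sync,
// which also releases the auth_pin the unstable state was holding.
void eval(Lock& l) {
  if (!is_stable(l.state) && l.xlock_holders == 0) l.set_state(LockState::Sync);
}

struct Request {
  std::set<Lock*> xlocks;           // locks this request holds
  std::set<Object*> auth_pinned;    // objects this request pinned itself
  Lock* taking_lock = nullptr;      // moved to an unstable state, not yet held
};

void drop_locks(Request& mdr) {
  for (Lock* l : mdr.xlocks) {
    l->xlock_holders--;
    eval(*l);
  }
  mdr.xlocks.clear();
  // New: a half-taken lock is eval()ed so it cannot be stranded.
  if (mdr.taking_lock) {
    eval(*mdr.taking_lock);
    mdr.taking_lock = nullptr;
  }
  for (Object* o : mdr.auth_pinned) o->auth_pins--;
  mdr.auth_pinned.clear();
}

// Same race as in the description, with taking_lock recorded.
LockState replay_race_fixed(Object& target) {
  Lock xl(&target);
  Request mdr;
  target.auth_pins++;
  mdr.auth_pinned.insert(&target);
  xl.set_state(LockState::PreXlock);
  mdr.taking_lock = &xl;  // would be cleared once the xlock is actually held
  drop_locks(mdr);
  return xl.state;
}
```

Because the cleanup lives in drop_locks(), any path that backs a request out through it (which, per the note above, should include the request_cancel() callers) would get the same treatment in this model.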