Bug #1425: mds: stuck in prexlock - CephFS - Ceph

Bug #1425

closed

mds: stuck in prexlock

Added by Sage Weil over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: Sage Weil

Description

See mds.a.log on sepia78.

- setattr request starts locking
- auth_pins the objects we are auth for
- rdlocks parent dirs, does not auth_pin them (not ours)
- starts xlock on target inode (lock -> prexlock, gather begins)
- parent migrates onto our node
- gather completes, request goes through acquire_locks
- sees the rdlock on the parent, needs to auth_pin it, auth is ambiguous, so it drops locks + auth_pins
-> target is left in prexlock (an unstable state), with its associated auth_pin, and stays there.
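
The sequence above can be replayed in a toy model. This is only a sketch to illustrate the failure mode: ToyLock, ToyRequest, and replay_stuck_sequence are hypothetical stand-ins, not the real MDS lock or request classes, and the migration and ambiguous-auth steps are reduced to a comment.

```cpp
#include <set>

// Hypothetical stand-ins for illustration; not Ceph's actual types.
enum class LockState { Sync, PreXlock, Xlock };

struct ToyLock {
    LockState state = LockState::Sync;
    int auth_pins = 0;  // pins taken on behalf of the unstable state

    bool is_stable() const { return state != LockState::PreXlock; }
};

struct ToyRequest {
    std::set<ToyLock*> held;  // locks the request actually holds

    // Begin an xlock: the lock goes unstable while the gather is pending,
    // but the request does not hold it yet.
    void start_xlock(ToyLock& l) {
        l.state = LockState::PreXlock;
        ++l.auth_pins;
    }

    // Models the problematic path: only locks in `held` are released, so a
    // lock that is merely mid-transition is never revisited.
    void drop_locks() {
        for (ToyLock* l : held) {
            l->state = LockState::Sync;
            --l->auth_pins;
        }
        held.clear();
    }
};

// Replays the steps from the list and returns the target lock as left behind.
ToyLock replay_stuck_sequence() {
    ToyLock target;
    ToyRequest req;
    req.start_xlock(target);  // lock -> prexlock, gather begins
    // parent migrates, gather completes, auth_pin on parent is ambiguous
    req.drop_locks();         // request backs off
    return target;
}
```

In this model the returned lock is still in PreXlock with one auth_pin outstanding, because nothing that drop_locks() walks refers to it.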

The problem is that a transition to prexlock does not reliably result in someone holding the lock, or in the lock being kicked back to a stable state if the request fails. All the other lock types are taken from a stable state, so this is not an issue for them. (They may still be left in a non-optimal state, though, since we wouldn't eval() them in this scenario either.)

Maybe a taking_lock pointer in MDRequest that is eval()ed/cleaned up by drop_locks()? (That should capture the request_cancel() callers too...)
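
A minimal sketch of that idea, under stated assumptions: ToyLock and ToyRequest are hypothetical toy types rather than the real MDRequest or lock classes, and the eval() here just falls back to a stable state and releases the pin, which is far simpler than what the MDS would need to do. It is not the fix that was actually committed, only an illustration of a taking_lock pointer that drop_locks() cleans up.

```cpp
#include <set>

// Hypothetical stand-ins for illustration; not Ceph's actual types.
enum class LockState { Sync, PreXlock, Xlock };

struct ToyLock {
    LockState state = LockState::Sync;
    int auth_pins = 0;

    bool is_stable() const { return state != LockState::PreXlock; }

    // Simplified stand-in for eval(): if the lock is stranded mid-transition,
    // return it to a stable state and release the associated pin.
    void eval() {
        if (state == LockState::PreXlock) {
            state = LockState::Sync;
            --auth_pins;
        }
    }
};

struct ToyRequest {
    std::set<ToyLock*> held;         // locks the request actually holds
    ToyLock* taking_lock = nullptr;  // lock we are part-way through taking

    void start_xlock(ToyLock& l) {
        l.state = LockState::PreXlock;
        ++l.auth_pins;
        taking_lock = &l;            // remember the in-flight transition
    }

    void finish_xlock() {
        if (!taking_lock) return;
        taking_lock->state = LockState::Xlock;
        held.insert(taking_lock);
        taking_lock = nullptr;
    }

    void drop_locks() {
        for (ToyLock* l : held) {
            l->state = LockState::Sync;
            --l->auth_pins;
        }
        held.clear();
        if (taking_lock) {           // the proposed extra cleanup step
            taking_lock->eval();
            taking_lock = nullptr;
        }
    }
};
```

Because the cleanup lives in drop_locks(), any cancel path that ends up calling drop_locks() would get it too, which is the point made about request_cancel() above.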

Updated by Sage Weil over 12 years ago

Updated by John Spray over 7 years ago